TidierOrg / Tidier.jl

Meta-package for data analysis in Julia, modeled after the R tidyverse.
MIT License

Add support for n()? #34

Closed. kdpsingh closed this issue 1 year ago.

kdpsingh commented 1 year ago

For simple situations, nrow() does the job, but I'm not sure it works across all of the top-level macros where it is relevant. I need to test it with slice, mutate, summarize, and filter, and then either document how to use nrow() or add n(). Either way, I think we should add n(), even if it ends up being an alias for nrow().

It may also help to add row_number(), which is similar to eachindex but slightly different, because eachindex requires a column name to work inside of a subset.

bkamins commented 1 year ago

I agree it is best to try to do things as close to R as possible.

kdpsingh commented 1 year ago

Thanks @bkamins.

Note to self: Need to check if eachindex(Cols(1)) is valid code inside DataFrames.jl. That might be a shortcut that works throughout all the macros to implement row_number().

Also note to self: n() needs to work correctly within groups for grouped data frames.
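
For reference, here is a minimal sketch of the group-aware behavior n() would need, written directly against DataFrames.jl rather than Tidier.jl. The data frame and the column names g, x, and n are made up for illustration, and this is not necessarily how n() ends up being implemented.

```julia
using DataFrames

# Hypothetical example data: two groups of unequal size
df = DataFrame(g = ["a", "a", "b"], x = [1, 2, 3])

# Ungrouped: nrow counts every row and yields a single-row result
total = combine(df, nrow => :n)

# Grouped: nrow is evaluated once per group, which is what n() should return
per_group = combine(groupby(df, :g), nrow => :n)
```

Here total has one row holding the overall count, while per_group has one row per value of g.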

bkamins commented 1 year ago

Cols(1) => eachindex is valid in DataFrames.jl. Note, however, that combine(df, eachindex), for example, deliberately does not require passing a column. The reason is that df could have no columns, and you still want to handle this corner case gracefully.
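
The two forms mentioned here can be compared side by side. This is a small sketch assuming DataFrames.jl 1.4 or later, where eachindex is accepted as a column-independent operation. The example data and the target name row_number are illustrative only.

```julia
using DataFrames

# Hypothetical example data
df = DataFrame(x = [10, 20, 30], y = ["a", "b", "c"])

# Column-independent form: no source column is named
idx_free = combine(df, eachindex => :row_number)

# Column-dependent form: eachindex is applied to the first column
idx_col = combine(df, Cols(1) => eachindex => :row_number)
```

Both calls produce the same row indices for this data. The first form is the one that does not depend on df having any columns.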

kdpsingh commented 1 year ago

Thanks, and I believe transform() doesn't require it either (which I use as part of the @slice implementation). This should be straightforward to handle.
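
As a rough illustration of how a row index from transform() could back a slice-like operation, here is a sketch assuming DataFrames.jl 1.4 or later. It is not the actual @slice implementation, and the helper column name row_number and the example data are hypothetical.

```julia
using DataFrames

# Hypothetical example data
df = DataFrame(x = [10, 20, 30, 40])

# Add a temporary row index without naming a source column
with_idx = transform(df, eachindex => :row_number)

# Keep rows 2 and 3, then drop the temporary index column
kept = subset(with_idx, :row_number => ByRow(in(2:3)))
sliced = select(kept, Not(:row_number))
```

Materializing the index as a column keeps the filtering step an ordinary subset() call, at the cost of one temporary column.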

kdpsingh commented 1 year ago

Going to work on this issue next. Once we have this, it will help standardize other macros like @count() and @tally(), which will wrap this functionality.