TidierOrg / Tidier.jl

Meta-package for data analysis in Julia, modeled after the R tidyverse.
MIT License

Enhancements to `@group_by()` #9

Closed by kdpsingh 1 year ago

kdpsingh commented 1 year ago

There are several enhancements needed to @group_by(), some of which depend on first addressing #8:

This would be implemented by first parsing the expressions using parse_r() from #8, running transform() to create the new columns, and then running groupby().
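As a rough sketch of the last two steps in plain DataFrames.jl (the parsing step is omitted because `parse_r()` is not shown in this thread; the data frame and the `a_even` column name are made up for illustration, and this is not necessarily how Tidier.jl implements it):

```julia
using DataFrames

# Toy data, assumed purely for illustration.
df = DataFrame(a = [1, 2, 3, 4], b = [10, 20, 30, 40])

# Step 1: create the column that the grouping expression defines.
# Here the expression is "is `a` even?", written by hand instead of being parsed.
df2 = transform(df, :a => ByRow(iseven) => :a_even)

# Step 2: group by the newly created column.
gdf = groupby(df2, :a_even)
```

Doing the `transform()` first means `groupby()` only ever sees ordinary column names, so the grouping step itself needs no knowledge of the expression syntax.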

kdpsingh commented 1 year ago

As of 2e3b5cbb93943b98fd98da58fb1721be7d0e7280, @group_by accepts tidy expressions as in the example above.

bkamins commented 1 year ago

> I need to better understand how ungrouping works in DataFrames.jl and which operations remove the grouping versus which ones do not.

There is an `ungroup` keyword argument that lets you choose, in every operation, whether or not to ungroup the result. By default it is `true`.
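A small illustration of that keyword with `combine` on a grouped data frame (the data and the column names `g`, `v`, and `total` are invented for the example):

```julia
using DataFrames

# Toy data, assumed purely for illustration.
df = DataFrame(g = ["x", "x", "y"], v = [1, 2, 4])
gdf = groupby(df, :g)

# Default (ungroup = true): the result is a plain DataFrame.
ungrouped = combine(gdf, :v => sum => :total)

# ungroup = false: the result stays a GroupedDataFrame with the same grouping columns.
still_grouped = combine(gdf, :v => sum => :total; ungroup = false)
```

The same keyword is accepted by `select` and `transform` when they are called on a `GroupedDataFrame`.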

kdpsingh commented 1 year ago

Awesome, that's very straightforward. I will incorporate this in the next version.

One note for anyone else who decides to work on this issue: `summarize()` should remove one "layer" of grouping when the data is grouped by multiple columns, whereas other functions should leave the data grouped as-is.
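One possible way to get that behavior on top of DataFrames.jl is sketched below. The helper name `summarize_drop_last` is hypothetical and the data is invented; this only illustrates the "drop the last grouping column" idea and is not Tidier.jl's actual code.

```julia
using DataFrames

# Hypothetical helper: summarize, then regroup by all but the last grouping column.
function summarize_drop_last(gdf::GroupedDataFrame, args...)
    cols = groupcols(gdf)            # current grouping columns, in order
    out = combine(gdf, args...)      # ungroup = true by default, so `out` is a DataFrame
    # With a single grouping column there is no layer left, so return the plain DataFrame.
    return length(cols) > 1 ? groupby(out, cols[1:end-1]) : out
end

# Toy data, assumed purely for illustration.
df = DataFrame(g1 = ["a", "a", "b", "b"], g2 = [1, 2, 1, 1], v = [10, 20, 30, 40])
gdf = groupby(df, [:g1, :g2])

res = summarize_drop_last(gdf, :v => sum => :total)
```

Here `res` is still grouped, but only by `g1`, which mirrors how dplyr's `summarize()` peels off the last grouping variable.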

kdpsingh commented 1 year ago

I'll open a separate issue to add .by in the future. Since this is a relatively new feature, it's not in widespread use just yet. Otherwise, the items in this list are completed.