Enhancements to `@group_by()`

kdpsingh commented 1 year ago

There are several enhancements needed to @group_by(), some of which depend on first addressing #8:

[x] Implementing @ungroup()
[x] The ability to define new columns inside of a @group_by(), such as @group_by(avg_sbp = (sbp1 + sbp2) / 2).

This would be implemented by first parsing the expressions using parse_r() from #8, running transform() to create the new columns, and then running groupby().

[x] Mimicking the ungrouping behavior of dplyr. In dplyr, the final layer of grouping is automatically removed after each summarize() operation but not after any other operation. I need to better understand how ungrouping works in DataFrames.jl and which operations remove the grouping versus which ones do not. If all operations remove grouping, then we could manually regroup using dplyr rules.
[ ] Add a `.by` parameter to other macros to allow in-line grouping
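
As a rough illustration of the plan in the second item (create the new columns with `transform()`, then call `groupby()`), here is a minimal sketch in plain DataFrames.jl. It is not the Tidier.jl implementation: the `transform()` call is written by hand as a stand-in for whatever `parse_r()` from #8 would produce, and the data frame and its values are made up for the example.

```julia
using DataFrames

# Made-up example data; the column names follow the example in the list above.
df = DataFrame(id = [1, 2, 3, 4],
               sbp1 = [120, 130, 120, 140],
               sbp2 = [124, 134, 124, 144])

# Step 1: create the new column. This hand-written call stands in for the
# result of parsing the tidy expression `avg_sbp = (sbp1 + sbp2) / 2`.
with_avg = transform(df, [:sbp1, :sbp2] => ByRow((a, b) -> (a + b) / 2) => :avg_sbp)

# Step 2: group by the newly created column.
gdf = groupby(with_avg, :avg_sbp)
```

Doing the two steps in this order means the grouping column already exists by the time `groupby()` runs, so no special handling is needed inside `groupby()` itself.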

kdpsingh commented 1 year ago

As of 2e3b5cbb93943b98fd98da58fb1721be7d0e7280, @group_by accepts tidy expressions as in the example above.

bkamins commented 1 year ago

> I need to better understand how ungrouping works in DataFrames.jl and which operations remove the grouping versus which ones do not.

There is an `ungroup` keyword argument that lets you choose, for each operation, whether the result is ungrouped. By default it is `true`.
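
A small sketch of the keyword in use with `combine()` on a `GroupedDataFrame`; the data and column names are made up for the example.

```julia
using DataFrames

# Made-up example data grouped by two columns.
df = DataFrame(a = [1, 1, 2, 2], b = [1, 2, 1, 2], x = [10, 20, 30, 40])
gdf = groupby(df, [:a, :b])

# Default (ungroup = true): the result is a plain DataFrame.
flat = combine(gdf, :x => sum => :total)

# ungroup = false: the result is a GroupedDataFrame with the same grouping columns.
still_grouped = combine(gdf, :x => sum => :total; ungroup = false)
```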

kdpsingh commented 1 year ago

Awesome, that's very straightforward. I will incorporate this in the next version.

One note I'll leave here in case someone else decides to work on this issue is that summarize() should remove one "layer" of grouping if grouped by multiple columns, whereas other functions should leave the data grouped as-is.
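
One possible way to get that behavior on top of DataFrames.jl is to let `combine()` ungroup fully and then regroup by all but the last grouping column. The sketch below assumes that approach; `summarize_drop_last` is a hypothetical helper name, not a function in Tidier.jl or DataFrames.jl, and the data is made up.

```julia
using DataFrames

# Hypothetical helper (not part of Tidier.jl or DataFrames.jl): summarize, then
# regroup by every grouping column except the last one, as dplyr does.
function summarize_drop_last(gdf::GroupedDataFrame, args...)
    cols = groupcols(gdf)
    result = combine(gdf, args...)      # ungroup = true by default
    remaining = cols[1:end-1]           # drop the final layer of grouping
    return isempty(remaining) ? result : groupby(result, remaining)
end

# Made-up example data grouped by two columns.
df = DataFrame(a = [1, 1, 2, 2], b = [1, 2, 1, 2], x = [10, 20, 30, 40])

# After one summarize, the result is still grouped by :a only.
once = summarize_drop_last(groupby(df, [:a, :b]), :x => sum => :total)

# After a second summarize, no grouping columns remain, so a DataFrame is returned.
twice = summarize_drop_last(once, :total => sum => :total)
```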

kdpsingh commented 1 year ago

I'll open a separate issue to add `.by` in the future. Since `.by` is a relatively new feature, it's not in widespread use just yet. Otherwise, the items in this list are completed.

TidierOrg/Tidier.jl, issue #9