JuliaData / DataFrames.jl

In-memory tabular data in Julia
https://dataframes.juliadata.org/stable/

`combine` on grouped df returns empty df when args is empty #3399

Closed ctarn closed 10 months ago

ctarn commented 10 months ago
import DataFrames

df = DataFrames.DataFrame(x=[1, 2, 2], y=[3, 4, 5])
gd = DataFrames.groupby(df, :x)
display(DataFrames.combine(gd))
"""
output:
0×1 DataFrame
 Row │ x     
     │ Int64 
─────┴───────

expect:
2×1 DataFrame
 Row │ x     
     │ Int64 
─────┴───────
   1 │     1
   2 │     2
"""
display(DataFrames.combine(gd, :y => first))
"""
output:
2×2 DataFrame
 Row │ x      y_first 
     │ Int64  Int64   
─────┼────────────────
   1 │     1        3
   2 │     2        4
"""
bkamins commented 10 months ago

The mental model you should have is that the number of rows per group is determined by the values returned by the args passed to `combine`. If args is empty, the number of rows per group is 0, so you get an empty data frame. If you want to keep the grouping variable, you have two options. Keep the number of rows from the source:

julia> DataFrames.combine(gd, :x)
3×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     2

Keep one row per group (as in your original post):

julia> DataFrames.combine(gd, :x => first => :x)
2×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2
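
The three cases above can be checked side by side. This is a minimal sketch that reuses the `df` and `gd` from the original post. The variable names `no_args`, `all_rows`, and `one_per_group` are only illustrative, and the assertions restate the outputs already shown in this thread:

```julia
using DataFrames

df = DataFrame(x=[1, 2, 2], y=[3, 4, 5])
gd = groupby(df, :x)

# No args: zero rows per group, so only the (empty) grouping column remains.
no_args = combine(gd)
@assert size(no_args) == (0, 1)

# Passing the column itself returns every source row of each group.
all_rows = combine(gd, :x)
@assert all_rows.x == [1, 2, 2]

# Reducing the column to a scalar returns one row per group.
one_per_group = combine(gd, :x => first => :x)
@assert one_per_group.x == [1, 2]
```

In each case the row count of the result follows from what the args return per group, which is why an empty argument list yields zero rows.
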
ctarn commented 10 months ago

thanks