JuliaData / DataFrames.jl

In-memory tabular data in Julia
https://dataframes.juliadata.org/stable/

Feature request: Group data frames to accept string or symbol as index when grouping is by a single attribute #3470

Open alex-s-gardner opened 3 days ago

alex-s-gardner commented 3 days ago

Make an example grouped data frame:

using DataFrames
df = DataFrame(city = ["Paris", "London", "Paris", "Berlin", "London", "Berlin", "Berlin"],
               date = ["10-1$k-2021" for k in 3:9],
               v = 38 .+ 100 * rand(7))
gdf = DataFrames.groupby(df, :city)

This is how a group in the grouped data frame has to be accessed now:

gdf[("Berlin", )]
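
For context, the tuple form is one of several key-based lookups that, as far as I know, a GroupedDataFrame already accepts. The sketch below uses a smaller table than the one above (the column `v` and its values are made up for illustration) and shows the tuple, named tuple, and dictionary forms side by side:

```julia
using DataFrames

# Small illustrative table; the values are invented for this sketch.
df = DataFrame(city = ["Paris", "London", "Paris", "Berlin"], v = 1:4)
gdf = groupby(df, :city)

by_tuple      = gdf[("Berlin",)]              # tuple of key values
by_namedtuple = gdf[(city = "Berlin",)]       # named tuple, column name spelled out
by_dict       = gdf[Dict(:city => "Berlin")]  # dictionary from column name to value

# haskey checks for a group without risking a KeyError on lookup
has_berlin = haskey(gdf, ("Berlin",))
has_madrid = haskey(gdf, ("Madrid",))
```

All three lookups should return the same one-row group, and none of them is as short as the requested `gdf["Berlin"]`.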

For a data frame grouped by a single attribute, it would be more intuitive to simply index it in the same way as a data frame column:

gdf["Berlin"] == gdf[:Berlin] == gdf[("Berlin", )]

bkamins commented 2 days ago

The reason why this is not allowed is that gdf[1] would be ambiguous as it could mean:

  1. Selecting the first group.
  2. Selecting the group with key equal to 1.
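
A minimal sketch of the two readings, assuming an integer grouping column (the column names `id` and `v` are made up here). `sort = false` is passed so that groups follow order of appearance, which makes the first group and the group with key 1 differ:

```julia
using DataFrames

# Illustrative data: the first value that appears in `id` is 3, not 1.
df = DataFrame(id = [3, 1, 3, 2, 1], v = 1:5)
gdf = groupby(df, :id; sort = false)

first_group = gdf[1]      # positional: the first group, where id == 3
key_one     = gdf[(1,)]   # key lookup: the group where id == 1
```

If a bare `gdf[1]` were also allowed to mean "key equal to 1", these two lookups could no longer be told apart.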

In the past we discussed allowing gdf(1) (and by extension e.g. gdf(1,2,3) for multiple grouping keys) to make this case easier, but it did not get much support. But maybe we can reconsider it.
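
To make the call-syntax idea concrete, here is a hypothetical sketch. The method below is NOT part of DataFrames.jl, and defining it outside the package is type piracy, so it is for illustration only. It simply forwards the call arguments to the existing tuple-key lookup:

```julia
using DataFrames

# Hypothetical: not defined by DataFrames.jl. Forwards gdf(k1, k2, ...)
# to the existing lookup gdf[(k1, k2, ...)].
(gdf::GroupedDataFrame)(key...) = gdf[key]

# Illustrative data; column names and values are made up.
df = DataFrame(id = [3, 1, 3, 2, 1], grp = ["a", "b", "a", "b", "b"], v = 1:5)
one_key  = groupby(df, :id; sort = false)
two_keys = groupby(df, [:id, :grp]; sort = false)

single = one_key(1)        # same group as one_key[(1,)]
multi  = two_keys(3, "a")  # same group as two_keys[(3, "a")]
```

Because the call goes through a different syntax than `getindex`, `one_key[1]` keeps its positional meaning.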

CC @nalimilan @pdeffebach @kdpsingh

alex-s-gardner commented 2 days ago

Ahh, I can see the challenge. DimensionalData.jl uses At(1), so that gdf[1] would be the first group and gdf[At(1)] the group with a key value of 1, though I'm not sure how simpatico that is with DataFrames.
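
A rough sketch of what such a wrapper could look like for a single grouping key. The `At` type below is defined locally for illustration and only borrows its name from the DimensionalData.jl selector. Neither the type nor the `getindex` method is provided by DataFrames.jl:

```julia
using DataFrames

# Hypothetical wrapper marking a value as a group key rather than a position.
struct At{T}
    value::T
end

# Hypothetical method: route gdf[At(x)] to the existing tuple-key lookup.
Base.getindex(gdf::GroupedDataFrame, key::At) = gdf[(key.value,)]

# Illustrative data; column names and values are made up.
df = DataFrame(id = [3, 1, 3, 2, 1], v = 1:5)
gdf = groupby(df, :id; sort = false)

by_position = gdf[1]       # first group, where id == 3
by_key      = gdf[At(1)]   # group whose key equals 1
```

Since `At` is a distinct type, dispatch should keep the positional and key-based lookups separate.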

bkamins commented 1 day ago

We could use At (or another such wrapper), but in this case At(1) is longer to write than (1,). This was one of the reasons why we have not introduced it yet.

alex-s-gardner commented 1 day ago

At(1) is longer but more intuitive than (1,), though I certainly see your point. gdf(1) seems like a good compromise.