JuliaData / SplitApplyCombine.jl

Split-apply-combine strategies for Julia

groupby taking a vector #20

Closed mkborregaard closed 4 years ago

mkborregaard commented 4 years ago

group only takes a function, like iseven. But in many cases (like DataFrames.groupby) you want to split on a vector of groups.

andyferris commented 4 years ago

Could you please post a fuller example of the kind of behaviour you would like to see?

andyferris commented 4 years ago

So on master, you can now do this:

julia> group([true,true,true,false,false], [1,2,3,4,5])
2-element Dictionaries.HashDictionary{Bool,Array{Int64,1}}
 false │ [4, 5]
  true │ [1, 2, 3]

Is this what you expected? It's also super useful in the context of a dataframe, like group(df.col1, df.col2).

That returns a dictionary of vectors, not a dataframe, though I imagine we could make group(df.col1, df) split up the table nicely (it already almost works for TypedTables). Furthermore, as a sneak preview, I'm thinking of a partition function as a flattened version of group that works well with tables/dataframes, getting closer to a SQL and DataFrames.jl "group by". (I need to do slightly more work on Dictionaries.jl before circling around to tables again, though...)
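
For readers who want to see what grouping by a parallel vector of labels amounts to, here is a minimal sketch in plain Julia. It is not how SplitApplyCombine.jl implements group: the helper name group_by_labels is made up for illustration, and it returns a Base Dict rather than a Dictionaries.jl dictionary.

```julia
# Hypothetical helper, NOT part of SplitApplyCombine.jl: group each element of
# `values` under the label found at the same position in `labels`.
function group_by_labels(labels, values)
    length(labels) == length(values) ||
        throw(DimensionMismatch("labels and values must have the same length"))
    groups = Dict{eltype(labels), Vector{eltype(values)}}()
    for (label, value) in zip(labels, values)
        # get! inserts an empty vector the first time a label is seen
        push!(get!(() -> eltype(values)[], groups, label), value)
    end
    return groups
end

result = group_by_labels([true, true, true, false, false], [1, 2, 3, 4, 5])
```

With the inputs from the REPL example above, result[true] holds the values labelled true and result[false] the rest. Nothing in the sketch is specific to Bool labels.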

mkborregaard commented 4 years ago

Oh, sorry, I didn't see the comment above! Yes, that is exactly what I expected 💖 especially if it would also work for vectors of Ints, Strings, or any CategoricalArray, in addition to Bools? Your example on DataFrames sounds exactly like DataFrames.groupby, right?

andyferris commented 4 years ago

Cool. Yes it works for all types.

group always returns a nested dictionary, whereas I thought groupby flattened out everything into a single table? Or am I mistaken?
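
To make the nested-versus-flat distinction concrete, here is a rough sketch in plain Julia. The helper flatten_groups is hypothetical and says nothing about what DataFrames.groupby or the proposed partition function actually return; it only shows how a dictionary of vectors could be unrolled into one flat list of (group, value) rows.

```julia
# Hypothetical helper: unroll a dictionary of vectors into a flat vector of
# (group, value) tuples, visiting the groups in sorted key order.
function flatten_groups(groups::AbstractDict)
    rows = Tuple{keytype(groups), eltype(valtype(groups))}[]
    for key in sort(collect(keys(groups)))
        for value in groups[key]
            push!(rows, (key, value))
        end
    end
    return rows
end

# A plain Dict standing in for the nested result of grouping
nested = Dict(false => [4, 5], true => [1, 2, 3])
flat = flatten_groups(nested)
```

In the nested form each key maps to its own container, while in the flat form the group label is repeated on every row, which is closer to how a single table would carry it.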

mkborregaard commented 4 years ago

Ah, yes you are right.

andyferris commented 4 years ago

OK this seems resolved.

@JuliaRegistrator register

JuliaRegistrator commented 4 years ago

Registration pull request created: JuliaRegistries/General/7136

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if Julia TagBot is installed, or can be done manually through the github interface, or via:

git tag -a v1.0.0 -m "<description of version>" 9696a969252e5358c1f89e572ae8ba6ec590da64
git push origin v1.0.0

andyferris commented 4 years ago

Done :)