bluenote10 / NimData

DataFrame API written in Nim, enabling fast out-of-core data processing
MIT License

`groupBy` with a `DataFrame` #69

Open ynfle opened 3 years ago

ynfle commented 3 years ago

Is it possible to pass in a DataFrame that can be aligned/joined with the original DataFrame, so that its values can be used as the list of values to group by?

Thanks for your wonderful package!

bluenote10 commented 3 years ago

I'm not quite sure what you mean by "aligned/joined" in this context. Do you perhaps have a small example?

ynfle commented 3 years ago

I am following this to try to build a Naive Bayes classifier in Nim using NimData. In the method `calc_prior`, they group the data by the target class, which isn't possible without a join if the data and the target are separated.

ynfle commented 3 years ago

Here is a link to the pandas `groupby` documentation.

EDIT: Forgot the link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

bluenote10 commented 3 years ago

Sorry for the delay! So many things to do...

In general, NimData already has a rudimentary (and not well documented) implementation of `groupBy`:

https://github.com/bluenote10/NimData/blob/ed07c2fa76cae57477d61b08384148f308aa4c6d/src/nimdata.nim#L245-L254

> which isn't possible without a join if the data and the target are separated

Yes, you'd probably have to combine the target column with the data columns first to make it work.
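
To make the "combine first, then group" idea concrete, here is a minimal sketch in plain Python. It is not NimData code and does not use the NimData API. The helper names `combine` and `calc_prior`, and the toy data, are hypothetical and chosen only for illustration; `calc_prior` borrows its name from the tutorial mentioned above, but the body here is an assumption about what such a function would compute (the relative frequency of each class).

```python
from collections import Counter


def combine(rows, targets):
    """Attach each target label to its feature row, pairing by position."""
    if len(rows) != len(targets):
        raise ValueError("rows and targets must have the same length")
    return [row + (label,) for row, label in zip(rows, targets)]


def calc_prior(combined):
    """Group combined rows by their last field (the class label) and
    return the relative frequency of each class."""
    counts = Counter(row[-1] for row in combined)
    total = len(combined)
    return {label: n / total for label, n in counts.items()}


# Hypothetical toy data: features and targets are stored separately.
features = [(5.1, 3.5), (4.9, 3.0), (6.2, 2.9), (5.9, 3.0)]
targets = ["a", "a", "b", "a"]

combined = combine(features, targets)  # target becomes the last column
priors = calc_prior(combined)          # group by that column
print(priors)
```

The same shape should carry over to a DataFrame library: once the target is a column of the same table as the features, grouping by it no longer needs a join.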