Closed Shekharrajak closed 7 years ago
There is still many works to do in this PR. But it will be good if someone review the approach.
Behaviour on multiple grouping after this commit https://github.com/SciRuby/daru/pull/330/commits/5dcca795868d6dde4936bfb2b0b6846a75c00b22
# using above example only
irb(main):007:0> df.group_by([:employee, :month])
=> #<Daru::DataFrame(6x1)>
salary
Jane July 4 600
June 1 500
John July 3 1200
June 0 1000
Mark July 5 600
June 2 700
#------
irb(main):009:0> df.group_by([:employee, :month, :salary])
=> #<Daru::DataFrame(6x0)>
Jane July 600 4
June 500 1
John July 1200 3
June 1000 0
Mark July 600 5
June 700 2
Sorry for delay, busy times. Reviewing it currently.
I hope now I have completed the PR. @zverok , please have a look. Thanks.
Now reviewing it FOR REALZ :)
Okay, no problem :)
Well, first of all I'd like to know what is your further plan for following the GitHub issue? Because dataframe.group_by(...).df
looks weird for me, I'd like to know whether it is some "temporary" solution, or you believe it is the best API possible for the task? From an answer to this question our further actions depend.
@Shekharrajak I've asked:
Well, first of all I'd like to know what is your further plan for following the GitHub issue?
I really need an answer for further discussion, not for "examining" you or something. We need some common plan, the issue is really important and your work could have a great impact!
@zverok , I haven't think too much but when I opened this PR, I tried to keep in mind that this is just part 1, there is other part (DataFrame#summarize ) also to fix.
I understand that dataframe.group_by(...).df
looks weird. But I am here just showing the resultant dataframe for the better user experience. User still can do other group_by operations like count, mean
etc. since group_by
returns the GroupBy
class(it is just showing dataframe). So it take care the old testcases.
irb(main):006:0> df.group_by(:employee)
=> #<Daru::DataFrame(6x2)>
month salary
Jane 1 June 500
4 July 600
John 0 June 1000
3 July 1200
Mark 2 June 700
5 July 600
irb(main):007:0> df.group_by(:employee).class
=> Daru::Core::GroupBy
For Part 2: Add DataFrame#summarize
for multi-indexed DFs. I have thought to use this resultant dataframe for the summarize
method.
I also need to understand why many testcases are sorted (https://github.com/SciRuby/daru/issues/324) . When I remove the sorting I get error mostly just ordering is changed.
OK, let's do it this way: I'll merge current work in master in current state, with expectation that you'll do "part 2" too, and we rule out all doubtful moments at that point, OK?
@zverok Okay, I will make this PR better. I will do few more commits soon. Thanks.
@Shekharrajak no, you've missed my idea :) Let's merge this PR just now, and then do a separate PR with "part 2" of the issue -- and spend some time on that second with cleaning up. WDYT?
@zverok , fine. I will open new PR after few days with the proper code.
:+1: Merged this one, then. Thanks.
Fixes #152 Part 1
Example :