SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License
1.03k stars 139 forks source link

Refactor grouping and aggregation #454

Closed paisible-wanderer closed 6 years ago

paisible-wanderer commented 6 years ago

PR for Issue 453

It might be easier to review commit by commit

I think that the main contribution is the methodDaru::DateFrame#group_by_and_aggregate, here are some benchmarks: (nowhere extensive)

1/ comparaison of groupping + aggregation - on my PR

require 'benchmark'

Benchmark.bm(8) do |b|
  b.report("gr + sum") { 1_000.times { spending_df.group_by(:year).sum } }
  b.report("gr + agg") { 1_000.times { spending_df.group_by(:year).aggregate(spending: :sum, nb_spending: :sum) } }
  b.report("gr_agg")   { 1_000.times { spending_df.group_by_and_aggregate(:year, spending: :sum, nb_spending: :sum) } }
end

#                user     system      total        real
# gr + sum   0.680000   0.000000   0.680000 (  0.673169)
# gr + agg   0.870000   0.000000   0.870000 (  0.871802)
# gr_agg     0.370000   0.000000   0.370000 (  0.370445)

2/ I end up refactoring the groupping method (will be a bit slower)

## test using `10_000.times { spending_df.group_by(:year).sum }` on first and last commit
#               user     system      total        real
# before    6.270000   0.000000   6.270000 (  6.302841)
# after     6.620000   0.000000   6.620000 (  6.620660)

=========================

Please, let me know if there is anything I can improve.

zverok commented 6 years ago

Thanks! And sorry for long wait.

Prakriti-nith commented 6 years ago

There was a method access_row_tuples_by_indexs in dataframe.rb which has been recently removed from daru. This method has been used extensively in daru-view, so it is creating problems in the visualization with GoogleCharts and HighCharts when data is provided as Daru::DataFrame. @zverok @paisible-wanderer is there any alternative method provided or should we add it again?

paisible-wanderer commented 6 years ago

@zverok no problem, I am happy that it was finally merged!