EthanRosenthal / medium-data-bakeoff

A python library bakeoff for medium sized datasets
MIT License

Adding modin, dask_on_ray, and modin_on_dask #15

Closed: sullivancolin closed this pull request 1 year ago

sullivancolin commented 1 year ago

This adds benchmarks for modin and its backend variations (dask_on_ray and modin_on_dask). I'm still using column projection, so I've included the slightly optimized variation.

I haven't tried modin on the other partition versions, but my guess is that it will fail on many of them. I was also surprised by how slow modin was; possibly `groupby.mean` is not very performant in modin.
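For readers unfamiliar with the term, "column projection" means reading only the columns a query actually needs before running it. The snippet below is a minimal sketch of that pattern followed by a `groupby.mean`. It is not the benchmark code from this PR. The toy CSV data, the column names (`station`, `temp`, etc.), and the `mean_by_group` helper are all hypothetical. It uses an in-memory CSV with `usecols` to stay self-contained; for parquet files, `pd.read_parquet(path, columns=[...])` is the analogous argument. Modin's documented drop-in usage is to replace `import pandas as pd` with `import modin.pandas as pd`.

```python
import io

import pandas as pd  # with modin installed, `import modin.pandas as pd` is the drop-in swap

# Hypothetical toy data standing in for one benchmark partition.
CSV_DATA = """station,temp,humidity,notes
a,10.0,0.5,x
a,14.0,0.6,y
b,20.0,0.7,z
"""


def mean_by_group(buf, key, value):
    """Hypothetical helper: read only `key` and `value` (column projection), then groupby.mean."""
    # usecols skips parsing the columns the query does not need.
    df = pd.read_csv(buf, usecols=[key, value])
    return df.groupby(key)[value].mean()


result = mean_by_group(io.StringIO(CSV_DATA), "station", "temp")
print(result.to_dict())  # {'a': 12.0, 'b': 20.0}
```

Projecting columns up front reduces how much data is parsed and held in memory, which is why it tends to matter more as datasets grow.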

[Figure: benchmark_50]

closes #12

EthanRosenthal commented 1 year ago

Sorry for the delay -- this is awesome! I'll go ahead and run the benchmarks and update the plots after merging this.