duckdblabs / db-benchmark

reproducible benchmark of database-like ops
https://duckdblabs.github.io/db-benchmark/
Mozilla Public License 2.0

Add cuDF #63

Closed jmakov closed 8 months ago

jmakov commented 8 months ago

Is there any particular reason why cuDF is left out, especially since it's included in the original repo?

Tmonster commented 8 months ago

Two reasons:

  1. Including libraries that use separate hardware is not a good way to compare the performance of software libraries. We would like to maintain hardware parity across the solutions.
  2. Maintaining the GPU-accelerated solutions is also not something I am keen to do.

If cuDF can run exclusively on CPUs, then I will happily include it again if you provide a PR to fix any current problems.

jmakov commented 8 months ago

Thanks for the clarification. I guess from the user's perspective it would be interesting to see what's possible, e.g. whether you should invest in a CPU with AVX512 or rather a GPU. I agree that supporting a GPU environment, even with Docker, requires a lot.

rootsmusic commented 3 months ago

RAPIDS performed a Database-like ops benchmark with a CPU and GPU for cudf.pandas.

jangorecki commented 3 months ago

> RAPIDS performed a Database-like ops benchmark with a CPU and GPU for cudf.pandas.

Thanks for the link. On a closer look, the timings there don't look very reliable. Both h2oai and duckdb run the q10 groupby with data.table in 12s (recent) and 17s (old), while the cudf benchmark needs 34s for data.table on the same operation. I don't know what the reason could be: machine spec, compiler, compilation flags?
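
To put that gap in numbers, here is a minimal sketch that uses only the three timings quoted in this comment. The dictionary keys and the `slowdown` helper are illustrative names, not part of any benchmark code, and the ratios say nothing about the cause of the difference.

```python
# Timings in seconds for the q10 groupby with data.table, as quoted in the
# comment above. The labels are illustrative, not taken from any report.
reported_seconds = {
    "db-benchmark (recent)": 12.0,
    "db-benchmark (old)": 17.0,
    "cudf.pandas benchmark": 34.0,
}


def slowdown(baseline: float, other: float) -> float:
    """Return how many times longer `other` took than `baseline`."""
    return other / baseline


# Compare the cudf.pandas report's data.table timing against each
# db-benchmark timing for the same query.
ratios = {
    name: slowdown(seconds, reported_seconds["cudf.pandas benchmark"])
    for name, seconds in reported_seconds.items()
    if name != "cudf.pandas benchmark"
}

for name, ratio in ratios.items():
    print(f"cudf.pandas report vs {name}: {ratio:.2f}x longer")
```

A 2x to 3x difference for the same library and query is the kind of gap that hardware or build settings could plausibly explain, which is why the reported timings are hard to compare directly.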