duckdblabs / db-benchmark

reproducible benchmark of database-like ops
https://duckdblabs.github.io/db-benchmark/
Mozilla Public License 2.0

Dask: Enable Q6 to Q10 again #58

Closed fjetter closed 1 month ago

fjetter commented 8 months ago

This does not change the existing code, except for removing a config option that dataframes ignore anyhow; it simply adds the Q7-Q10 queries again as they are currently defined. I haven't optimized anything here, but dask is perfectly capable of running them.

Tmonster commented 8 months ago

Hi Florian,

I've kicked off the workflow run for now to make sure everything works. To get the PR merged and the results updated quickly I would also like to see updates to the time.csv and logs.csv files. This way I know the code has been tested thoroughly up to 50GB. By running the benchmark yourself you can also generate the report to see how dask compares to other solutions.

You can also modify the configs so that dask spawns a different combination of workers & threads. I saw in this comment that this might be the issue: https://github.com/duckdblabs/db-benchmark/issues/56#issuecomment-1798255468
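For illustration only, here is a rough sketch of what trying different worker/thread layouts could look like. It is not the benchmark's actual configuration code: `worker_thread_combinations` and `start_cluster` are hypothetical helpers, and the only assumption about dask itself is that `dask.distributed.LocalCluster` accepts `n_workers` and `threads_per_worker`.

```python
from typing import List, Tuple


def worker_thread_combinations(total_cores: int) -> List[Tuple[int, int]]:
    """Return every (n_workers, threads_per_worker) pair that uses exactly total_cores."""
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")
    return [
        (n_workers, total_cores // n_workers)
        for n_workers in range(1, total_cores + 1)
        if total_cores % n_workers == 0
    ]


def start_cluster(n_workers: int, threads_per_worker: int):
    """Start a local dask cluster with the given layout (hypothetical helper)."""
    # Imported lazily so the enumeration above works without dask installed.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(n_workers=n_workers, threads_per_worker=threads_per_worker)
    return Client(cluster)


if __name__ == "__main__":
    # List the candidate layouts for an 8-core machine (core count is an assumption).
    for n_workers, threads in worker_thread_combinations(8):
        print(f"{n_workers} workers x {threads} threads")
```

Each pair could then be passed to `start_cluster` before a benchmark run to see which layout behaves best on a given machine.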

fjetter commented 8 months ago

I wanted to follow up with some improvements, but we can of course do it all in one go.

> To get the PR merged and the results updated quickly I would also like to see updates to the time.csv and logs.csv files.

Looks like I didn't read the readme properly. I assumed you would be running the benchmarks. I'll look into it and update the numbers.

Tmonster commented 8 months ago

Whoops, looks like dask isn't even a solution in the regression.yml file. Can you add it in this PR? Then it will get automatically tested as well.

Edit: I manually cancelled the earlier workflow runs since dask wasn't included. They should run again automatically when you push.

Tmonster commented 8 months ago

@fjetter seems like there is an issue with the dask group by

Tmonster commented 8 months ago

Hi Florian,

I did some extra debugging here and found other changes that needed to be made to get dask to run. If you merge with master, all GitHub Actions should pass.

Tmonster commented 1 month ago

@fjetter Hi Florian, with the release of DuckDB v1.0.0 I'm gonna run the benchmark again. I tried to resolve the merge conflicts for Dask. Let me know if there's anything else I need to do.

Currently waiting for CI to pass first

fjetter commented 1 month ago

If CI passes I think you're good.