duckdblabs / db-benchmark

reproducible benchmark of database-like ops
https://duckdblabs.github.io/db-benchmark/
Mozilla Public License 2.0

datafusion does not correctly make chunk results #42

Closed Tmonster closed 9 months ago

Tmonster commented 9 months ago

Tagging @Dandandan since they added the solution.

I'm in the process of running the benchmarks again, and the output of datafusion has caused some issues during report generation.

Specifically, when write_log is called and the chunk is built using chk=make_chk([chk]), the chunk output is incorrectly formatted.

Running the 500 MB group by benchmark on G1_1e7_1e2_0_0 logs the following result for one of the queries:

{{codename}},1696851068,1696851075.3744261,groupby,G1_1e7_1e2_0_0,10000000,sum v1 mean v3 by id3,100000,3,datafusion,31.0.0,,.groupby,1,0.269,1.877,TRUE,[29998789.          4999719.62234443],5.398,,FALSE

The column with the value [29998789. 4999719.62234443] is the offending column. It should contain semicolon-separated answers.
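For illustration, here is a minimal sketch of how a chunk value could be turned into a semicolon-separated field. It is not the repository's make_chk helper; the names flatten and format_chk and the rounding to three digits are assumptions made for this example. The idea is to flatten whatever nested container the solution returns before joining, so that an array is never passed through str() as a whole.

```python
from collections.abc import Iterable


def flatten(values):
    """Yield scalars from arbitrarily nested iterables (lists, tuples, arrays)."""
    for v in values:
        if isinstance(v, Iterable) and not isinstance(v, (str, bytes)):
            yield from flatten(v)
        else:
            yield v


def format_chk(values, ndigits=3):
    """Hypothetical helper: join chunk values with ';' into one CSV-safe field."""
    parts = [str(round(float(v), ndigits)) for v in flatten(values)]
    # Commas separate the log columns, so they must not appear inside the field.
    return ";".join(parts).replace(",", "_")


# The values from the log line above, wrapped in an extra list as in make_chk([chk]).
print(format_chk([[29998789.0, 4999719.62234443]]))  # prints 29998789.0;4999719.622
```

With a helper shaped like this, it would not matter whether the caller passes a one-row array or its first row, since both are flattened to the same scalars.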

The report is ready to be published, and I would like to have the datafusion results included.
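A check along the following lines could flag malformed entries before the report is generated. It is only a sketch: the field position (index 17) is inferred from the sample log line in this issue, the function name chk_is_well_formed is hypothetical, and the "good" value is a made-up corrected field rather than real benchmark output.

```python
import csv
import re

# Assumption: the chunk field is the 18th column, as in the sample line above.
CHK_INDEX = 17

# One or more plain numbers separated by semicolons, e.g. "1.0;2.5".
NUMBER = r"-?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?"
CHK_PATTERN = re.compile(rf"^{NUMBER}(?:;{NUMBER})*$")


def chk_is_well_formed(log_line):
    """Return True if the chunk field holds semicolon-separated numbers."""
    fields = next(csv.reader([log_line]))
    return bool(CHK_PATTERN.match(fields[CHK_INDEX]))


bad = (
    "{{codename}},1696851068,1696851075.3744261,groupby,G1_1e7_1e2_0_0,"
    "10000000,sum v1 mean v3 by id3,100000,3,datafusion,31.0.0,,.groupby,"
    "1,0.269,1.877,TRUE,[29998789.          4999719.62234443],5.398,,FALSE"
)

# Build a hypothetical corrected line by replacing only the chunk field.
fields = next(csv.reader([bad]))
fields[CHK_INDEX] = "29998789.0;4999719.622"
good = ",".join(fields)

print(chk_is_well_formed(bad))   # prints False
print(chk_is_well_formed(good))  # prints True
```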

Dandandan commented 9 months ago

Having a quick look at the code, I see that some of the chk code uses .to_pandas().to_numpy()[0] and some does not.

Tmonster commented 9 months ago

Fixed with https://github.com/duckdblabs/db-benchmark/pull/46

Dandandan commented 9 months ago

Thanks @Tmonster, amazing work :)