duckdblabs / db-benchmark

reproducible benchmark of database-like ops
https://duckdblabs.github.io/db-benchmark/
Mozilla Public License 2.0

Published duckdb results are not reproducible #65

Open qoega opened 8 months ago

qoega commented 8 months ago

Hi. I set up the same environment you use for the benchmarks and tried to reproduce the currently published results.

curl http://169.254.169.254/latest/meta-data/instance-type
c6id.metal

The benchmark data is stored on a local NVMe disk:

~/nvme/h2oai-db-benchmark$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme2n1    1.8T  510G  1.3T  29% /home/ubuntu/nvme

lsblk | grep -v loop
NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
nvme4n1      259:0    0  1000G  0 disk
├─nvme4n1p1  259:1    0 999.9G  0 part /
├─nvme4n1p14 259:2    0     4M  0 part
└─nvme4n1p15 259:3    0   106M  0 part /boot/efi
nvme2n1      259:4    0   1.7T  0 disk /home/ubuntu/nvme
nvme3n1      259:5    0   1.7T  0 disk
nvme1n1      259:6    0   1.7T  0 disk /var/lib/clickhouse
nvme0n1      259:7    0   1.7T  0 disk /nvme

The groupby task on G1_1e9_1e2_5_0 fails with an out-of-memory error for duckdb 0.8.1.3:

cat run_duckdb_groupby_G1_1e9_1e2_5_0.err
Error: rapi_execute: Failed to run query
Error: Out of Memory Error: could not allocate block of size 262KB (216.2GB/216.2GB used)
Database is launched in in-memory mode and no temporary directory is specified.
Unused blocks cannot be offloaded to disk.

Launch the database with a persistent storage back-end
Or set PRAGMA temp_directory='/path/to/tmp.tmp'
Timing stopped at: 768 538.4 33.28
Execution halted
Warning messages:
1: Connection is garbage-collected, use dbDisconnect() to avoid this.
2: Database is garbage-collected, use dbDisconnect(con, shutdown=TRUE) or duckdb::duckdb_shutdown(drv) to avoid this.
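
The error text itself names two ways out: open a file-backed database, or set `temp_directory`. A minimal shell sketch of that hint follows, assuming the `duckdb` CLI is installed and on `PATH`. The helper names `temp_dir_pragma` and `run_with_spill` and all paths are hypothetical and are not part of the benchmark scripts, which drive duckdb from R. Running this way changes the setup, so timings would not be comparable with the published in-memory results.

```shell
# Build the PRAGMA quoted in the error message for a given spill directory.
temp_dir_pragma() {
  printf "PRAGMA temp_directory='%s';" "$1"
}

# Run one SQL statement against a file-backed database with a spill directory set.
# $1: database file, $2: spill directory, $3: SQL to run (all hypothetical inputs)
run_with_spill() {
  mkdir -p "$2" || return 1
  # The PRAGMA only applies to this session, so it is sent together with the query.
  duckdb "$1" -c "$(temp_dir_pragma "$2") $3"
}

# Example (paths are placeholders, not the benchmark's actual layout):
# run_with_spill /home/ubuntu/nvme/bench.duckdb /home/ubuntu/nvme/duckdb_tmp "SELECT 42;"
```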

jangorecki commented 8 months ago

You need to use the same version of duckdb if you want to reproduce.

Sorry, I expected the benchmark to run on 0.9.0 and the latest 0.9.1.

Tmonster commented 8 months ago

The currently published results also show 0.8.1-3 erroring out on the dataset G1_1e9_1e2_5_0 (at least for the advanced questions). You can see the results here:

https://duckdblabs.github.io/db-benchmark/groupby/G1_1e9_1e2_5_0_advanced.png

Can you also post the .out file? That can tell you what specific query failed.
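
To collect what is being asked for here, a small helper along these lines could print the tail of both logs for one run. It is only a sketch: the function name `show_run_logs` is made up, and it assumes the `.out` file sits next to the `.err` file under the same base name, which the thread does not confirm.

```shell
# Print the last lines of the .out and .err logs for one benchmark run.
# $1: log path without extension (assumed naming, e.g. run_duckdb_groupby_G1_1e9_1e2_5_0)
# $2: number of trailing lines to show (default 20)
show_run_logs() {
  base="$1"
  n="${2:-20}"
  for ext in out err; do
    f="${base}.${ext}"
    if [ -f "$f" ]; then
      echo "== ${f} (last ${n} lines) =="
      tail -n "$n" "$f"
    else
      echo "== ${f} not found =="
    fi
  done
}

# Example:
# show_run_logs run_duckdb_groupby_G1_1e9_1e2_5_0 30
```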