Open lawben opened 1 year ago
@oerling
Is this data being generated using Velox's TPCHConnector? If so, we use a different buffer size for string generation in dbgen, so the generated data won't exactly match the data generated using DuckDB (though I'm not sure whether the issue here is dbgen or actual query execution).
@pedroerp Yes, the data is being generated with the TPCHConnector (see #4091 for current version). But I'm using the same data in DuckDB, not their data. However, when using the same generator for SF1, we get the same results as DuckDB. Is there a difference between SF1 and 10 for the buffers?
Is there a difference between SF1 and 10 for the buffers?
Not that I know of. The only difference I'm aware of is that the buffers dbgen uses to generate synthetic strings are smaller in Velox (to make them faster to initialize), so the strings generated by Velox and DuckDB will differ (regardless of the scale factor).
The issue you're reporting seems unrelated; looks like it might be an actual bug.
Bug description
While benchmarking TPC-H with Parquet input, I came across weird performance numbers for Q21 with SF10. When looking into it, I found out that the results are incorrect.
In the lowest TableScan in the logs below, we only have "Raw Input: 3229839 rows", which is only about 5% of the data. At SF10, we have ~60 million rows in lineitem.
For comparison, I also ran Q21 with DuckDB on the same Parquet files and got different results. See the logs below for the DuckDB output. Basically, the main difference can be seen in the first 5 rows of the output (data is sorted by the aggregation value DESC):
For SF1, Velox and DuckDB produce the same results.
I also encountered this issue on my M1 dev machine.
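The scan-coverage figure quoted above can be double-checked with a few lines of arithmetic. This is only a rough sketch: the row counts are the ones from the report, the lineitem total is the approximate "~60 million" figure rather than an exact count, and the helper name `scan_coverage` is made up for illustration.

```python
# Row counts taken from the report. The lineitem total is the approximate
# "~60 million" figure for SF10, not an exact count.
raw_input_rows = 3_229_839
approx_lineitem_rows_sf10 = 60_000_000


def scan_coverage(rows_read: int, rows_expected: int) -> float:
    """Return the fraction of the expected rows that the scan actually read."""
    if rows_expected <= 0:
        raise ValueError("rows_expected must be positive")
    return rows_read / rows_expected


coverage = scan_coverage(raw_input_rows, approx_lineitem_rows_sf10)
print(f"TableScan read {coverage:.1%} of lineitem")
```

A fraction this far below 100% suggests the scan is skipping most of the input, rather than the mismatch coming from small differences in the generated strings.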
System information
Velox System Info v0.0.2
Commit: 0e230792d9fb681e9756954eef1bf8dc0a87c10f
CMake Version: 3.22.1
System: Linux-5.15.0-58-generic
Arch: x86_64
C++ Compiler: /usr/bin/clang++-15
C++ Compiler Version: 15.0.6
C Compiler: /usr/bin/clang-15
C Compiler Version: 15.0.6
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt
Relevant logs