coiled / benchmarks

Adjust TPC-H data generation script to create a single `.parquet` file each for `nation` and `region` #1386

Closed hendrikmakait closed 7 months ago

hendrikmakait commented 7 months ago

As discussed in #1380, partitioning these two datasets is unrealistic for this particular workload: both tables are tiny (25 rows for `nation` and 5 for `region`, regardless of scale factor).

As a workaround, I have manually replaced the partitioned data with unpartitioned data for the Arrow-based datasets. We need to adjust the data generation scripts to make the benchmarks reproducible for others (including our future selves).
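To make the intended change concrete, here is a minimal sketch of the layout decision, not the actual generation script. The helper `output_paths`, the constant `SINGLE_FILE_TABLES`, and the directory and file naming are all hypothetical; the real script may name and write its files differently.

```python
# Assumption: `nation` and `region` are the only tables that should
# stay unpartitioned; every other table keeps its partitioned layout.
SINGLE_FILE_TABLES = frozenset({"nation", "region"})


def output_paths(root: str, table: str, n_partitions: int) -> list[str]:
    """Return the Parquet file paths to write for one table.

    Hypothetical layout: <root>/<table>/<file>.parquet
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    base = f"{root.rstrip('/')}/{table}"
    if table in SINGLE_FILE_TABLES:
        # Always exactly one file, no matter how many partitions
        # were requested for the rest of the dataset.
        return [f"{base}/{table}.parquet"]
    return [f"{base}/{table}_{i}.parquet" for i in range(n_partitions)]
```

With this shape, a call such as `output_paths("data/tpch", "nation", 10)` yields a single path, while `output_paths("data/tpch", "lineitem", 10)` still yields ten. Keeping the exception in one named constant makes the special case easy to find and to reproduce later.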