As discussed in #1380, partitioning these two datasets is unrealistic for this particular workload.
As a workaround, I have manually replaced the partitioned data with unpartitioned data for the Arrow-based datasets. We need to adjust the data generation scripts to make the benchmarks reproducible for others (including our future selves).