coiled / benchmarks

Refactor TPC-H generation script - consolidate small table partitions #1392

Closed milesgranger closed 8 months ago

milesgranger commented 9 months ago

Will close #1386

This is slightly more complex because DuckDB doesn't support generating data for a single table. So we keep things mostly the same, but instead of writing out the partitions for the small tables, we collect them and combine them at the end of the normal partition generation.
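
For illustration, the collect-and-combine idea could look roughly like the sketch below. This is not the code in this PR: `generate_partition`, `SMALL_TABLES`, and the in-memory `written` dict are hypothetical stand-ins (the real script generates data with DuckDB and writes files), and the rows are made up.

```python
from collections import defaultdict

# Assumption: the tables consolidated into a single partition.
SMALL_TABLES = {"nation", "region"}


def generate_partition(step):
    """Hypothetical stand-in for the real generator.

    Each call returns rows for every table, mirroring the constraint
    that a single table cannot be generated on its own.
    """
    return {
        "lineitem": [{"l_orderkey": step * 10 + i} for i in range(3)],
        "nation": [{"n_nationkey": step}],
        "region": [{"r_regionkey": step}],
    }


def generate(n_steps):
    written = {}                   # (table, partition name) -> rows; stands in for files
    collected = defaultdict(list)  # small-table rows held back until the end

    for step in range(n_steps):
        for table, rows in generate_partition(step).items():
            if table in SMALL_TABLES:
                # Don't write small-table partitions yet, just collect them.
                collected[table].extend(rows)
            else:
                written[(table, f"part-{step}")] = rows

    # After the normal partition generation, write each small table once.
    for table, rows in collected.items():
        written[(table, "part-0")] = rows
    return written


files = generate(n_steps=4)
print(sorted(files))
```

Large tables still get one partition per step, while each small table ends up as a single combined partition.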

I also made some other small adjustments and refactoring, including using a LocalCluster to generate the data locally, which more closely replicates the behavior when using Coiled. Plus, it goes faster when generating larger scales. :)

milesgranger commented 9 months ago

Ran on Coiled and saw some workers being killed for memory, so I adjusted the worker sizing a bit. Otherwise, I confirmed it works there too: scale 100 with a single partition each for nation and region is at s3://coiled-runtime-ci/milesg/tpch/scale-100