Use 50 GB Parquet+PyArrow dataset for H2O tests on CI

coiled / benchmarks

BSD 3-Clause "New" or "Revised" License


Closed by hendrikmakait 2 months ago

hendrikmakait commented 3 months ago

Similar to #1530, the 5 GB dataset feels too small to benchmark behavior we care about.

fjetter commented 3 months ago

how long is one test run (compared to the rest of the suite)?

hendrikmakait commented 3 months ago

> how long is one test run (compared to the rest of the suite)?

I'll have to convert this back to a draft; it looks like some of the workloads run out of memory (OOM).

hendrikmakait commented 2 months ago

Some of these workloads cause workers to OOM, so I'm shelving this for now and will look into it at a later point.