Open mrocklin opened 1 year ago
We're already doing this for snowflake.
test_tpch.py
files starts with pytest.importorskip
to avoid crashing the test suite when your dependency isn't installedci/environment-snowflake.yml
include
section of .github/workflows/tests.yml
:
https://github.com/coiled/benchmarks/blob/ebffe0fabe50a9ed6aa514705a01ec482a72934e/.github/workflows/tests.yml#L59-L62This said - what are the dependencies of tpch? I'm only seeing dask-expr which is a small dependency-free package, isn't it? What's the benefit of splitting it out?
EDIT - nvm, I've now seen the most recent test pack and its dependencies
Using a monolithic environment.yml
is chiefly very convenient for A/B tests. Otherwise you'd have to prepare and run a separate suite of A/B tests for each set of base environments.
Right now we're putting everything in one environment.yml file. Maybe this isn't wise? I'm thinking of creating separate environment files for TPC-H vs normal benchmarks.
More broadly, I think that we put TPC-H in this repository for convenience (all the CI infrastructure is here). Do we also want these queries for normal benchmarks, or should they stay separate?
Putting my marketing hat on, I like the idea of them being separate. I hope to point people at some
tpch
directory with its own very simple instructions and have them understand it easily. Seeing a bunch of other stuff in there might confuse things.I think I want feedback from @fjetter and @crusaderky here, who I think have the most context.