coiled / benchmarks

BSD 3-Clause "New" or "Revised" License

TPCH - Why only queries 1-7? #1071

Open mrocklin opened 1 year ago

mrocklin commented 1 year ago

This is a genuine question, not a suggestion or a request for work. Why did we choose to focus on these seven queries? Are they special in some way?

@phofl I suspect that this question is mostly for you.

phofl commented 1 year ago

Queries 1-7 were available so I could just port them over without re-implementing anything.

It's on my todo list to look through the remaining queries and figure out whether any of them stresses something we don't cover yet.

mrocklin commented 1 year ago

I can imagine this being helpful for checking whether we're over-tuning. If we find that we do really well on queries 1-7 but really poorly on the rest, that's a sign that our results aren't representative and that people shouldn't trust us.

On the other hand, if we implement the new queries and find that results are similar to what we've seen, then that's good evidence that the benchmarks we have are representative, at least to the class of queries represented by TPC-H.
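One way to make this check concrete is to compare the average speedup over a baseline engine on the queries we have tuned (1-7) against the average speedup on the held-out queries. The sketch below is a minimal illustration under stated assumptions: per-query runtimes are available as dicts keyed by TPC-H query number, and the function names `geometric_mean` and `overtuning_ratio` are hypothetical, not part of the coiled/benchmarks repository. It uses the geometric mean because that is the usual way to average ratios such as speedups.

```python
import math
from typing import Iterable, Mapping


def geometric_mean(values: Iterable[float]) -> float:
    """Geometric mean of positive numbers (the usual way to average speedups)."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def overtuning_ratio(
    ours: Mapping[int, float],
    baseline: Mapping[int, float],
    tuned_queries: Iterable[int] = range(1, 8),
) -> tuple[float, float, float]:
    """Hypothetical helper: compare speedup on tuned vs. held-out queries.

    `ours` and `baseline` map TPC-H query number -> runtime in seconds.
    Per-query speedup is baseline_time / our_time (> 1 means we are faster).
    Returns (tuned_speedup, held_out_speedup, ratio). A ratio well above 1
    means we look much better on the tuned queries than on the rest.
    """
    tuned = set(tuned_queries)
    # Only compare queries that both systems actually ran.
    common = ours.keys() & baseline.keys()
    speedups = {q: baseline[q] / ours[q] for q in common}
    tuned_s = geometric_mean(s for q, s in speedups.items() if q in tuned)
    held_s = geometric_mean(s for q, s in speedups.items() if q not in tuned)
    return tuned_s, held_s, tuned_s / held_s
```

A ratio close to 1 would correspond to the "results are similar" outcome described above; what threshold counts as "well above 1" is a judgment call this sketch does not settle.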

phofl commented 1 year ago

I agree

mrocklin commented 1 year ago

@milesgranger if you have time tomorrow can I ask you to bring query 8 over from Polars into the system we have here?

phofl commented 1 year ago

I think @mrocklin also wanted a Dask implementation? That's what I understood when we chatted yesterday.

mrocklin commented 1 year ago

Yeah, ideally we'd have coverage for all of the projects supported here: Dask, DuckDB, Polars, and Spark.


milesgranger commented 1 year ago

> bring query 8 over from Polars

I took that too literally. 😅