Should add benchamrk option for p2p in test_q3 and test_q7 of h2o ?

coiled / benchmarks

BSD 3-Clause "New" or "Revised" License

35 stars 17 forks source link

Should add benchamrk option for p2p in test_q3 and test_q7 of h2o ? #660

Open ncclementi opened 1 year ago

ncclementi commented 1 year ago

Even though q3 and q7 do not shuffle by default, in the results shown over here https://github.com/coiled/h2o-benchmarks/issues/14#issuecomment-1297581487 we saw that when we have high cardinality on the column that the groupby is performed (in q3 and q7 that is id3) we see great benefits on using shuffling.

@hendrikmakait and I chatted, and it is not clear whether this should go on a benchmark or somewhere as a recommendation.

Using this issue as a placeholder to have the discussion.

cc: @fjetter @hayesgb

hendrikmakait commented 1 year ago

Looking at the plots, it seems as though the pattern in the workload is pretty similar. I think it might be good to test one of these (Q7 since it does a bit more?) with no shuffling, task-based shuffling and p2p. We should add a note highlighting the fact that we have seen significant improvements during analysis and thus test one query, but not all of them.

Let's just avoid a situation where the H2O tests become a grid search.

Much more important is the point about enabling recommendations or documenting the possible benefit of p2p for operations that do not shuffle per default (and identifying the relevant scenarios). I will incorporate this into the planned discussion issue.

ncclementi commented 1 year ago

Much more important is the point about enabling recommendations or documenting the possible benefit of p2p for operations that do not shuffle per default (and identifying the relevant scenarios). I will incorporate this into the planned discussion issue.

Agreed. I'll live this issue open as a reminder to include things in the discussion. Once we have that, we can close this one.

fjetter commented 1 year ago

I'm not too excited to run these benchmarks with custom config options. I think the nice thing about the h20 benchmarks is that they are very simple queries a user w/out a lot of low level knowledge would write. The knowledge that this can be sped up by shuffling is valuable, of course but I'd rather like us to see dask becoming smarter to make this call itself.

FWIW I don't have a strong opinion here.

hendrikmakait commented 1 year ago

The knowledge that this can be sped up by shuffling is valuable, of course but I'd rather like us to see dask becoming smarter to make this call itself.

Totally agreed! In the meantime, I still think that it would be worthwhile to have one test with a decomposable aggregation parametrized. Q7 seems to be a good low-effort candidate for that; we could write a targeted test for decomposable aggregation later on (and add some smarts to choosing the correct physical operation to distributed).