Closed XiangpengHao closed 1 month ago
Thank you for this. I'm sure you're aware of this, and it may be what you're trying to empirically demonstrate, but RowSelection is not designed for highly non-contiguous, e.g. random selections. It might be worth adding some benchmarks of long contiguous selections, as might arise when filtering sorted data.
> but RowSelection is not designed for highly non-contiguous, e.g. random selections.
yes, I think this is what @XiangpengHao is considering improving
> It might be worth adding some benchmarks of long contiguous selections, as might arise when filtering sorted data
I agree adding benchmarks for the case where RowSelection already does well would be valuable (to ensure we don't introduce regressions)
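To make the contiguous-versus-random distinction in this thread concrete, here is a minimal, std-only sketch. It assumes a run-length model of a selection (alternating "skip N rows" / "select N rows" entries), similar in spirit to how the parquet crate stores a `RowSelection` as a list of `RowSelector` runs. The names `Run`, `runs_from_mask`, `contiguous_mask`, and `random_mask` are hypothetical and are not part of the crate's API or of this PR's benchmark code. The row count and the 1/3 selectivity are borrowed from the PR description.

```rust
/// Simplified stand-in for a run-length row selector (NOT the parquet crate's type):
/// a run of `row_count` consecutive rows that is either skipped or selected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Collapse a per-row boolean mask into alternating skip/select runs.
fn runs_from_mask(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &selected in mask {
        match runs.last_mut() {
            // Same kind as the previous run: extend it by one row.
            Some(last) if last.skip == !selected => last.row_count += 1,
            // First row, or the kind changed: start a new run.
            _ => runs.push(Run { row_count: 1, skip: !selected }),
        }
    }
    runs
}

/// Row count taken from the PR description.
const TOTAL_ROWS: usize = 300_000;

/// One contiguous block covering the first third of the rows,
/// as might arise when filtering sorted data.
fn contiguous_mask() -> Vec<bool> {
    (0..TOTAL_ROWS).map(|i| i < TOTAL_ROWS / 3).collect()
}

/// Roughly one third of the rows, chosen pseudo-randomly
/// (xorshift64 with a fixed, arbitrary seed so the result is reproducible).
fn random_mask() -> Vec<bool> {
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    (0..TOTAL_ROWS)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            state % 3 == 0
        })
        .collect()
}

fn main() {
    let contiguous = runs_from_mask(&contiguous_mask());
    let random = runs_from_mask(&random_mask());
    // The contiguous mask collapses to one select run plus one skip run.
    println!("contiguous: {} runs", contiguous.len());
    // The random mask fragments into a very large number of short runs.
    println!("random:     {} runs", random.len());
}
```

Under this model, the cost of any operation that walks the runs grows with the number of runs rather than the number of selected rows, which is one plausible reason a random 1/3 selection stresses a run-length representation far more than a sorted-data selection of the same size.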
Which issue does this PR close?
Part of #5523
Rationale for this change
As the first step of measure-then-build, we add some benchmarks.
The benchmark has 300_000 rows, and the selector selects 1/3 of the rows, which roughly matches the `SearchPhrase <> ''` predicate in many ClickBench queries.
I added `intersection`, `union`, `from_filters`, and `and_then` because they are the most pronounced ones in the flamegraph.
What changes are included in this PR?
Are there any user-facing changes?