apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.33k stars 1.2k forks source link

Chunk based iteration in `accumulate_indices` #13451

Closed jayzhan211 closed 2 days ago

jayzhan211 commented 5 days ago

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

jayzhan211 commented 5 days ago
Comparing main and filter-chunk
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ filter-chunk ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │    0.26ms │       0.30ms │  1.16x slower │
│ QQuery 1     │   26.00ms │      25.44ms │     no change │
│ QQuery 2     │   55.35ms │      53.21ms │     no change │
│ QQuery 3     │   46.89ms │      45.90ms │     no change │
│ QQuery 4     │  307.76ms │     309.23ms │     no change │
│ QQuery 5     │  574.82ms │     570.49ms │     no change │
│ QQuery 6     │   26.30ms │      26.27ms │     no change │
│ QQuery 7     │   27.54ms │      29.15ms │  1.06x slower │
│ QQuery 8     │  356.26ms │     362.77ms │     no change │
│ QQuery 9     │  546.82ms │     557.03ms │     no change │
│ QQuery 10    │  131.44ms │     141.31ms │  1.08x slower │
│ QQuery 11    │  150.23ms │     152.91ms │     no change │
│ QQuery 12    │  614.05ms │     610.26ms │     no change │
│ QQuery 13    │  706.93ms │     688.08ms │     no change │
│ QQuery 14    │  566.77ms │     547.21ms │     no change │
│ QQuery 15    │  388.56ms │     383.52ms │     no change │
│ QQuery 16    │  932.13ms │     941.68ms │     no change │
│ QQuery 17    │  850.16ms │     838.91ms │     no change │
│ QQuery 18    │ 2798.37ms │    2790.18ms │     no change │
│ QQuery 19    │   39.54ms │      39.65ms │     no change │
│ QQuery 20    │  707.81ms │     702.99ms │     no change │
│ QQuery 21    │  971.53ms │     958.09ms │     no change │
│ QQuery 22    │ 2687.40ms │    2676.42ms │     no change │
│ QQuery 23    │ 5810.42ms │    5824.84ms │     no change │
│ QQuery 24    │  410.33ms │     385.74ms │ +1.06x faster │
│ QQuery 25    │  396.73ms │     405.53ms │     no change │
│ QQuery 26    │  448.47ms │     450.85ms │     no change │
│ QQuery 27    │ 1074.77ms │    1038.62ms │     no change │
│ QQuery 28    │ 8169.84ms │    7937.72ms │     no change │
│ QQuery 29    │  348.52ms │     355.17ms │     no change │
│ QQuery 30    │  520.21ms │     501.82ms │     no change │
│ QQuery 31    │  528.27ms │     525.37ms │     no change │
│ QQuery 32    │ 2780.65ms │    2652.40ms │     no change │
│ QQuery 33    │ 2678.26ms │    2671.76ms │     no change │
│ QQuery 34    │ 2572.40ms │    2586.40ms │     no change │
│ QQuery 35    │  526.14ms │     529.53ms │     no change │
│ QQuery 36    │  115.23ms │     112.60ms │     no change │
│ QQuery 37    │   77.91ms │      79.28ms │     no change │
│ QQuery 38    │   81.69ms │      82.17ms │     no change │
│ QQuery 39    │  203.75ms │     207.33ms │     no change │
│ QQuery 40    │   23.99ms │      24.14ms │     no change │
│ QQuery 41    │   22.93ms │      23.93ms │     no change │
│ QQuery 42    │   28.73ms │      28.82ms │     no change │
└──────────────┴───────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)           │ 40332.13ms │
│ Total Time (filter-chunk)   │ 39875.01ms │
│ Average Time (main)         │   937.96ms │
│ Average Time (filter-chunk) │   927.33ms │
│ Queries Faster              │          1 │
│ Queries Slower              │          3 │
│ Queries with No Change      │         39 │
└─────────────────────────────┴────────────┘
Dandandan commented 5 days ago

I wonder if any of the queries in that benchmark uses aggregate filters?

jayzhan211 commented 5 days ago

Neither Q7, Q10, nor Q24 uses an aggregate filter.

Dandandan commented 5 days ago

Can we maybe find / create a query with filter? Maybe add one in clickbench_extended?

jayzhan211 commented 4 days ago

Can we maybe find / create a query with filter? Maybe add one in clickbench_extended?

aggregate filter required postgres dialect, it might be easier to run benchmark with criterion

jayzhan211 commented 2 days ago

@Dandandan PTAL

jayzhan211 commented 2 days ago

Thanks @Dandandan