apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++][Parquet] Support read by row ranges #39392

Open huberylee opened 7 months ago

huberylee commented 7 months ago

Describe the enhancement requested

FileReader supports reading data for a specified set of RowRanges, which provides the most basic filter pushdown capability to upper-level computing engines. This implementation mainly consists of three parts:

According to the benchmark results, page pruning yields significant performance improvements over scanning the entire column chunk. When only a few rows match, single-column scans are 1 to 30 times faster and multi-column scans are 10 to 16 times faster. However, when more rows match and the number of hit RowRanges grows, scan performance gradually degrades and can even regress relative to reading the whole row group. Here are some benchmark results:

./build/relwithdebinfo/parquet-arrow-page-pruning-benchmark --benchmark_counters_tabular=true --benchmark_min_warmup_time=1 --benchmark_filter=BM_SingleColumn_NumPages
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                          Time             CPU   Iterations    HitRows  TotalPage items_per_second
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::Int32Type>/hit_page(%):10/hit_rows:1/iterations:50        63511 ns        63500 ns           50          1          8        15.748k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::Int32Type>/hit_page(%):30/hit_rows:1/iterations:50       134182 ns       134180 ns           50          3          8        22.358k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::Int32Type>/hit_page(%):50/hit_rows:1/iterations:50       215335 ns       211120 ns           50          4          8       18.9466k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::Int32Type>/hit_page(%):70/hit_rows:1/iterations:50       496982 ns       432440 ns           50          6          8       13.8748k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::Int32Type>/hit_page(%):100/hit_rows:1/iterations:50      600318 ns       481400 ns           50          8          8       16.6182k/s
BM_SingleColumn_NumPages_PagePruningWithHitAll<false,::arrow::Int32Type>/iterations:50                       1754878 ns      1666260 ns           50         2M          8       1.20029G/s
BM_SingleColumn_NumPages_ReadRowGroup<false,::arrow::Int32Type>/iterations:50                                1328994 ns      1329000 ns           50         2M          8       1.50489G/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::Int32Type>/hit_page(%):10/hit_rows:1/iterations:50        402561 ns       378580 ns           50          1          4       2.64145k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::Int32Type>/hit_page(%):30/hit_rows:1/iterations:50        389339 ns       389340 ns           50          2          4        5.1369k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::Int32Type>/hit_page(%):50/hit_rows:1/iterations:50       1056578 ns      1005800 ns           50          2          4       1.98847k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::Int32Type>/hit_page(%):70/hit_rows:1/iterations:50       1252089 ns      1021040 ns           50          3          4       2.93818k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::Int32Type>/hit_page(%):100/hit_rows:1/iterations:50      1146366 ns      1116160 ns           50          4          4       3.58372k/s
BM_SingleColumn_NumPages_PagePruningWithHitAll<true,::arrow::Int32Type>/iterations:50                        9214042 ns      8720260 ns           50         2M          4       229.351M/s
BM_SingleColumn_NumPages_ReadRowGroup<true,::arrow::Int32Type>/iterations:50                                 9003524 ns      8660100 ns           50         2M          4       230.944M/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::Int64Type>/hit_page(%):10/hit_rows:1/iterations:50       240033 ns       168320 ns           50          2         16       11.8821k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::Int64Type>/hit_page(%):30/hit_rows:1/iterations:50       309177 ns       273640 ns           50          5         16       18.2722k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::Int64Type>/hit_page(%):50/hit_rows:1/iterations:50       748207 ns       656960 ns           50          8         16       12.1773k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::Int64Type>/hit_page(%):70/hit_rows:1/iterations:50       530234 ns       529200 ns           50         12         16       22.6757k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::Int64Type>/hit_page(%):100/hit_rows:1/iterations:50      795477 ns       689480 ns           50         16         16       23.2059k/s
BM_SingleColumn_NumPages_PagePruningWithHitAll<false,::arrow::Int64Type>/iterations:50                       2168220 ns      1999420 ns           50         2M         16       1000.29M/s
BM_SingleColumn_NumPages_ReadRowGroup<false,::arrow::Int64Type>/iterations:50                                2220696 ns      2015920 ns           50         2M         16       992.103M/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::Int64Type>/hit_page(%):10/hit_rows:1/iterations:50        229509 ns       205660 ns           50          1          8       4.86239k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::Int64Type>/hit_page(%):30/hit_rows:1/iterations:50        844970 ns       826700 ns           50          3          8       3.62889k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::Int64Type>/hit_page(%):50/hit_rows:1/iterations:50        533323 ns       533320 ns           50          4          8       7.50019k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::Int64Type>/hit_page(%):70/hit_rows:1/iterations:50       1397721 ns      1262320 ns           50          6          8       4.75315k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::Int64Type>/hit_page(%):100/hit_rows:1/iterations:50      1734829 ns      1595120 ns           50          8          8        5.0153k/s
BM_SingleColumn_NumPages_PagePruningWithHitAll<true,::arrow::Int64Type>/iterations:50                        8717569 ns      8309240 ns           50         2M          8       240.696M/s
BM_SingleColumn_NumPages_ReadRowGroup<true,::arrow::Int64Type>/iterations:50                                 8686948 ns      8002060 ns           50         2M          8       249.936M/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::FloatType>/hit_page(%):10/hit_rows:1/iterations:50        89482 ns        82020 ns           50          1          8       12.1921k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::FloatType>/hit_page(%):30/hit_rows:1/iterations:50       273510 ns       188620 ns           50          3          8        15.905k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::FloatType>/hit_page(%):50/hit_rows:1/iterations:50       406551 ns       226700 ns           50          4          8       17.6445k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::FloatType>/hit_page(%):70/hit_rows:1/iterations:50       373658 ns       339440 ns           50          6          8       17.6762k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::FloatType>/hit_page(%):100/hit_rows:1/iterations:50      542894 ns       496540 ns           50          8          8       16.1115k/s
BM_SingleColumn_NumPages_PagePruningWithHitAll<false,::arrow::FloatType>/iterations:50                       1985082 ns      1883040 ns           50         2M          8       1062.11M/s
BM_SingleColumn_NumPages_ReadRowGroup<false,::arrow::FloatType>/iterations:50                                1785856 ns      1625300 ns           50         2M          8       1.23054G/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::FloatType>/hit_page(%):10/hit_rows:1/iterations:50        680770 ns       491100 ns           50          1          4       2.03625k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::FloatType>/hit_page(%):30/hit_rows:1/iterations:50       1005213 ns       999940 ns           50          2          4       2.00012k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::FloatType>/hit_page(%):50/hit_rows:1/iterations:50        702304 ns       646740 ns           50          2          4       3.09243k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::FloatType>/hit_page(%):70/hit_rows:1/iterations:50        740994 ns       547620 ns           50          3          4       5.47825k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::FloatType>/hit_page(%):100/hit_rows:1/iterations:50      1838819 ns      1731000 ns           50          4          4        2.3108k/s
BM_SingleColumn_NumPages_PagePruningWithHitAll<true,::arrow::FloatType>/iterations:50                        8214554 ns      8150740 ns           50         2M          4       245.376M/s
BM_SingleColumn_NumPages_ReadRowGroup<true,::arrow::FloatType>/iterations:50                                 9686748 ns      8991060 ns           50         2M          4       222.443M/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::DoubleType>/hit_page(%):10/hit_rows:1/iterations:50      128584 ns       128600 ns           50          2         16       15.5521k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::DoubleType>/hit_page(%):30/hit_rows:1/iterations:50      336180 ns       305060 ns           50          5         16       16.3902k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::DoubleType>/hit_page(%):50/hit_rows:1/iterations:50      373698 ns       373680 ns           50          8         16       21.4087k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::DoubleType>/hit_page(%):70/hit_rows:1/iterations:50      590868 ns       545020 ns           50         12         16       22.0175k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::DoubleType>/hit_page(%):100/hit_rows:1/iterations:50     704253 ns       704260 ns           50         16         16       22.7189k/s
BM_SingleColumn_NumPages_PagePruningWithHitAll<false,::arrow::DoubleType>/iterations:50                      1677933 ns      1676200 ns           50         2M         16       1.19318G/s
BM_SingleColumn_NumPages_ReadRowGroup<false,::arrow::DoubleType>/iterations:50                               1903368 ns      1815540 ns           50         2M         16        1.1016G/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::DoubleType>/hit_page(%):10/hit_rows:1/iterations:50       119272 ns       119240 ns           50          1          8       8.38645k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::DoubleType>/hit_page(%):30/hit_rows:1/iterations:50       438818 ns       437980 ns           50          3          8       6.84963k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::DoubleType>/hit_page(%):50/hit_rows:1/iterations:50       779543 ns       779540 ns           50          4          8       5.13123k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::DoubleType>/hit_page(%):70/hit_rows:1/iterations:50      1040031 ns      1040020 ns           50          6          8       5.76912k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::DoubleType>/hit_page(%):100/hit_rows:1/iterations:50     1367778 ns      1367420 ns           50          8          8       5.85043k/s
BM_SingleColumn_NumPages_PagePruningWithHitAll<true,::arrow::DoubleType>/iterations:50                       7576036 ns      7575580 ns           50         2M          8       264.006M/s
BM_SingleColumn_NumPages_ReadRowGroup<true,::arrow::DoubleType>/iterations:50                                7514859 ns      7496620 ns           50         2M          8       266.787M/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::StringType>/hit_page(%):10/hit_rows:1/iterations:50      298405 ns       298400 ns           50          3         27       10.0536k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::StringType>/hit_page(%):30/hit_rows:1/iterations:50     1201950 ns      1201960 ns           50          9         27       7.48777k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::StringType>/hit_page(%):50/hit_rows:1/iterations:50     1182310 ns      1182300 ns           50         14         27       11.8413k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::StringType>/hit_page(%):70/hit_rows:1/iterations:50     2323919 ns      2217760 ns           50         19         27        8.5672k/s
BM_SingleColumn_NumPages_PagePruning<false, ::arrow::StringType>/hit_page(%):100/hit_rows:1/iterations:50    2412017 ns      2412000 ns           50         27         27        11.194k/s
BM_SingleColumn_NumPages_PagePruningWithHitAll<false,::arrow::StringType>/iterations:50                     23347672 ns     23340280 ns           50         2M         27       85.6888M/s
BM_SingleColumn_NumPages_ReadRowGroup<false,::arrow::StringType>/iterations:50                              22411320 ns     22402560 ns           50         2M         27       89.2755M/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::StringType>/hit_page(%):10/hit_rows:1/iterations:50       282815 ns       282800 ns           50          2         14       7.07214k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::StringType>/hit_page(%):30/hit_rows:1/iterations:50      1002259 ns      1002220 ns           50          5         14       4.98892k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::StringType>/hit_page(%):50/hit_rows:1/iterations:50      1343057 ns      1342980 ns           50          7         14       5.21229k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::StringType>/hit_page(%):70/hit_rows:1/iterations:50      1687520 ns      1687520 ns           50         10         14       5.92586k/s
BM_SingleColumn_NumPages_PagePruning<true, ::arrow::StringType>/hit_page(%):100/hit_rows:1/iterations:50     1936757 ns      1930300 ns           50         14         14       7.25276k/s
BM_SingleColumn_NumPages_PagePruningWithHitAll<true,::arrow::StringType>/iterations:50                      16042260 ns     16013700 ns           50         2M         14       124.893M/s
BM_SingleColumn_NumPages_ReadRowGroup<true,::arrow::StringType>/iterations:50                               15811481 ns     15811240 ns           50         2M         14       126.492M/s

./build/relwithdebinfo/parquet-arrow-page-pruning-benchmark --benchmark_counters_tabular=true --benchmark_min_warmup_time=1 --benchmark_filter=BM_MultipleColumns
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                        Time             CPU   Iterations    HitRows MaxPageNum MinPageNum  TotalPage items_per_second
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_MultipleColumns_PagePruning/hit_page(%):5/hit_rows:1/iterations:50      6179859 ns      5986020 ns           50          2         31          1        283        334.112/s
BM_MultipleColumns_PagePruning/hit_page(%):10/hit_rows:1/iterations:50    11490077 ns     11259660 ns           50          4         31          1        283        355.251/s
BM_MultipleColumns_PagePruning/hit_page(%):30/hit_rows:1/iterations:50    17940528 ns     17802260 ns           50         10         31          1        283        561.726/s
BM_MultipleColumns_PagePruning/hit_page(%):50/hit_rows:1/iterations:50    25005303 ns     24890520 ns           50         16         31          1        283        642.815/s
BM_MultipleColumns_PagePruning/hit_page(%):70/hit_rows:1/iterations:50    30059767 ns     29288740 ns           50         22         31          1        283        751.142/s
BM_MultipleColumns_PagePruning/hit_page(%):100/hit_rows:1/iterations:50   32490276 ns     32479600 ns           50         31         31          1        283        954.445/s
BM_MultipleColumns_ReadRowGroup/iterations:50                            370906187 ns    370519020 ns           50         2M         31          1        283       5.39783M/s

./build/relwithdebinfo/parquet-arrow-page-pruning-benchmark --benchmark_counters_tabular=true --benchmark_min_warmup_time=1 --benchmark_filter=BM_SingleColumn_NumRanges
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                                   Time             CPU   Iterations    HitRows  TotalPage items_per_second
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::Int32Type>/hit_page(%):100/hit_ranges:500/iterations:50         1487395 ns      1405880 ns           50       999k          8       710.587M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::Int32Type>/hit_page(%):100/hit_ranges:1000/iterations:50        1795974 ns      1580800 ns           50       999k          8       631.959M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::Int32Type>/hit_page(%):100/hit_ranges:10000/iterations:50       5383828 ns      5271540 ns           50       990k          8       187.801M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::Int32Type>/hit_page(%):100/hit_ranges:100000/iterations:50     44106347 ns     42609500 ns           50   782.496k          8       18.3644M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::Int32Type>/hit_page(%):100/hit_ranges:1000000/iterations:50    55134386 ns     53774440 ns           50      1000k          8       18.5962M/s
BM_SingleColumn_NumRanges_ReadRowGroup<false,::arrow::Int32Type>/iterations:50                                        1557364 ns      1541500 ns           50         2M          8       1.29744G/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::Int32Type>/hit_page(%):100/hit_ranges:500/iterations:50          6483447 ns      6427080 ns           50     999.5k          4       155.514M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::Int32Type>/hit_page(%):100/hit_ranges:1000/iterations:50         7577990 ns      7136140 ns           50       999k          4       139.992M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::Int32Type>/hit_page(%):100/hit_ranges:10000/iterations:50       14655075 ns     14269100 ns           50       990k          4       69.3807M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::Int32Type>/hit_page(%):100/hit_ranges:100000/iterations:50      52135581 ns     50977440 ns           50       800k          4       15.6932M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::Int32Type>/hit_page(%):100/hit_ranges:1000000/iterations:50     96627734 ns     94002220 ns           50      1000k          4        10.638M/s
BM_SingleColumn_NumRanges_ReadRowGroup<true,::arrow::Int32Type>/iterations:50                                         9284147 ns      9060400 ns           50         2M          4       220.741M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::Int64Type>/hit_page(%):100/hit_ranges:500/iterations:50         2780413 ns      2699520 ns           50       999k         16       370.066M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::Int64Type>/hit_page(%):100/hit_ranges:1000/iterations:50        3117969 ns      3103660 ns           50       991k         16         319.3M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::Int64Type>/hit_page(%):100/hit_ranges:10000/iterations:50      10857759 ns     10618860 ns           50       910k         16       85.6966M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::Int64Type>/hit_page(%):100/hit_ranges:100000/iterations:50     52281539 ns     51806180 ns           50      1000k         16       19.3027M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::Int64Type>/hit_page(%):100/hit_ranges:1000000/iterations:50    53256162 ns     52542200 ns           50      1000k         16       19.0323M/s
BM_SingleColumn_NumRanges_ReadRowGroup<false,::arrow::Int64Type>/iterations:50                                        3036522 ns      3020580 ns           50         2M         16       662.124M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::Int64Type>/hit_page(%):100/hit_ranges:500/iterations:50          7736525 ns      7564300 ns           50       999k          8       132.068M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::Int64Type>/hit_page(%):100/hit_ranges:1000/iterations:50         9378083 ns      8917440 ns           50       999k          8       112.028M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::Int64Type>/hit_page(%):100/hit_ranges:10000/iterations:50       19232368 ns     18976780 ns           50       990k          8        52.169M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::Int64Type>/hit_page(%):100/hit_ranges:100000/iterations:50      70984532 ns     70568600 ns           50   782.496k          8       11.0884M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::Int64Type>/hit_page(%):100/hit_ranges:1000000/iterations:50     88546468 ns     88347420 ns           50      1000k          8       11.3189M/s
BM_SingleColumn_NumRanges_ReadRowGroup<true,::arrow::Int64Type>/iterations:50                                         8904477 ns      8872560 ns           50         2M          8       225.414M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::FloatType>/hit_page(%):100/hit_ranges:500/iterations:50         1342717 ns      1340940 ns           50       999k          8           745M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::FloatType>/hit_page(%):100/hit_ranges:1000/iterations:50        1480489 ns      1480480 ns           50       999k          8       674.781M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::FloatType>/hit_page(%):100/hit_ranges:10000/iterations:50       5134953 ns      5134960 ns           50       990k          8       192.796M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::FloatType>/hit_page(%):100/hit_ranges:100000/iterations:50     41475635 ns     41351500 ns           50   782.496k          8        18.923M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::FloatType>/hit_page(%):100/hit_ranges:1000000/iterations:50    49940678 ns     49919900 ns           50      1000k          8       20.0321M/s
BM_SingleColumn_NumRanges_ReadRowGroup<false,::arrow::FloatType>/iterations:50                                        1466148 ns      1466140 ns           50         2M          8       1.36413G/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::FloatType>/hit_page(%):100/hit_ranges:500/iterations:50          5810324 ns      5810360 ns           50     999.5k          4        172.02M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::FloatType>/hit_page(%):100/hit_ranges:1000/iterations:50         6257747 ns      6257740 ns           50       999k          4       159.642M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::FloatType>/hit_page(%):100/hit_ranges:10000/iterations:50       13449354 ns     13445280 ns           50       990k          4       73.6318M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::FloatType>/hit_page(%):100/hit_ranges:100000/iterations:50      44484917 ns     44484920 ns           50       800k          4       17.9836M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::FloatType>/hit_page(%):100/hit_ranges:1000000/iterations:50     87668032 ns     87665320 ns           50      1000k          4        11.407M/s
BM_SingleColumn_NumRanges_ReadRowGroup<true,::arrow::FloatType>/iterations:50                                         8244576 ns      8243860 ns           50         2M          4       242.605M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::DoubleType>/hit_page(%):100/hit_ranges:500/iterations:50        2941689 ns      2926600 ns           50       999k         16       341.352M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::DoubleType>/hit_page(%):100/hit_ranges:1000/iterations:50       2758582 ns      2758600 ns           50       991k         16        359.24M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::DoubleType>/hit_page(%):100/hit_ranges:10000/iterations:50     10177306 ns     10174800 ns           50       910k         16       89.4366M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::DoubleType>/hit_page(%):100/hit_ranges:100000/iterations:50    49739064 ns     49700420 ns           50      1000k         16       20.1206M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::DoubleType>/hit_page(%):100/hit_ranges:1000000/iterations:50   52075806 ns     50505760 ns           50      1000k         16       19.7997M/s
BM_SingleColumn_NumRanges_ReadRowGroup<false,::arrow::DoubleType>/iterations:50                                       2653357 ns      2653340 ns           50         2M         16       753.767M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::DoubleType>/hit_page(%):100/hit_ranges:500/iterations:50         6847587 ns      6795380 ns           50       999k          8       147.012M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::DoubleType>/hit_page(%):100/hit_ranges:1000/iterations:50        8701575 ns      8615520 ns           50       999k          8       115.954M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::DoubleType>/hit_page(%):100/hit_ranges:10000/iterations:50      18250828 ns     18235140 ns           50       990k          8       54.2908M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::DoubleType>/hit_page(%):100/hit_ranges:100000/iterations:50     72778973 ns     70897040 ns           50   782.496k          8       11.0371M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::DoubleType>/hit_page(%):100/hit_ranges:1000000/iterations:50    89796725 ns     88411400 ns           50      1000k          8       11.3108M/s
BM_SingleColumn_NumRanges_ReadRowGroup<true,::arrow::DoubleType>/iterations:50                                        8722744 ns      8717700 ns           50         2M          8       229.418M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::StringType>/hit_page(%):100/hit_ranges:500/iterations:50       14747333 ns     14737380 ns           50     989.5k         27       67.1422M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::StringType>/hit_page(%):100/hit_ranges:1000/iterations:50      15420189 ns     15417820 ns           50       976k         27       63.3034M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::StringType>/hit_page(%):100/hit_ranges:10000/iterations:50     26830887 ns     26824880 ns           50       790k         27       29.4503M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::StringType>/hit_page(%):100/hit_ranges:100000/iterations:50    74758644 ns     74637000 ns           50      1000k         27       13.3982M/s
BM_SingleColumn_NumRanges_PagePruning<false, ::arrow::StringType>/hit_page(%):100/hit_ranges:1000000/iterations:50   74697486 ns     74693320 ns           50      1000k         27       13.3881M/s
BM_SingleColumn_NumRanges_ReadRowGroup<false,::arrow::StringType>/iterations:50                                      24560657 ns     24549260 ns           50         2M         27       81.4689M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::StringType>/hit_page(%):100/hit_ranges:500/iterations:50        29453166 ns     29451740 ns           50     996.5k         14        33.835M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::StringType>/hit_page(%):100/hit_ranges:1000/iterations:50       47641184 ns     47614640 ns           50       996k         14       20.9179M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::StringType>/hit_page(%):100/hit_ranges:10000/iterations:50     325770038 ns    325471820 ns           50       930k         14       2.85739M/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::StringType>/hit_page(%):100/hit_ranges:100000/iterations:50   2429500345 ns   2426462860 ns           50      1000k         14       412.123k/s
BM_SingleColumn_NumRanges_PagePruning<true, ::arrow::StringType>/hit_page(%):100/hit_ranges:1000000/iterations:50  5.1448e+11 ns   3944426420 ns           50      1000k         14       253.522k/s
BM_SingleColumn_NumRanges_ReadRowGroup<true,::arrow::StringType>/iterations:50                                       16679447 ns     16597600 ns           50         2M         14       120.499M/s
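
For readers unfamiliar with page pruning, here is a minimal sketch of the idea, not the code from this change. It assumes each page's first row index is known (the Parquet page index records this per page) and that the requested ranges are sorted and non-overlapping. `RowRange` and `SelectPages` are hypothetical names used only for illustration; they are not the API proposed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical closed interval of row indexes within a row group: [from, to].
struct RowRange {
  int64_t from;
  int64_t to;
};

// Returns the indexes of the pages that overlap at least one requested range.
// first_row_of_page[i] is the index of the first row stored in page i
// (ascending, starting at 0); num_rows is the row count of the row group.
// ranges must be sorted by `from` and must not overlap each other.
std::vector<std::size_t> SelectPages(const std::vector<int64_t>& first_row_of_page,
                                     int64_t num_rows,
                                     const std::vector<RowRange>& ranges) {
  std::vector<std::size_t> selected;
  std::size_t r = 0;  // index of the first range that may still overlap a page
  for (std::size_t p = 0; p < first_row_of_page.size() && r < ranges.size(); ++p) {
    const int64_t page_first = first_row_of_page[p];
    // The last row of a page is one before the next page's first row,
    // or the last row of the row group for the final page.
    const int64_t next_first =
        (p + 1 < first_row_of_page.size()) ? first_row_of_page[p + 1] : num_rows;
    const int64_t page_last = next_first - 1;
    // Skip ranges that end before this page starts.
    while (r < ranges.size() && ranges[r].to < page_first) ++r;
    // The current range ends at or after page_first, so it overlaps the page
    // exactly when it also starts at or before page_last.
    if (r < ranges.size() && ranges[r].from <= page_last) {
      selected.push_back(p);
    }
  }
  return selected;
}
```

With pages starting at rows 0, 100, 200 and 300 in a 400-row row group, the ranges [150, 160] and [310, 399] select pages 1 and 3, so the other two pages never need to be read or decoded. Pruning can only skip whole pages, which fits the picture above: once the ranges touch every page (hit_page(%):100), every page still has to be decoded, and each extra range adds its own skip and read step.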

I would kindly ask the community to review this and decide whether it can be merged upstream. If needed, I can split the Merge Request into several smaller ones. Thank you!

Component(s)

C++, Parquet

emkornfield commented 7 months ago

This seems like mostly a dupe of https://github.com/apache/arrow/issues/38865 ?