apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

GH-41702: [C++][Parquet] Thrift: generate template method to accelerate reading thrift #41703

Closed mapleFU closed 1 week ago

mapleFU commented 2 weeks ago

Rationale for this change

By default, the Thrift serializer and deserializer call many virtual functions. However, the Thrift C++ compiler has an option to generate template methods, which avoids the cost of those virtual calls. It seems to make the metadata read/write benchmarks around 10% faster.
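
To illustrate the idea, here is a minimal sketch of virtual dispatch versus template-based dispatch. `VirtualProtocol`, `BufferProtocol`, `SumVirtual` and `SumTemplated` are hypothetical names made up for this example; they are not Thrift classes and not the code this PR generates.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT Thrift classes.

// A protocol interface whose read method is virtual.
struct VirtualProtocol {
  virtual ~VirtualProtocol() = default;
  virtual bool ReadI32(int32_t* out) = 0;
};

// A concrete in-memory protocol. It is marked `final` so that a call made
// through the concrete type can be resolved at compile time.
class BufferProtocol final : public VirtualProtocol {
 public:
  explicit BufferProtocol(std::vector<int32_t> values)
      : values_(std::move(values)) {}

  bool ReadI32(int32_t* out) override {
    if (pos_ >= values_.size()) return false;
    *out = values_[pos_++];
    return true;
  }

 private:
  std::vector<int32_t> values_;
  std::size_t pos_ = 0;
};

// Reader written against the base class: every ReadI32 goes through the
// vtable unless the compiler manages to devirtualize it.
int64_t SumVirtual(VirtualProtocol* prot) {
  int64_t sum = 0;
  int32_t value = 0;
  while (prot->ReadI32(&value)) sum += value;
  return sum;
}

// Reader written as a template: the concrete protocol type is a template
// parameter, so the call target is known statically and can be inlined.
template <typename Protocol>
int64_t SumTemplated(Protocol* prot) {
  int64_t sum = 0;
  int32_t value = 0;
  while (prot->ReadI32(&value)) sum += value;
  return sum;
}
```

Both readers return the same result for the same input; the only difference is how the call to `ReadI32` is resolved. Whether that shows up as a measurable speedup depends on the compiler and workload, which is what the benchmarks are for.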

What changes are included in this PR?

  1. cpp/build-support/update-thrift.sh: pass the templates option to the Thrift C++ compiler
  2. cpp/src/parquet/thrift_internal.h: use generated code
  3. cpp/src/generated: update generated files.

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No.

github-actions[bot] commented 2 weeks ago

:warning: GitHub issue #41702 has been automatically assigned in GitHub to PR creator.

mapleFU commented 2 weeks ago

@emkornfield @pitrou I've updated the patch here. The generated code now calls fewer virtual functions during deserialization. Would you mind taking a look?

I'm not very familiar with the Thrift compiler; there may be other useful tools that could help with deserialization.

pitrou commented 2 weeks ago

@mapleFU I didn't know this was possible. This looks neat in principle. Did you try running some benchmarks?

mapleFU commented 2 weeks ago

I ran it on the page index: https://github.com/apache/arrow/issues/41702#issuecomment-2116873657

For the footer it's more useful, since readVirt is called more times there.

wgtmac commented 2 weeks ago

I remember there was about 3% speedup reading a sample parquet file.

pitrou commented 1 week ago

Perhaps you can try with the additional benchmarks in https://github.com/apache/arrow/pull/41761

mapleFU commented 1 week ago

On my M1 Pro with a Release (O2) build:

After:

Run on (10 X 24.0711 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 7.98, 10.79, 8.83
-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
WriteMetadata/num_columns:1/num_row_groups:1            10248 ns        10198 ns        65596 file_size=459 items_per_second=98.0618k/s
WriteMetadata/num_columns:1/num_row_groups:100         708873 ns       701642 ns         1003 file_size=37.383k items_per_second=1.42523k/s
WriteMetadata/num_columns:1/num_row_groups:1000       7027939 ns      7022677 ns           99 file_size=374.885k items_per_second=142.396/s
WriteMetadata/num_columns:10/num_row_groups:1           78750 ns        78709 ns         8900 file_size=3.762k items_per_second=12.705k/s
WriteMetadata/num_columns:10/num_row_groups:100       6751510 ns      6644838 ns          105 file_size=358.835k items_per_second=150.493/s
WriteMetadata/num_columns:10/num_row_groups:1000     67659713 ns     67142800 ns           10 file_size=3.614M items_per_second=14.8936/s
WriteMetadata/num_columns:100/num_row_groups:1         787280 ns       771871 ns          910 file_size=37.352k items_per_second=1.29555k/s
WriteMetadata/num_columns:100/num_row_groups:100     66632500 ns     66540000 ns           10 file_size=3.61693M items_per_second=15.0286/s
WriteMetadata/num_columns:100/num_row_groups:1000   703455917 ns    699385000 ns            1 file_size=36.2887M items_per_second=1.42983/s
WriteMetadata/num_columns:1000/num_row_groups:1       8089713 ns      8087153 ns           85 file_size=376.655k items_per_second=123.653/s
WriteMetadata/num_columns:1000/num_row_groups:100   705972459 ns    702311000 ns            1 file_size=36.4815M items_per_second=1.42387/s
WriteMetadata/num_columns:10000/num_row_groups:1     82793505 ns     82773750 ns            8 file_size=3.82213M items_per_second=12.0811/s
WriteMetadata/num_columns:10000/num_row_groups:100 7789295000 ns   7492551000 ns            1 file_size=369.089M items_per_second=0.133466/s
ReadMetadata/num_columns:1/num_row_groups:1              3022 ns         3021 ns       229889 file_size=459 items_per_second=330.982k/s
ReadMetadata/num_columns:1/num_row_groups:100           59165 ns        59139 ns        11742 file_size=37.383k items_per_second=16.9092k/s
ReadMetadata/num_columns:1/num_row_groups:1000         587111 ns       586972 ns         1189 file_size=374.885k items_per_second=1.70366k/s
ReadMetadata/num_columns:10/num_row_groups:1            13977 ns        13973 ns        50402 file_size=3.762k items_per_second=71.569k/s
ReadMetadata/num_columns:10/num_row_groups:100         475674 ns       475562 ns         1469 file_size=358.835k items_per_second=2.10278k/s
ReadMetadata/num_columns:10/num_row_groups:1000       4743075 ns      4742237 ns          139 file_size=3.614M items_per_second=210.871/s
ReadMetadata/num_columns:100/num_row_groups:1          119355 ns       119308 ns         5747 file_size=37.352k items_per_second=8.38169k/s
ReadMetadata/num_columns:100/num_row_groups:100       5379931 ns      5378835 ns          133 file_size=3.61693M items_per_second=185.914/s
ReadMetadata/num_columns:100/num_row_groups:1000     58173311 ns     58151000 ns           13 file_size=36.2887M items_per_second=17.1966/s
ReadMetadata/num_columns:1000/num_row_groups:1        1285306 ns      1284195 ns          514 file_size=376.655k items_per_second=778.698/s
ReadMetadata/num_columns:1000/num_row_groups:100     59154014 ns     59110667 ns           12 file_size=36.4815M items_per_second=16.9174/s
ReadMetadata/num_columns:10000/num_row_groups:1      15298734 ns     15288065 ns           46 file_size=3.82213M items_per_second=65.4105/s
ReadMetadata/num_columns:10000/num_row_groups:100   597222875 ns    594531000 ns            1 file_size=369.089M items_per_second=1.682/s

Before:

WriteMetadata/num_columns:1/num_row_groups:1            13997 ns        10952 ns        64411 file_size=459 items_per_second=91.3074k/s
WriteMetadata/num_columns:1/num_row_groups:100        1161928 ns       781421 ns          915 file_size=37.383k items_per_second=1.27972k/s
WriteMetadata/num_columns:1/num_row_groups:1000       9028193 ns      7580868 ns           91 file_size=374.885k items_per_second=131.911/s
WriteMetadata/num_columns:10/num_row_groups:1           87804 ns        81408 ns         8680 file_size=3.762k items_per_second=12.2838k/s
WriteMetadata/num_columns:10/num_row_groups:100       7922727 ns      7032396 ns           96 file_size=358.835k items_per_second=142.199/s
WriteMetadata/num_columns:10/num_row_groups:1000     83557727 ns     72335889 ns            9 file_size=3.614M items_per_second=13.8244/s
WriteMetadata/num_columns:100/num_row_groups:1        1046771 ns       866386 ns          813 file_size=37.352k items_per_second=1.15422k/s
WriteMetadata/num_columns:100/num_row_groups:100     97720995 ns     74290111 ns            9 file_size=3.61693M items_per_second=13.4607/s
WriteMetadata/num_columns:100/num_row_groups:1000  1042585917 ns    773579000 ns            1 file_size=36.2887M items_per_second=1.29269/s
WriteMetadata/num_columns:1000/num_row_groups:1       9320268 ns      8396910 ns           78 file_size=376.655k items_per_second=119.091/s
WriteMetadata/num_columns:1000/num_row_groups:100   789198500 ns    726929000 ns            1 file_size=36.4815M items_per_second=1.37565/s
WriteMetadata/num_columns:10000/num_row_groups:1    105553526 ns     89228125 ns            8 file_size=3.82213M items_per_second=11.2072/s
WriteMetadata/num_columns:10000/num_row_groups:100 9705208125 ns   7941607000 ns            1 file_size=369.089M items_per_second=0.125919/s
ReadMetadata/num_columns:1/num_row_groups:1              3341 ns         3262 ns       215501 file_size=459 items_per_second=306.531k/s
ReadMetadata/num_columns:1/num_row_groups:100           70801 ns        67469 ns        10226 file_size=37.383k items_per_second=14.8215k/s
ReadMetadata/num_columns:1/num_row_groups:1000         697046 ns       661042 ns         1033 file_size=374.885k items_per_second=1.51276k/s
ReadMetadata/num_columns:10/num_row_groups:1            19616 ns        15182 ns        46741 file_size=3.762k items_per_second=65.866k/s
ReadMetadata/num_columns:10/num_row_groups:100         631976 ns       538377 ns         1240 file_size=358.835k items_per_second=1.85743k/s
ReadMetadata/num_columns:10/num_row_groups:1000       5701558 ns      5375484 ns          122 file_size=3.614M items_per_second=186.03/s
ReadMetadata/num_columns:100/num_row_groups:1          137789 ns       128750 ns         5466 file_size=37.352k items_per_second=7.76702k/s
ReadMetadata/num_columns:100/num_row_groups:100       6475114 ns      6090483 ns          118 file_size=3.61693M items_per_second=164.191/s
ReadMetadata/num_columns:100/num_row_groups:1000     64411345 ns     62630000 ns           11 file_size=36.2887M items_per_second=15.9668/s
ReadMetadata/num_columns:1000/num_row_groups:1        1473490 ns      1402757 ns          453 file_size=376.655k items_per_second=712.882/s
ReadMetadata/num_columns:1000/num_row_groups:100     66037220 ns     64025909 ns           11 file_size=36.4815M items_per_second=15.6187/s
ReadMetadata/num_columns:10000/num_row_groups:1      18425749 ns     16564045 ns           44 file_size=3.82213M items_per_second=60.3717/s
ReadMetadata/num_columns:10000/num_row_groups:100   650862958 ns    636789000 ns            1 file_size=369.089M items_per_second=1.57038/s

mapleFU commented 1 week ago

On my AMD 3800X:

Before:

WriteMetadata/num_columns:1/num_row_groups:1            14869 ns        14869 ns        42700 file_size=459 items_per_second=67.2552k/s
WriteMetadata/num_columns:1/num_row_groups:100        1026862 ns      1026848 ns          689 file_size=37.383k items_per_second=973.854/s
WriteMetadata/num_columns:1/num_row_groups:1000       9657576 ns      9656124 ns           72 file_size=374.885k items_per_second=103.561/s
WriteMetadata/num_columns:10/num_row_groups:1          121405 ns       121406 ns         5869 file_size=3.762k items_per_second=8.23686k/s
WriteMetadata/num_columns:10/num_row_groups:100       9488113 ns      9488130 ns           73 file_size=358.835k items_per_second=105.395/s
WriteMetadata/num_columns:10/num_row_groups:1000     98853564 ns     98852700 ns            7 file_size=3.614M items_per_second=10.1161/s
WriteMetadata/num_columns:100/num_row_groups:1        1142870 ns      1142808 ns          629 file_size=37.352k items_per_second=875.037/s
WriteMetadata/num_columns:100/num_row_groups:100     96569070 ns     96568757 ns            7 file_size=3.61693M items_per_second=10.3553/s
WriteMetadata/num_columns:100/num_row_groups:1000  1017437093 ns   1017435400 ns            1 file_size=36.2887M items_per_second=0.982863/s
WriteMetadata/num_columns:1000/num_row_groups:1      11040304 ns     11040197 ns           65 file_size=376.655k items_per_second=90.5781/s
WriteMetadata/num_columns:1000/num_row_groups:100   995932342 ns    995929600 ns            1 file_size=36.4815M items_per_second=1.00409/s
WriteMetadata/num_columns:10000/num_row_groups:1    114961261 ns    114961450 ns            6 file_size=3.82213M items_per_second=8.69857/s
WriteMetadata/num_columns:10000/num_row_groups:100 1.6961e+10 ns   1.6960e+10 ns            1 file_size=369.089M items_per_second=0.0589634/s
ReadMetadata/num_columns:1/num_row_groups:1              6150 ns         6150 ns        95609 file_size=459 items_per_second=162.615k/s
ReadMetadata/num_columns:1/num_row_groups:100          148555 ns       148554 ns         5156 file_size=37.383k items_per_second=6.73154k/s
ReadMetadata/num_columns:1/num_row_groups:1000        1383664 ns      1383603 ns          549 file_size=374.885k items_per_second=722.751/s
ReadMetadata/num_columns:10/num_row_groups:1            31549 ns        31548 ns        16761 file_size=3.762k items_per_second=31.6973k/s
ReadMetadata/num_columns:10/num_row_groups:100        1329978 ns      1329950 ns          486 file_size=358.835k items_per_second=751.908/s
ReadMetadata/num_columns:10/num_row_groups:1000      15798009 ns     15797961 ns           44 file_size=3.614M items_per_second=63.2993/s
ReadMetadata/num_columns:100/num_row_groups:1          297319 ns       297316 ns         2119 file_size=37.352k items_per_second=3.36343k/s
ReadMetadata/num_columns:100/num_row_groups:100      13742747 ns     13742598 ns           49 file_size=3.61693M items_per_second=72.7664/s
ReadMetadata/num_columns:100/num_row_groups:1000    130178737 ns    130176500 ns            5 file_size=36.2887M items_per_second=7.68188/s
ReadMetadata/num_columns:1000/num_row_groups:1        2862534 ns      2862405 ns          260 file_size=376.655k items_per_second=349.357/s
ReadMetadata/num_columns:1000/num_row_groups:100     79884243 ns     79869014 ns            7 file_size=36.4815M items_per_second=12.5205/s
ReadMetadata/num_columns:10000/num_row_groups:1      18818536 ns     18818281 ns           37 file_size=3.82213M items_per_second=53.1398/s
ReadMetadata/num_columns:10000/num_row_groups:100   788936700 ns    788847500 ns            1 file_size=369.089M items_per_second=1.26767/s

After:

WriteMetadata/num_columns:1/num_row_groups:1            14042 ns        14026 ns        48265 file_size=459 items_per_second=71.2951k/s
WriteMetadata/num_columns:1/num_row_groups:100         982543 ns       982545 ns          693 file_size=37.383k items_per_second=1.01776k/s
WriteMetadata/num_columns:1/num_row_groups:1000       9236559 ns      9234951 ns           75 file_size=374.885k items_per_second=108.284/s
WriteMetadata/num_columns:10/num_row_groups:1          115867 ns       115865 ns         6050 file_size=3.762k items_per_second=8.63075k/s
WriteMetadata/num_columns:10/num_row_groups:100       9106303 ns      9106322 ns           77 file_size=358.835k items_per_second=109.814/s
WriteMetadata/num_columns:10/num_row_groups:1000     95039480 ns     95039886 ns            7 file_size=3.614M items_per_second=10.5219/s
WriteMetadata/num_columns:100/num_row_groups:1        1066471 ns      1066474 ns          648 file_size=37.352k items_per_second=937.67/s
WriteMetadata/num_columns:100/num_row_groups:100     92350381 ns     92350900 ns            8 file_size=3.61693M items_per_second=10.8283/s
WriteMetadata/num_columns:100/num_row_groups:1000   972198408 ns    971689600 ns            1 file_size=36.2887M items_per_second=1.02914/s
WriteMetadata/num_columns:1000/num_row_groups:1      10303438 ns     10302799 ns           68 file_size=376.655k items_per_second=97.061/s
WriteMetadata/num_columns:1000/num_row_groups:100   926151272 ns    926026200 ns            1 file_size=36.4815M items_per_second=1.07988/s
WriteMetadata/num_columns:10000/num_row_groups:1    109520337 ns    109283500 ns            6 file_size=3.82213M items_per_second=9.15051/s
WriteMetadata/num_columns:10000/num_row_groups:100 9607536338 ns   9603598900 ns            1 file_size=369.089M items_per_second=0.104128/s
ReadMetadata/num_columns:1/num_row_groups:1              3776 ns         3737 ns       190309 file_size=459 items_per_second=267.588k/s
ReadMetadata/num_columns:1/num_row_groups:100           76296 ns        76114 ns         9217 file_size=37.383k items_per_second=13.1382k/s
ReadMetadata/num_columns:1/num_row_groups:1000         706469 ns       706463 ns          993 file_size=374.885k items_per_second=1.4155k/s
ReadMetadata/num_columns:10/num_row_groups:1            18738 ns        18738 ns        35672 file_size=3.762k items_per_second=53.3679k/s
ReadMetadata/num_columns:10/num_row_groups:100         590179 ns       590180 ns         1202 file_size=358.835k items_per_second=1.6944k/s
ReadMetadata/num_columns:10/num_row_groups:1000       5821858 ns      5821727 ns          123 file_size=3.614M items_per_second=171.77/s
ReadMetadata/num_columns:100/num_row_groups:1          168284 ns       168284 ns         4074 file_size=37.352k items_per_second=5.94234k/s
ReadMetadata/num_columns:100/num_row_groups:100       5752814 ns      5752800 ns          118 file_size=3.61693M items_per_second=173.828/s
ReadMetadata/num_columns:100/num_row_groups:1000     65674677 ns     65672427 ns           11 file_size=36.2887M items_per_second=15.2271/s
ReadMetadata/num_columns:1000/num_row_groups:1        1574680 ns      1574646 ns          444 file_size=376.655k items_per_second=635.063/s
ReadMetadata/num_columns:1000/num_row_groups:100     65989678 ns     65988873 ns           11 file_size=36.4815M items_per_second=15.1541/s
ReadMetadata/num_columns:10000/num_row_groups:1      16967274 ns     16966876 ns           41 file_size=3.82213M items_per_second=58.9384/s
ReadMetadata/num_columns:10000/num_row_groups:100   652885946 ns    652766800 ns            1 file_size=369.089M items_per_second=1.53194/s
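
For reading the before/after tables, the relative change in CPU time can be computed as (before - after) / before. The sketch below applies that to four rows whose CPU-time values are copied from the M1 Pro and AMD 3800X tables above. The helper names (`CpuTimeReduction`, `Sample`, `kSamples`, `PrintReductions`) are made up for this example and are not part of the Arrow benchmarks.

```cpp
#include <cstdio>

// Fractional reduction in CPU time going from `before_ns` to `after_ns`.
// A positive result means the "after" run used less CPU time.
double CpuTimeReduction(double before_ns, double after_ns) {
  return (before_ns - after_ns) / before_ns;
}

// One before/after pair taken from the CPU column of the tables above.
struct Sample {
  const char* machine;
  const char* benchmark;
  double before_cpu_ns;
  double after_cpu_ns;
};

// Values copied from the benchmark output in this thread (CPU column, in ns).
constexpr Sample kSamples[] = {
    {"M1 Pro", "ReadMetadata/num_columns:10000/num_row_groups:100",
     636789000, 594531000},
    {"M1 Pro", "WriteMetadata/num_columns:1000/num_row_groups:100",
     726929000, 702311000},
    {"AMD 3800X", "ReadMetadata/num_columns:10000/num_row_groups:100",
     788847500, 652766800},
    {"AMD 3800X", "WriteMetadata/num_columns:1000/num_row_groups:100",
     995929600, 926026200},
};

// Prints one line per sample with the CPU-time reduction as a percentage.
void PrintReductions() {
  for (const Sample& s : kSamples) {
    std::printf("%-10s %-52s %5.1f%%\n", s.machine, s.benchmark,
                100.0 * CpuTimeReduction(s.before_cpu_ns, s.after_cpu_ns));
  }
}
```

Calling `PrintReductions()` from `main` gives reductions of roughly 3% to 17% for these four rows. These are single runs, so the figures are only a rough indication of the effect.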

pitrou commented 1 week ago

@github-actions crossbow submit -g cpp -g wheel

github-actions[bot] commented 1 week ago

Revision: fde772cb97459f202a1ec3571bc7064a5de72d65

Submitted crossbow builds: ursacomputing/crossbow @ actions-bacf49dea9

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
wheel-macos-big-sur-cp310-arm64 GitHub Actions
wheel-macos-big-sur-cp311-arm64 GitHub Actions
wheel-macos-big-sur-cp312-arm64 GitHub Actions
wheel-macos-big-sur-cp38-arm64 GitHub Actions
wheel-macos-big-sur-cp39-arm64 GitHub Actions
wheel-macos-catalina-cp310-amd64 GitHub Actions
wheel-macos-catalina-cp311-amd64 GitHub Actions
wheel-macos-catalina-cp312-amd64 GitHub Actions
wheel-macos-catalina-cp38-amd64 GitHub Actions
wheel-macos-catalina-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp38-amd64 GitHub Actions
wheel-manylinux-2-28-cp38-arm64 GitHub Actions
wheel-manylinux-2-28-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp39-arm64 GitHub Actions
wheel-manylinux-2014-cp310-amd64 GitHub Actions
wheel-manylinux-2014-cp310-arm64 GitHub Actions
wheel-manylinux-2014-cp311-amd64 GitHub Actions
wheel-manylinux-2014-cp311-arm64 GitHub Actions
wheel-manylinux-2014-cp312-amd64 GitHub Actions
wheel-manylinux-2014-cp312-arm64 GitHub Actions
wheel-manylinux-2014-cp38-amd64 GitHub Actions
wheel-manylinux-2014-cp38-arm64 GitHub Actions
wheel-manylinux-2014-cp39-amd64 GitHub Actions
wheel-manylinux-2014-cp39-arm64 GitHub Actions
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp38-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

conbench-apache-arrow[bot] commented 1 week ago

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 9ba9253e8527a7f3e2c6e47e631e278b8ca84e53.

There were 5 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them.