Closed mapleFU closed 1 week ago
:warning: GitHub issue #41702 has been automatically assigned in GitHub to PR creator.
@emkornfield @pitrou I've updated the patch here. The generated code calls fewer virtual functions during deserialization. Would you mind taking a look?
I'm not very familiar with the Thrift compiler; there may be other options that help deserialization further.
@mapleFU I didn't know this was possible. This looks neat in principle. Did you try running some benchmarks?
I ran it on the page index: https://github.com/apache/arrow/issues/41702#issuecomment-2116873657
For the footer it should be more useful, since readVirt is called many more times.
I remember there was about a 3% speedup when reading a sample Parquet file.
Perhaps you can try with the additional benchmarks in https://github.com/apache/arrow/pull/41761
On my M1 Pro with Release(O2):
After:
Run on (10 X 24.0711 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x10)
Load Average: 7.98, 10.79, 8.83
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
WriteMetadata/num_columns:1/num_row_groups:1 10248 ns 10198 ns 65596 file_size=459 items_per_second=98.0618k/s
WriteMetadata/num_columns:1/num_row_groups:100 708873 ns 701642 ns 1003 file_size=37.383k items_per_second=1.42523k/s
WriteMetadata/num_columns:1/num_row_groups:1000 7027939 ns 7022677 ns 99 file_size=374.885k items_per_second=142.396/s
WriteMetadata/num_columns:10/num_row_groups:1 78750 ns 78709 ns 8900 file_size=3.762k items_per_second=12.705k/s
WriteMetadata/num_columns:10/num_row_groups:100 6751510 ns 6644838 ns 105 file_size=358.835k items_per_second=150.493/s
WriteMetadata/num_columns:10/num_row_groups:1000 67659713 ns 67142800 ns 10 file_size=3.614M items_per_second=14.8936/s
WriteMetadata/num_columns:100/num_row_groups:1 787280 ns 771871 ns 910 file_size=37.352k items_per_second=1.29555k/s
WriteMetadata/num_columns:100/num_row_groups:100 66632500 ns 66540000 ns 10 file_size=3.61693M items_per_second=15.0286/s
WriteMetadata/num_columns:100/num_row_groups:1000 703455917 ns 699385000 ns 1 file_size=36.2887M items_per_second=1.42983/s
WriteMetadata/num_columns:1000/num_row_groups:1 8089713 ns 8087153 ns 85 file_size=376.655k items_per_second=123.653/s
WriteMetadata/num_columns:1000/num_row_groups:100 705972459 ns 702311000 ns 1 file_size=36.4815M items_per_second=1.42387/s
WriteMetadata/num_columns:10000/num_row_groups:1 82793505 ns 82773750 ns 8 file_size=3.82213M items_per_second=12.0811/s
WriteMetadata/num_columns:10000/num_row_groups:100 7789295000 ns 7492551000 ns 1 file_size=369.089M items_per_second=0.133466/s
ReadMetadata/num_columns:1/num_row_groups:1 3022 ns 3021 ns 229889 file_size=459 items_per_second=330.982k/s
ReadMetadata/num_columns:1/num_row_groups:100 59165 ns 59139 ns 11742 file_size=37.383k items_per_second=16.9092k/s
ReadMetadata/num_columns:1/num_row_groups:1000 587111 ns 586972 ns 1189 file_size=374.885k items_per_second=1.70366k/s
ReadMetadata/num_columns:10/num_row_groups:1 13977 ns 13973 ns 50402 file_size=3.762k items_per_second=71.569k/s
ReadMetadata/num_columns:10/num_row_groups:100 475674 ns 475562 ns 1469 file_size=358.835k items_per_second=2.10278k/s
ReadMetadata/num_columns:10/num_row_groups:1000 4743075 ns 4742237 ns 139 file_size=3.614M items_per_second=210.871/s
ReadMetadata/num_columns:100/num_row_groups:1 119355 ns 119308 ns 5747 file_size=37.352k items_per_second=8.38169k/s
ReadMetadata/num_columns:100/num_row_groups:100 5379931 ns 5378835 ns 133 file_size=3.61693M items_per_second=185.914/s
ReadMetadata/num_columns:100/num_row_groups:1000 58173311 ns 58151000 ns 13 file_size=36.2887M items_per_second=17.1966/s
ReadMetadata/num_columns:1000/num_row_groups:1 1285306 ns 1284195 ns 514 file_size=376.655k items_per_second=778.698/s
ReadMetadata/num_columns:1000/num_row_groups:100 59154014 ns 59110667 ns 12 file_size=36.4815M items_per_second=16.9174/s
ReadMetadata/num_columns:10000/num_row_groups:1 15298734 ns 15288065 ns 46 file_size=3.82213M items_per_second=65.4105/s
ReadMetadata/num_columns:10000/num_row_groups:100 597222875 ns 594531000 ns 1 file_size=369.089M items_per_second=1.682/s
Before:
WriteMetadata/num_columns:1/num_row_groups:1 13997 ns 10952 ns 64411 file_size=459 items_per_second=91.3074k/s
WriteMetadata/num_columns:1/num_row_groups:100 1161928 ns 781421 ns 915 file_size=37.383k items_per_second=1.27972k/s
WriteMetadata/num_columns:1/num_row_groups:1000 9028193 ns 7580868 ns 91 file_size=374.885k items_per_second=131.911/s
WriteMetadata/num_columns:10/num_row_groups:1 87804 ns 81408 ns 8680 file_size=3.762k items_per_second=12.2838k/s
WriteMetadata/num_columns:10/num_row_groups:100 7922727 ns 7032396 ns 96 file_size=358.835k items_per_second=142.199/s
WriteMetadata/num_columns:10/num_row_groups:1000 83557727 ns 72335889 ns 9 file_size=3.614M items_per_second=13.8244/s
WriteMetadata/num_columns:100/num_row_groups:1 1046771 ns 866386 ns 813 file_size=37.352k items_per_second=1.15422k/s
WriteMetadata/num_columns:100/num_row_groups:100 97720995 ns 74290111 ns 9 file_size=3.61693M items_per_second=13.4607/s
WriteMetadata/num_columns:100/num_row_groups:1000 1042585917 ns 773579000 ns 1 file_size=36.2887M items_per_second=1.29269/s
WriteMetadata/num_columns:1000/num_row_groups:1 9320268 ns 8396910 ns 78 file_size=376.655k items_per_second=119.091/s
WriteMetadata/num_columns:1000/num_row_groups:100 789198500 ns 726929000 ns 1 file_size=36.4815M items_per_second=1.37565/s
WriteMetadata/num_columns:10000/num_row_groups:1 105553526 ns 89228125 ns 8 file_size=3.82213M items_per_second=11.2072/s
WriteMetadata/num_columns:10000/num_row_groups:100 9705208125 ns 7941607000 ns 1 file_size=369.089M items_per_second=0.125919/s
ReadMetadata/num_columns:1/num_row_groups:1 3341 ns 3262 ns 215501 file_size=459 items_per_second=306.531k/s
ReadMetadata/num_columns:1/num_row_groups:100 70801 ns 67469 ns 10226 file_size=37.383k items_per_second=14.8215k/s
ReadMetadata/num_columns:1/num_row_groups:1000 697046 ns 661042 ns 1033 file_size=374.885k items_per_second=1.51276k/s
ReadMetadata/num_columns:10/num_row_groups:1 19616 ns 15182 ns 46741 file_size=3.762k items_per_second=65.866k/s
ReadMetadata/num_columns:10/num_row_groups:100 631976 ns 538377 ns 1240 file_size=358.835k items_per_second=1.85743k/s
ReadMetadata/num_columns:10/num_row_groups:1000 5701558 ns 5375484 ns 122 file_size=3.614M items_per_second=186.03/s
ReadMetadata/num_columns:100/num_row_groups:1 137789 ns 128750 ns 5466 file_size=37.352k items_per_second=7.76702k/s
ReadMetadata/num_columns:100/num_row_groups:100 6475114 ns 6090483 ns 118 file_size=3.61693M items_per_second=164.191/s
ReadMetadata/num_columns:100/num_row_groups:1000 64411345 ns 62630000 ns 11 file_size=36.2887M items_per_second=15.9668/s
ReadMetadata/num_columns:1000/num_row_groups:1 1473490 ns 1402757 ns 453 file_size=376.655k items_per_second=712.882/s
ReadMetadata/num_columns:1000/num_row_groups:100 66037220 ns 64025909 ns 11 file_size=36.4815M items_per_second=15.6187/s
ReadMetadata/num_columns:10000/num_row_groups:1 18425749 ns 16564045 ns 44 file_size=3.82213M items_per_second=60.3717/s
ReadMetadata/num_columns:10000/num_row_groups:100 650862958 ns 636789000 ns 1 file_size=369.089M items_per_second=1.57038/s
On my AMD 3800X:
Before:
WriteMetadata/num_columns:1/num_row_groups:1 14869 ns 14869 ns 42700 file_size=459 items_per_second=67.2552k/s
WriteMetadata/num_columns:1/num_row_groups:100 1026862 ns 1026848 ns 689 file_size=37.383k items_per_second=973.854/s
WriteMetadata/num_columns:1/num_row_groups:1000 9657576 ns 9656124 ns 72 file_size=374.885k items_per_second=103.561/s
WriteMetadata/num_columns:10/num_row_groups:1 121405 ns 121406 ns 5869 file_size=3.762k items_per_second=8.23686k/s
WriteMetadata/num_columns:10/num_row_groups:100 9488113 ns 9488130 ns 73 file_size=358.835k items_per_second=105.395/s
WriteMetadata/num_columns:10/num_row_groups:1000 98853564 ns 98852700 ns 7 file_size=3.614M items_per_second=10.1161/s
WriteMetadata/num_columns:100/num_row_groups:1 1142870 ns 1142808 ns 629 file_size=37.352k items_per_second=875.037/s
WriteMetadata/num_columns:100/num_row_groups:100 96569070 ns 96568757 ns 7 file_size=3.61693M items_per_second=10.3553/s
WriteMetadata/num_columns:100/num_row_groups:1000 1017437093 ns 1017435400 ns 1 file_size=36.2887M items_per_second=0.982863/s
WriteMetadata/num_columns:1000/num_row_groups:1 11040304 ns 11040197 ns 65 file_size=376.655k items_per_second=90.5781/s
WriteMetadata/num_columns:1000/num_row_groups:100 995932342 ns 995929600 ns 1 file_size=36.4815M items_per_second=1.00409/s
WriteMetadata/num_columns:10000/num_row_groups:1 114961261 ns 114961450 ns 6 file_size=3.82213M items_per_second=8.69857/s
WriteMetadata/num_columns:10000/num_row_groups:100 1.6961e+10 ns 1.6960e+10 ns 1 file_size=369.089M items_per_second=0.0589634/s
ReadMetadata/num_columns:1/num_row_groups:1 6150 ns 6150 ns 95609 file_size=459 items_per_second=162.615k/s
ReadMetadata/num_columns:1/num_row_groups:100 148555 ns 148554 ns 5156 file_size=37.383k items_per_second=6.73154k/s
ReadMetadata/num_columns:1/num_row_groups:1000 1383664 ns 1383603 ns 549 file_size=374.885k items_per_second=722.751/s
ReadMetadata/num_columns:10/num_row_groups:1 31549 ns 31548 ns 16761 file_size=3.762k items_per_second=31.6973k/s
ReadMetadata/num_columns:10/num_row_groups:100 1329978 ns 1329950 ns 486 file_size=358.835k items_per_second=751.908/s
ReadMetadata/num_columns:10/num_row_groups:1000 15798009 ns 15797961 ns 44 file_size=3.614M items_per_second=63.2993/s
ReadMetadata/num_columns:100/num_row_groups:1 297319 ns 297316 ns 2119 file_size=37.352k items_per_second=3.36343k/s
ReadMetadata/num_columns:100/num_row_groups:100 13742747 ns 13742598 ns 49 file_size=3.61693M items_per_second=72.7664/s
ReadMetadata/num_columns:100/num_row_groups:1000 130178737 ns 130176500 ns 5 file_size=36.2887M items_per_second=7.68188/s
ReadMetadata/num_columns:1000/num_row_groups:1 2862534 ns 2862405 ns 260 file_size=376.655k items_per_second=349.357/s
ReadMetadata/num_columns:1000/num_row_groups:100 79884243 ns 79869014 ns 7 file_size=36.4815M items_per_second=12.5205/s
ReadMetadata/num_columns:10000/num_row_groups:1 18818536 ns 18818281 ns 37 file_size=3.82213M items_per_second=53.1398/s
ReadMetadata/num_columns:10000/num_row_groups:100 788936700 ns 788847500 ns 1 file_size=369.089M items_per_second=1.26767/s
After:
WriteMetadata/num_columns:1/num_row_groups:1 14042 ns 14026 ns 48265 file_size=459 items_per_second=71.2951k/s
WriteMetadata/num_columns:1/num_row_groups:100 982543 ns 982545 ns 693 file_size=37.383k items_per_second=1.01776k/s
WriteMetadata/num_columns:1/num_row_groups:1000 9236559 ns 9234951 ns 75 file_size=374.885k items_per_second=108.284/s
WriteMetadata/num_columns:10/num_row_groups:1 115867 ns 115865 ns 6050 file_size=3.762k items_per_second=8.63075k/s
WriteMetadata/num_columns:10/num_row_groups:100 9106303 ns 9106322 ns 77 file_size=358.835k items_per_second=109.814/s
WriteMetadata/num_columns:10/num_row_groups:1000 95039480 ns 95039886 ns 7 file_size=3.614M items_per_second=10.5219/s
WriteMetadata/num_columns:100/num_row_groups:1 1066471 ns 1066474 ns 648 file_size=37.352k items_per_second=937.67/s
WriteMetadata/num_columns:100/num_row_groups:100 92350381 ns 92350900 ns 8 file_size=3.61693M items_per_second=10.8283/s
WriteMetadata/num_columns:100/num_row_groups:1000 972198408 ns 971689600 ns 1 file_size=36.2887M items_per_second=1.02914/s
WriteMetadata/num_columns:1000/num_row_groups:1 10303438 ns 10302799 ns 68 file_size=376.655k items_per_second=97.061/s
WriteMetadata/num_columns:1000/num_row_groups:100 926151272 ns 926026200 ns 1 file_size=36.4815M items_per_second=1.07988/s
WriteMetadata/num_columns:10000/num_row_groups:1 109520337 ns 109283500 ns 6 file_size=3.82213M items_per_second=9.15051/s
WriteMetadata/num_columns:10000/num_row_groups:100 9607536338 ns 9603598900 ns 1 file_size=369.089M items_per_second=0.104128/s
ReadMetadata/num_columns:1/num_row_groups:1 3776 ns 3737 ns 190309 file_size=459 items_per_second=267.588k/s
ReadMetadata/num_columns:1/num_row_groups:100 76296 ns 76114 ns 9217 file_size=37.383k items_per_second=13.1382k/s
ReadMetadata/num_columns:1/num_row_groups:1000 706469 ns 706463 ns 993 file_size=374.885k items_per_second=1.4155k/s
ReadMetadata/num_columns:10/num_row_groups:1 18738 ns 18738 ns 35672 file_size=3.762k items_per_second=53.3679k/s
ReadMetadata/num_columns:10/num_row_groups:100 590179 ns 590180 ns 1202 file_size=358.835k items_per_second=1.6944k/s
ReadMetadata/num_columns:10/num_row_groups:1000 5821858 ns 5821727 ns 123 file_size=3.614M items_per_second=171.77/s
ReadMetadata/num_columns:100/num_row_groups:1 168284 ns 168284 ns 4074 file_size=37.352k items_per_second=5.94234k/s
ReadMetadata/num_columns:100/num_row_groups:100 5752814 ns 5752800 ns 118 file_size=3.61693M items_per_second=173.828/s
ReadMetadata/num_columns:100/num_row_groups:1000 65674677 ns 65672427 ns 11 file_size=36.2887M items_per_second=15.2271/s
ReadMetadata/num_columns:1000/num_row_groups:1 1574680 ns 1574646 ns 444 file_size=376.655k items_per_second=635.063/s
ReadMetadata/num_columns:1000/num_row_groups:100 65989678 ns 65988873 ns 11 file_size=36.4815M items_per_second=15.1541/s
ReadMetadata/num_columns:10000/num_row_groups:1 16967274 ns 16966876 ns 41 file_size=3.82213M items_per_second=58.9384/s
ReadMetadata/num_columns:10000/num_row_groups:100 652885946 ns 652766800 ns 1 file_size=369.089M items_per_second=1.53194/s
@github-actions crossbow submit -g cpp -g wheel
Revision: fde772cb97459f202a1ec3571bc7064a5de72d65
Submitted crossbow builds: ursacomputing/crossbow @ actions-bacf49dea9
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 9ba9253e8527a7f3e2c6e47e631e278b8ca84e53.
There were 5 benchmark results indicating a performance regression:
ursa-i9-9960x
at 2024-05-22 19:20:09Z
The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
By default, the Thrift serializer and deserializer call many virtual functions. However, the Thrift C++ compiler has an option to generate template methods that do away with the cost of calling virtual functions. In the metadata read/write benchmarks, it seems to make things around 10% faster.
What changes are included in this PR?
- cpp/build-support/update-thrift.sh: enable the templates option in the Thrift C++ compiler arguments
- cpp/src/parquet/thrift_internal.h: use the generated code
- cpp/src/generated: update generated files.
Are these changes tested?
Covered by existing tests.
Are there any user-facing changes?
No.