Open mqliang opened 3 years ago
I wrote a benchmark here: https://github.com/mqliang/pinot/commit/7892423579b20dafcb5802a09f20f826377f6c39
The benchmark compares three serialization methods, each serializing a typical metadata map:
- temporaryOutputStream: for each KV pair in the metadata, first writes to a temporary output stream, then converts it to a byte array, which is returned to the caller and written to the main stream.
- preAllocateByteArrayNative: pre-allocates the byte array; it loops over the metadata twice and encodes each K/V in both loops.
- preAllocateByteArrayWithBytesCache: same logic as preAllocateByteArrayNative, but adds a cache for the encoded K/V so they can be reused in the second loop.

Here is the result:
# JMH version: 1.26
# VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
# VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
# VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 30 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative
# Run progress: 0.00% complete, ETA 00:08:00
# Fork: 1 of 1
# Warmup Iteration 1: 552.178 us/op
Iteration 1: 519.531 us/op
·gc.alloc.rate: 3270.480 MB/sec
·gc.alloc.rate.norm: 1811608.009 B/op
·gc.churn.PS_Eden_Space: 3275.114 MB/sec
·gc.churn.PS_Eden_Space.norm: 1814175.318 B/op
·gc.churn.PS_Survivor_Space: 0.558 MB/sec
·gc.churn.PS_Survivor_Space.norm: 309.168 B/op
·gc.count: 525.000 counts
·gc.time: 261.000 ms
Iteration 2: 524.659 us/op
·gc.alloc.rate: 3238.871 MB/sec
·gc.alloc.rate.norm: 1811608.011 B/op
·gc.churn.PS_Eden_Space: 3242.901 MB/sec
·gc.churn.PS_Eden_Space.norm: 1813862.347 B/op
·gc.churn.PS_Survivor_Space: 0.563 MB/sec
·gc.churn.PS_Survivor_Space.norm: 314.968 B/op
·gc.count: 516.000 counts
·gc.time: 263.000 ms
Iteration 3: 526.323 us/op
·gc.alloc.rate: 3228.230 MB/sec
·gc.alloc.rate.norm: 1811608.008 B/op
·gc.churn.PS_Eden_Space: 3232.024 MB/sec
·gc.churn.PS_Eden_Space.norm: 1813736.682 B/op
·gc.churn.PS_Survivor_Space: 0.471 MB/sec
·gc.churn.PS_Survivor_Space.norm: 264.539 B/op
·gc.count: 470.000 counts
·gc.time: 254.000 ms
Iteration 4: 521.779 us/op
·gc.alloc.rate: 3256.320 MB/sec
·gc.alloc.rate.norm: 1811608.008 B/op
·gc.churn.PS_Eden_Space: 3261.433 MB/sec
·gc.churn.PS_Eden_Space.norm: 1814452.617 B/op
·gc.churn.PS_Survivor_Space: 0.560 MB/sec
·gc.churn.PS_Survivor_Space.norm: 311.772 B/op
·gc.count: 534.000 counts
·gc.time: 270.000 ms
Iteration 5: 524.474 us/op
·gc.alloc.rate: 3239.855 MB/sec
·gc.alloc.rate.norm: 1811608.008 B/op
·gc.churn.PS_Eden_Space: 3242.045 MB/sec
·gc.churn.PS_Eden_Space.norm: 1812832.659 B/op
·gc.churn.PS_Survivor_Space: 0.547 MB/sec
·gc.churn.PS_Survivor_Space.norm: 305.975 B/op
·gc.count: 483.000 counts
·gc.time: 255.000 ms
Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative":
523.353 ±(99.9%) 10.345 us/op [Average]
(min, avg, max) = (519.531, 523.353, 526.323), stdev = 2.687
CI (99.9%): [513.008, 533.698] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate":
3246.751 ±(99.9%) 64.066 MB/sec [Average]
(min, avg, max) = (3228.230, 3246.751, 3270.480), stdev = 16.638
CI (99.9%): [3182.685, 3310.818] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate.norm":
1811608.009 ±(99.9%) 0.005 B/op [Average]
(min, avg, max) = (1811608.008, 1811608.009, 1811608.011), stdev = 0.001
CI (99.9%): [1811608.003, 1811608.014] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space":
3250.704 ±(99.9%) 66.578 MB/sec [Average]
(min, avg, max) = (3232.024, 3250.704, 3275.114), stdev = 17.290
CI (99.9%): [3184.126, 3317.282] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space.norm":
1813811.924 ±(99.9%) 2365.646 B/op [Average]
(min, avg, max) = (1812832.659, 1813811.924, 1814452.617), stdev = 614.351
CI (99.9%): [1811446.279, 1816177.570] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space":
0.540 ±(99.9%) 0.150 MB/sec [Average]
(min, avg, max) = (0.471, 0.540, 0.563), stdev = 0.039
CI (99.9%): [0.390, 0.690] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space.norm":
301.285 ±(99.9%) 80.118 B/op [Average]
(min, avg, max) = (264.539, 301.285, 314.968), stdev = 20.806
CI (99.9%): [221.166, 381.403] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.count":
2528.000 ±(99.9%) 0.001 counts [Sum]
(min, avg, max) = (470.000, 505.600, 534.000), stdev = 27.700
CI (99.9%): [2528.000, 2528.000] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.time":
1303.000 ±(99.9%) 0.001 ms [Sum]
(min, avg, max) = (254.000, 260.600, 270.000), stdev = 6.504
CI (99.9%): [1303.000, 1303.000] (assumes normal distribution)
# JMH version: 1.26
# VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
# VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
# VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 30 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache
# Run progress: 33.33% complete, ETA 00:05:28
# Fork: 1 of 1
# Warmup Iteration 1: 390.616 us/op
Iteration 1: 375.676 us/op
·gc.alloc.rate: 3524.091 MB/sec
·gc.alloc.rate.norm: 1411608.008 B/op
·gc.churn.PS_Eden_Space: 3532.601 MB/sec
·gc.churn.PS_Eden_Space.norm: 1415016.587 B/op
·gc.churn.PS_Survivor_Space: 0.538 MB/sec
·gc.churn.PS_Survivor_Space.norm: 215.400 B/op
·gc.count: 458.000 counts
·gc.time: 248.000 ms
Iteration 2: 375.171 us/op
·gc.alloc.rate: 3528.907 MB/sec
·gc.alloc.rate.norm: 1411608.006 B/op
·gc.churn.PS_Eden_Space: 3534.356 MB/sec
·gc.churn.PS_Eden_Space.norm: 1413787.624 B/op
·gc.churn.PS_Survivor_Space: 0.494 MB/sec
·gc.churn.PS_Survivor_Space.norm: 197.609 B/op
·gc.count: 435.000 counts
·gc.time: 247.000 ms
Iteration 3: 373.233 us/op
·gc.alloc.rate: 3547.720 MB/sec
·gc.alloc.rate.norm: 1411608.005 B/op
·gc.churn.PS_Eden_Space: 3559.728 MB/sec
·gc.churn.PS_Eden_Space.norm: 1416385.929 B/op
·gc.churn.PS_Survivor_Space: 0.539 MB/sec
·gc.churn.PS_Survivor_Space.norm: 214.343 B/op
·gc.count: 462.000 counts
·gc.time: 247.000 ms
Iteration 4: 371.186 us/op
·gc.alloc.rate: 3567.068 MB/sec
·gc.alloc.rate.norm: 1411608.006 B/op
·gc.churn.PS_Eden_Space: 3566.702 MB/sec
·gc.churn.PS_Eden_Space.norm: 1411463.405 B/op
·gc.churn.PS_Survivor_Space: 0.597 MB/sec
·gc.churn.PS_Survivor_Space.norm: 236.411 B/op
·gc.count: 520.000 counts
·gc.time: 271.000 ms
Iteration 5: 370.738 us/op
·gc.alloc.rate: 3571.354 MB/sec
·gc.alloc.rate.norm: 1411608.005 B/op
·gc.churn.PS_Eden_Space: 3582.874 MB/sec
·gc.churn.PS_Eden_Space.norm: 1416161.234 B/op
·gc.churn.PS_Survivor_Space: 0.588 MB/sec
·gc.churn.PS_Survivor_Space.norm: 232.322 B/op
·gc.count: 509.000 counts
·gc.time: 262.000 ms
Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache":
373.201 ±(99.9%) 8.639 us/op [Average]
(min, avg, max) = (370.738, 373.201, 375.676), stdev = 2.243
CI (99.9%): [364.562, 381.840] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate":
3547.828 ±(99.9%) 82.702 MB/sec [Average]
(min, avg, max) = (3524.091, 3547.828, 3571.354), stdev = 21.477
CI (99.9%): [3465.126, 3630.530] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate.norm":
1411608.006 ±(99.9%) 0.005 B/op [Average]
(min, avg, max) = (1411608.005, 1411608.006, 1411608.008), stdev = 0.001
CI (99.9%): [1411608.001, 1411608.011] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space":
3555.252 ±(99.9%) 83.120 MB/sec [Average]
(min, avg, max) = (3532.601, 3555.252, 3582.874), stdev = 21.586
CI (99.9%): [3472.132, 3638.373] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space.norm":
1414562.956 ±(99.9%) 7771.212 B/op [Average]
(min, avg, max) = (1411463.405, 1414562.956, 1416385.929), stdev = 2018.159
CI (99.9%): [1406791.744, 1422334.168] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space":
0.551 ±(99.9%) 0.162 MB/sec [Average]
(min, avg, max) = (0.494, 0.551, 0.597), stdev = 0.042
CI (99.9%): [0.389, 0.713] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space.norm":
219.217 ±(99.9%) 60.046 B/op [Average]
(min, avg, max) = (197.609, 219.217, 236.411), stdev = 15.594
CI (99.9%): [159.171, 279.263] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.count":
2384.000 ±(99.9%) 0.001 counts [Sum]
(min, avg, max) = (435.000, 476.800, 520.000), stdev = 36.134
CI (99.9%): [2384.000, 2384.000] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.time":
1275.000 ±(99.9%) 0.001 ms [Sum]
(min, avg, max) = (247.000, 255.000, 271.000), stdev = 10.977
CI (99.9%): [1275.000, 1275.000] (assumes normal distribution)
# JMH version: 1.26
# VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
# VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
# VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 30 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream
# Run progress: 66.67% complete, ETA 00:02:44
# Fork: 1 of 1
# Warmup Iteration 1: 483.366 us/op
Iteration 1: 408.351 us/op
·gc.alloc.rate: 3580.758 MB/sec
·gc.alloc.rate.norm: 1558808.007 B/op
·gc.churn.PS_Eden_Space: 3586.078 MB/sec
·gc.churn.PS_Eden_Space.norm: 1561123.846 B/op
·gc.churn.PS_Survivor_Space: 0.511 MB/sec
·gc.churn.PS_Survivor_Space.norm: 222.603 B/op
·gc.count: 476.000 counts
·gc.time: 253.000 ms
Iteration 2: 410.342 us/op
·gc.alloc.rate: 3563.256 MB/sec
·gc.alloc.rate.norm: 1558808.009 B/op
·gc.churn.PS_Eden_Space: 3569.765 MB/sec
·gc.churn.PS_Eden_Space.norm: 1561655.686 B/op
·gc.churn.PS_Survivor_Space: 0.451 MB/sec
·gc.churn.PS_Survivor_Space.norm: 197.394 B/op
·gc.count: 409.000 counts
·gc.time: 244.000 ms
Iteration 3: 407.314 us/op
·gc.alloc.rate: 3589.291 MB/sec
·gc.alloc.rate.norm: 1558808.006 B/op
·gc.churn.PS_Eden_Space: 3592.335 MB/sec
·gc.churn.PS_Eden_Space.norm: 1560130.076 B/op
·gc.churn.PS_Survivor_Space: 0.557 MB/sec
·gc.churn.PS_Survivor_Space.norm: 241.833 B/op
·gc.count: 495.000 counts
·gc.time: 261.000 ms
Iteration 4: 407.294 us/op
·gc.alloc.rate: 3590.035 MB/sec
·gc.alloc.rate.norm: 1558808.006 B/op
·gc.churn.PS_Eden_Space: 3595.643 MB/sec
·gc.churn.PS_Eden_Space.norm: 1561243.143 B/op
·gc.churn.PS_Survivor_Space: 0.439 MB/sec
·gc.churn.PS_Survivor_Space.norm: 190.513 B/op
·gc.count: 382.000 counts
·gc.time: 239.000 ms
Iteration 5: 410.068 us/op
·gc.alloc.rate: 3565.783 MB/sec
·gc.alloc.rate.norm: 1558808.006 B/op
·gc.churn.PS_Eden_Space: 3576.571 MB/sec
·gc.churn.PS_Eden_Space.norm: 1563524.046 B/op
·gc.churn.PS_Survivor_Space: 0.542 MB/sec
·gc.churn.PS_Survivor_Space.norm: 236.741 B/op
·gc.count: 460.000 counts
·gc.time: 252.000 ms
Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream":
408.674 ±(99.9%) 5.641 us/op [Average]
(min, avg, max) = (407.294, 408.674, 410.342), stdev = 1.465
CI (99.9%): [403.033, 414.314] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate":
3577.824 ±(99.9%) 48.952 MB/sec [Average]
(min, avg, max) = (3563.256, 3577.824, 3590.035), stdev = 12.713
CI (99.9%): [3528.873, 3626.776] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate.norm":
1558808.007 ±(99.9%) 0.005 B/op [Average]
(min, avg, max) = (1558808.006, 1558808.007, 1558808.009), stdev = 0.001
CI (99.9%): [1558808.002, 1558808.011] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space":
3584.078 ±(99.9%) 41.614 MB/sec [Average]
(min, avg, max) = (3569.765, 3584.078, 3595.643), stdev = 10.807
CI (99.9%): [3542.465, 3625.692] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space.norm":
1561535.360 ±(99.9%) 4793.590 B/op [Average]
(min, avg, max) = (1560130.076, 1561535.360, 1563524.046), stdev = 1244.880
CI (99.9%): [1556741.769, 1566328.950] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space":
0.500 ±(99.9%) 0.204 MB/sec [Average]
(min, avg, max) = (0.439, 0.500, 0.557), stdev = 0.053
CI (99.9%): [0.296, 0.704] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space.norm":
217.817 ±(99.9%) 88.656 B/op [Average]
(min, avg, max) = (190.513, 217.817, 241.833), stdev = 23.024
CI (99.9%): [129.161, 306.473] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.count":
2222.000 ±(99.9%) 0.001 counts [Sum]
(min, avg, max) = (382.000, 444.400, 495.000), stdev = 47.300
CI (99.9%): [2222.000, 2222.000] (assumes normal distribution)
Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.time":
1249.000 ±(99.9%) 0.001 ms [Sum]
(min, avg, max) = (239.000, 249.800, 261.000), stdev = 8.526
CI (99.9%): [1249.000, 1249.000] (assumes normal distribution)
# Run complete. Total time: 00:08:12
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
BenchmarkDataTableSerialization.preAllocateByteArrayNative avgt 5 523.353 ± 10.345 us/op
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate avgt 5 3246.751 ± 64.066 MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate.norm avgt 5 1811608.009 ± 0.005 B/op
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space avgt 5 3250.704 ± 66.578 MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space.norm avgt 5 1813811.924 ± 2365.646 B/op
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space avgt 5 0.540 ± 0.150 MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space.norm avgt 5 301.285 ± 80.118 B/op
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.count avgt 5 2528.000 counts
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.time avgt 5 1303.000 ms
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache avgt 5 373.201 ± 8.639 us/op
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate avgt 5 3547.828 ± 82.702 MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate.norm avgt 5 1411608.006 ± 0.005 B/op
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space avgt 5 3555.252 ± 83.120 MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space.norm avgt 5 1414562.956 ± 7771.212 B/op
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space avgt 5 0.551 ± 0.162 MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space.norm avgt 5 219.217 ± 60.046 B/op
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.count avgt 5 2384.000 counts
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.time avgt 5 1275.000 ms
BenchmarkDataTableSerialization.temporaryOutputStream avgt 5 408.674 ± 5.641 us/op
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate avgt 5 3577.824 ± 48.952 MB/sec
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate.norm avgt 5 1558808.007 ± 0.005 B/op
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space avgt 5 3584.078 ± 41.614 MB/sec
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space.norm avgt 5 1561535.360 ± 4793.590 B/op
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space avgt 5 0.500 ± 0.204 MB/sec
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space.norm avgt 5 217.817 ± 88.656 B/op
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.count avgt 5 2222.000 counts
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.time avgt 5 1249.000 ms
Process finished with exit code 0
If my implementation is correct, the benchmark shows that pre-allocating the byte array with a cache is slightly better than the temporary output stream: roughly 10% faster (373.201 us/op vs. 408.674 us/op). It holds the encoded K/V in a cache while serializing, but GC time does not increase meaningfully (1275 ms vs. 1249 ms), and the normalized allocation rate is actually lower (1411608 B/op vs. 1558808 B/op). It is easy to see why preAllocateByteArrayNative is the worst of the three (523.353 us/op): it encodes each K/V twice, whereas the other two methods encode each K/V only once.
I am not sure whether we should make the change just for a roughly 10% improvement.
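To make the size of the gap concrete, the relative differences can be recomputed from the summary scores above. This is only a small arithmetic sketch; the class and method names (`RelativeDifference`, `percentLower`) are made up for illustration and are not part of the benchmark.

```java
class RelativeDifference {
  /** Percentage by which candidate is lower than baseline. */
  static double percentLower(double baseline, double candidate) {
    return (baseline - candidate) / baseline * 100.0;
  }

  public static void main(String[] args) {
    // Scores copied from the JMH summary table above (avgt; us/op and B/op).
    double tempStreamUs = 408.674;
    double cacheUs = 373.201;
    double nativeUs = 523.353;
    double tempStreamBytes = 1558808.007;
    double cacheBytes = 1411608.006;

    System.out.printf("cache vs temporaryOutputStream, time/op:  %.1f%% lower%n",
        percentLower(tempStreamUs, cacheUs));
    System.out.printf("cache vs native, time/op:                 %.1f%% lower%n",
        percentLower(nativeUs, cacheUs));
    System.out.printf("cache vs temporaryOutputStream, bytes/op: %.1f%% lower%n",
        percentLower(tempStreamBytes, cacheBytes));
  }
}
```

With these inputs the cache variant takes about 8.7% less time per op than temporaryOutputStream (a ratio of about 1.095), which is where the rough 10% figure comes from.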
As @siddharthteotia pointed out in https://github.com/apache/incubator-pinot/pull/6710#discussion_r599240463_
We need to benchmark these two serialization approaches. If the proposed approach is better, I will send a PR to address it.
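For readers who do not want to open the commit, the three approaches compared in the benchmark can be sketched roughly as follows. This is a minimal sketch, not the code from the commit: the class name, the `Map<String, String>` input, and the wire format (entry count, then length-prefixed UTF-8 key and value) are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MetadataSerializationSketch {
  // Assumed wire format (hypothetical, for illustration only):
  // [int entryCount], then per entry [int keyLen][key bytes][int valueLen][value bytes]

  private static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  /** Encodes each pair into a temporary stream, then copies its bytes into the main stream. */
  static byte[] temporaryOutputStream(Map<String, String> metadata) {
    try {
      ByteArrayOutputStream main = new ByteArrayOutputStream();
      DataOutputStream mainOut = new DataOutputStream(main);
      mainOut.writeInt(metadata.size());
      for (Map.Entry<String, String> e : metadata.entrySet()) {
        ByteArrayOutputStream tmp = new ByteArrayOutputStream();
        DataOutputStream tmpOut = new DataOutputStream(tmp);
        byte[] k = utf8(e.getKey());
        byte[] v = utf8(e.getValue());
        tmpOut.writeInt(k.length);
        tmpOut.write(k);
        tmpOut.writeInt(v.length);
        tmpOut.write(v);
        mainOut.write(tmp.toByteArray());
      }
      return main.toByteArray();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  /** First loop encodes K/V only to measure the size; second loop encodes them again to write. */
  static byte[] preAllocateByteArrayNative(Map<String, String> metadata) {
    int size = Integer.BYTES;
    for (Map.Entry<String, String> e : metadata.entrySet()) {
      size += 2 * Integer.BYTES + utf8(e.getKey()).length + utf8(e.getValue()).length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    buf.putInt(metadata.size());
    for (Map.Entry<String, String> e : metadata.entrySet()) {
      byte[] k = utf8(e.getKey());
      byte[] v = utf8(e.getValue());
      buf.putInt(k.length).put(k).putInt(v.length).put(v);
    }
    return buf.array();
  }

  /** Same two loops, but the first loop caches the encoded bytes so the second only copies. */
  static byte[] preAllocateByteArrayWithBytesCache(Map<String, String> metadata) {
    List<byte[]> cache = new ArrayList<>(metadata.size() * 2);
    int size = Integer.BYTES;
    for (Map.Entry<String, String> e : metadata.entrySet()) {
      byte[] k = utf8(e.getKey());
      byte[] v = utf8(e.getValue());
      cache.add(k);
      cache.add(v);
      size += 2 * Integer.BYTES + k.length + v.length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    buf.putInt(metadata.size());
    for (byte[] encoded : cache) {
      buf.putInt(encoded.length).put(encoded);
    }
    return buf.array();
  }

  public static void main(String[] args) {
    Map<String, String> metadata = new LinkedHashMap<>();
    metadata.put("numDocsScanned", "12345");
    metadata.put("timeUsedMs", "42");
    byte[] reference = temporaryOutputStream(metadata);
    // All three approaches are expected to produce the same bytes.
    System.out.println(Arrays.equals(reference, preAllocateByteArrayNative(metadata)));
    System.out.println(Arrays.equals(reference, preAllocateByteArrayWithBytesCache(metadata)));
  }
}
```

The sketch only illustrates where the extra encoding pass and the extra copies come from; it says nothing about the absolute numbers, which depend on the real metadata and the actual DataTable code.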