bytedance / sonic

A blazingly fast JSON serializing & deserializing library
Apache License 2.0
6.71k stars 333 forks

opt: try predict container size using SIMD #565

Closed AsterDY closed 7 months ago

AsterDY commented 8 months ago

Background

Below are the benchmark results for counting 1 to 100 elements in an array:

goos: darwin
goarch: amd64
pkg: github.com/bytedance/sonic/internal/native/avx2
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkCountElems/1-16                    16.70 ns/op            0 B/op          0 allocs/op
BenchmarkCountElems/10-16                  122.5 ns/op             0 B/op          0 allocs/op
BenchmarkCountElems/100-16               1162 ns/op               0 B/op          0 allocs/op
BenchmarkCountElems_fast/1-16                19.29 ns/op            0 B/op          0 allocs/op
BenchmarkCountElems_fast/10-16             27.65 ns/op            0 B/op          0 allocs/op
BenchmarkCountElems_fast/100-16            71.72 ns/op            0 B/op          0 allocs/op

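Only the count_elems_fast method appears cheap enough to be worth trading against container growth overhead: it is slightly slower for 1 element (19.29 ns vs 16.70 ns) but about 16x faster for 100 elements (71.72 ns vs 1162 ns).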

Experiment

Use the count_elems_fast() native function to scan the input and predict the number of elements in each container before allocating it.

name                             old allocs/op  new allocs/op  delta
PredictContSize/map/N=0-16         2.00 ± 0%      2.00 ± 0%     ~     (all equal)
PredictContSize/map/N=1-16         3.00 ± 0%      3.00 ± 0%     ~     (all equal)
PredictContSize/map/N=10-16        4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.000 n=10+10)
PredictContSize/map/N=100-16       10.0 ± 0%       4.0 ± 0%  -60.00%  (p=0.000 n=10+10)
PredictContSize/map/N=1000-16      42.5 ± 1%       4.0 ± 0%  -90.59%  (p=0.000 n=10+10)
PredictContSize/slice/N=0-16       1.00 ± 0%      1.00 ± 0%     ~     (all equal)
PredictContSize/slice/N=1-16       2.00 ± 0%      2.00 ± 0%     ~     (all equal)
PredictContSize/slice/N=10-16      4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)
PredictContSize/slice/N=100-16     6.00 ± 0%      2.00 ± 0%  -66.67%  (p=0.000 n=10+10)
PredictContSize/slice/N=1000-16    8.00 ± 0%      2.00 ± 0%  -75.00%  (p=0.000 n=10+10)

- generic decoder:

name                               old time/op    new time/op    delta
Generic_DecodeGeneric-16             81.3µs ± 0%    67.7µs ± 8%  -16.81%  (p=0.000 n=6+9)
Generic_Parallel_DecodeGeneric-16    25.6µs ±19%    29.1µs ±26%     ~     (p=0.063 n=10+10)

name                               old speed      new speed      delta
Generic_DecodeGeneric-16            160MB/s ± 0%   193MB/s ± 7%  +20.33%  (p=0.000 n=6+9)
Generic_Parallel_DecodeGeneric-16   515MB/s ±21%   459MB/s ±32%     ~     (p=0.063 n=10+10)

name                               old alloc/op   new alloc/op   delta
Generic_DecodeGeneric-16             48.9kB ± 0%    67.8kB ± 0%  +38.49%  (p=0.000 n=8+10)
Generic_Parallel_DecodeGeneric-16    49.0kB ± 0%    67.9kB ± 0%  +38.50%  (p=0.000 n=10+10)

name                               old allocs/op  new allocs/op  delta
Generic_DecodeGeneric-16                313 ± 0%       291 ± 0%   -7.03%  (p=0.000 n=10+10)


## Conclusion
CPU performance improves by 20~40% for containers with more than 10 elements, at the cost of roughly 40% more allocated memory (malloc).
## How to use
- Set [option.PredictContainerSize = true](https://github.com/bytedance/sonic/pull/565/files#diff-1ac97a144b121797df02eb5cc83716df12d40eec7cf22a7f67bc2146f7b69b43R48), which only takes effect when unmarshaling into concrete types (map/slice).
- Set the environment variable [SONIC_ENABLE_PCS=1](https://github.com/bytedance/sonic/pull/565/files#diff-a8e8e160be6354d0ef0a2d57cdb1c4b637c02db43fc16d06497b9b4bf95b668aR25), which takes effect globally for both concrete and interface types (RECOMMENDED).
codecov-commenter commented 8 months ago

Codecov Report

Attention: 150 lines in your changes are missing coverage. Please review.

Comparison is base (8c71eb0) 78.57% compared to head (80578e5) 77.62%.

| Files | Patch % | Lines |
|---|---|---|
| internal/decoder/debug.go | 0.00% | 108 Missing :warning: |
| internal/decoder/generic_regabi_amd64.go | 25.00% | 38 Missing and 4 partials :warning: |


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #565      +/-   ##
==========================================
- Coverage   78.57%   77.62%   -0.95%
==========================================
  Files          69       69
  Lines       10714    10912     +198
==========================================
+ Hits         8418     8470      +52
- Misses       1930     2072     +142
- Partials      366      370       +4
```
