bytedance / sonic

A blazingly fast JSON serializing & deserializing library
Apache License 2.0

chore: remove noavx2 mode #647

Closed liuq19 closed 1 month ago

liuq19 commented 1 month ago

There are two reasons to drop support for noavx2 mode:

  1. Performance in noavx2 (AVX) mode is almost the same as in noavx (SSE) mode, except for the Get API.
  2. Machines that support AVX but not AVX2 are uncommon.

Solution:

  1. Remove the AVX code from native.
  2. Keep backward compatibility: when SONIC_MODE=noavx2 is set, it behaves like noavx mode.
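
The fallback described in step 2 can be sketched as follows. This is a minimal illustration, not sonic's actual loader code: `resolveMode` and its `hasAVX2` parameter are hypothetical names, and the returned strings simply mirror the `enabled sse` / `enabled avx2` lines printed in the benchmark logs.

```go
package main

import "fmt"

// resolveMode is a hypothetical helper (not part of sonic's API).
// It maps the SONIC_MODE value and the CPU capability to a kernel name.
func resolveMode(env string, hasAVX2 bool) string {
	switch env {
	case "noavx", "noavx2":
		// noavx2 used to select the AVX kernels. With those removed,
		// it takes the same path as noavx instead of failing.
		return "sse"
	}
	if hasAVX2 {
		return "avx2"
	}
	return "sse"
}

func main() {
	for _, env := range []string{"", "noavx", "noavx2"} {
		fmt.Printf("SONIC_MODE=%q -> enabled %s\n", env, resolveMode(env, true))
	}
}
```

Treating noavx2 as an alias rather than rejecting it means existing deployments that export SONIC_MODE=noavx2 keep working unchanged.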

Decode performance comparison: AVX is the same as SSE, and both are about 20% slower than AVX2 (about 19% to 22% more ns/op in the runs below).

➜  sonic2 git:(main) ✗ SONIC_MODE="noavx"  go test -run=none  -bench="BenchmarkDecoder_Binding_Sonic"  -benchmem   ./decoder 
enabled sse
goos: linux
goarch: amd64
pkg: github.com/bytedance/sonic/decoder
cpu: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
BenchmarkDecoder_Binding_Sonic-8                   29247             40689 ns/op         320.35 MB/s        9644 B/op        127 allocs/op
BenchmarkDecoder_Binding_Sonic_Fast-8              36027             33836 ns/op         385.25 MB/s        6589 B/op         24 allocs/op
PASS
ok      github.com/bytedance/sonic/decoder      3.224s
➜  sonic2 git:(main) ✗ SONIC_MODE="noavx2"  go test -run=none  -bench="BenchmarkDecoder_Binding_Sonic"  -benchmem   ./decoder 
enabled avx
goos: linux
goarch: amd64
pkg: github.com/bytedance/sonic/decoder
cpu: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
BenchmarkDecoder_Binding_Sonic-8                   30001             40402 ns/op         322.63 MB/s        9650 B/op        127 allocs/op
BenchmarkDecoder_Binding_Sonic_Fast-8              34809             34243 ns/op         380.67 MB/s        6611 B/op         24 allocs/op
PASS
ok      github.com/bytedance/sonic/decoder      3.221s
➜  sonic2 git:(main) ✗ go test -run=none  -bench="BenchmarkDecoder_Binding_Sonic"  -benchmem   ./decoder                      
enabled avx2
goos: linux
goarch: amd64
pkg: github.com/bytedance/sonic/decoder
cpu: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
BenchmarkDecoder_Binding_Sonic-8                   34161             34069 ns/op         382.61 MB/s        9633 B/op        127 allocs/op
BenchmarkDecoder_Binding_Sonic_Fast-8              42513             27970 ns/op         466.04 MB/s        6600 B/op         24 allocs/op
PASS
ok      github.com/bytedance/sonic/decoder      3.058s

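Encode performance: AVX, SSE, and AVX2 are roughly the same (within about 10% on BenchmarkEncoder_Binding_Sonic and about 1% on BenchmarkEncoder_Binding_Sonic_Fast).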

➜  sonic2 git:(main) ✗ go test -run=none  -bench=BenchmarkEncoder_Binding_Sonic    -benchmem   ./encoder                      
goos: linux
goarch: amd64
pkg: github.com/bytedance/sonic/encoder
cpu: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
BenchmarkEncoder_Binding_Sonic-8                  114964             10105 ns/op        1289.99 MB/s       14185 B/op          4 allocs/op
BenchmarkEncoder_Binding_Sonic_Fast-8             133300              8679 ns/op        1501.90 MB/s        9906 B/op          4 allocs/op
PASS
ok      github.com/bytedance/sonic/encoder      2.552s
➜  sonic2 git:(main) ✗ SONIC_MODE="noavx2"   go test -run=none  -bench=BenchmarkEncoder_Binding_Sonic    -benchmem   ./encoder
goos: linux
goarch: amd64
pkg: github.com/bytedance/sonic/encoder
cpu: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
BenchmarkEncoder_Binding_Sonic-8                  111176             10876 ns/op        1198.53 MB/s       14204 B/op          4 allocs/op
BenchmarkEncoder_Binding_Sonic_Fast-8             131544              8754 ns/op        1489.10 MB/s        9929 B/op          4 allocs/op
PASS
ok      github.com/bytedance/sonic/encoder      2.600s
➜  sonic2 git:(main) ✗ SONIC_MODE="noavx"   go test -run=none  -bench=BenchmarkEncoder_Binding_Sonic    -benchmem   ./encoder 
goos: linux
goarch: amd64
pkg: github.com/bytedance/sonic/encoder
cpu: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
BenchmarkEncoder_Binding_Sonic-8                  107476             11097 ns/op        1174.62 MB/s       14221 B/op          4 allocs/op
BenchmarkEncoder_Binding_Sonic_Fast-8             135178              8671 ns/op        1503.37 MB/s        9903 B/op          4 allocs/op
PASS
ok      github.com/bytedance/sonic/encoder      2.605s

Get performance: AVX is about 15% faster than SSE, and AVX2 is about 40% faster than AVX (measured by MB/s).

➜  sonic2 git:(main) ✗   go test -run=none  -bench="BenchmarkGetOne_Sonic"  -benchmem   ./ast   
goos: linux
goarch: amd64
pkg: github.com/bytedance/sonic/ast
cpu: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
BenchmarkGetOne_Sonic-8           735288              1598 ns/op        8148.42 MB/s          24 B/op          1 allocs/op
PASS
ok      github.com/bytedance/sonic/ast  1.207s
➜  sonic2 git:(main) ✗ SONIC_MODE="noavx2"  go test -run=none  -bench="BenchmarkGetOne_Sonic"  -benchmem   ./ast 
goos: linux
goarch: amd64
pkg: github.com/bytedance/sonic/ast
cpu: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
BenchmarkGetOne_Sonic-8           557578              2277 ns/op        5720.21 MB/s          24 B/op          1 allocs/op
PASS
ok      github.com/bytedance/sonic/ast  1.306s
➜  sonic2 git:(main) ✗ SONIC_MODE="noavx"  go test -run=none  -bench="BenchmarkGetOne_Sonic"  -benchmem   ./ast
goos: linux
goarch: amd64
pkg: github.com/bytedance/sonic/ast
cpu: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
BenchmarkGetOne_Sonic-8           471970              2614 ns/op        4982.66 MB/s          24 B/op          1 allocs/op
PASS
ok      github.com/bytedance/sonic/ast  1.275s
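
The percentages quoted in this description can be re-derived from the logs above. The sketch below hard-codes the ns/op and MB/s figures from the non-Fast benchmark lines; `pctMore` is a helper introduced here for illustration only.

```go
package main

import "fmt"

// pctMore returns how much larger a is than b, in percent.
// Applied to ns/op it gives "percent slower"; applied to MB/s, "percent faster".
func pctMore(a, b float64) float64 {
	return (a/b - 1) * 100
}

func main() {
	// ns/op, copied from the benchmark output above.
	type row struct {
		name            string
		sse, avx, avx2 float64
	}
	rows := []row{
		{"Decoder_Binding_Sonic", 40689, 40402, 34069},
		{"Encoder_Binding_Sonic", 11097, 10876, 10105},
		{"GetOne_Sonic", 2614, 2277, 1598},
	}
	for _, r := range rows {
		fmt.Printf("%-22s sse vs avx: %+.1f%% ns/op, avx vs avx2: %+.1f%% ns/op\n",
			r.name, pctMore(r.sse, r.avx), pctMore(r.avx, r.avx2))
	}

	// Get throughput in MB/s, also copied from the output above.
	fmt.Printf("Get: avx is %.1f%% faster than sse, avx2 is %.1f%% faster than avx\n",
		pctMore(5720.21, 4982.66), pctMore(8148.42, 5720.21))
}
```

These are single runs on one machine, so differences of a few percent should be read as noise rather than as a real gap between modes.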