Closed cyb70289 closed 4 weeks ago
@cyb70289 What's the unit of your benchmark results? HIB or LIB?
Gi/s and Mi/s, i.e. bytes per second, so higher is better.
As an example:
$ build/benchmark/bench --benchmark_filter=Decode_Sonic
gsoc-2018/Decode_SonicDyn 1299148 ns 1299146 ns 537 bytes_per_second=2.38563Gi/s testdata/gsoc-2018.json
citm_catalog/Decode_SonicDyn 1136378 ns 1136290 ns 617 bytes_per_second=1.41565Gi/s testdata/citm_catalog.json
otfcc/Decode_SonicDyn 158508828 ns 158472460 ns 4 bytes_per_second=399.646Mi/s testdata/otfcc.json
fgo/Decode_SonicDyn 67084470 ns 67084360 ns 9 bytes_per_second=692.246Mi/s testdata/fgo.json
......
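To make the unit concrete, the sketch below relates the bytes_per_second counter to the time columns. It assumes the counter is derived from the CPU time column (the second one) and that Gi means 2^30 bytes; GiBPerSecond and ImpliedBytes are hypothetical helper names, not part of the benchmark code.

```cpp
#include <cassert>

// Bytes in one GiB (binary prefix, as printed by "Gi/s").
constexpr double kGiB = 1024.0 * 1024.0 * 1024.0;

// Throughput in GiB/s for `bytes` processed in `cpu_time_ns` nanoseconds.
inline double GiBPerSecond(double bytes, double cpu_time_ns) {
  assert(cpu_time_ns > 0.0);
  return bytes / (cpu_time_ns * 1e-9) / kGiB;
}

// Inverse direction: bytes processed per iteration implied by a reported
// rate in GiB/s and the per-iteration CPU time in nanoseconds.
inline double ImpliedBytes(double gib_per_s, double cpu_time_ns) {
  assert(cpu_time_ns > 0.0);
  return gib_per_s * kGiB * (cpu_time_ns * 1e-9);
}
```

Feeding in the gsoc-2018 row (2.38563 Gi/s, 1299146 ns) gives roughly 3.3 million bytes per iteration, which is consistent with a JSON file of about 3.3 MB being decoded once per iteration.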
See #56, which supports SVE as a different arch.
Thanks, I will try to refactor following that PR. Instead of adding a complete SVE implementation, I'm thinking about "inheriting" from NEON and only overriding the code that can benefit from SVE. It looks to me like much of the code will be the same for NEON and SVE.
@xiegx94, the sve2-128 implementation is added. Arm common code is moved to common/arm_common/. I checked the sonic decoder benchmarks and found no performance regression.
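The "inherit from NEON, override only what benefits" idea can be pictured with namespaces and using-declarations. This is only an illustrative sketch with scalar stand-ins: the namespace names, SkipSpaces, and FindQuote are hypothetical and do not reflect the actual layout or functions in the repository.

```cpp
#include <cstddef>
#include <string>

namespace arm_common {
// Shared helper that both Arm back ends would use unchanged.
inline std::size_t SkipSpaces(const std::string& s, std::size_t pos) {
  while (pos < s.size() &&
         (s[pos] == ' ' || s[pos] == '\n' || s[pos] == '\t' || s[pos] == '\r')) {
    ++pos;
  }
  return pos;
}
}  // namespace arm_common

namespace neon {
using arm_common::SkipSpaces;
// Stand-in for a NEON routine; returns s.size() when no quote is found.
inline std::size_t FindQuote(const std::string& s, std::size_t pos) {
  const std::size_t r = s.find('"', pos);
  return r == std::string::npos ? s.size() : r;
}
}  // namespace neon

namespace sve2_128 {
using neon::SkipSpaces;  // reused as-is, no SVE-specific version needed
// "Overridden" routine: same contract as neon::FindQuote, but this is where
// an SVE2-specific implementation would live.
inline std::size_t FindQuote(const std::string& s, std::size_t pos) {
  for (; pos < s.size(); ++pos) {
    if (s[pos] == '"') return pos;
  }
  return s.size();
}
}  // namespace sve2_128
```

Callers would select one namespace at build time, so only the routines that actually differ need a second implementation.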
Any convenient way to run clang-format job locally?
Could you install clang on your machine? If you have clang-format, run
git clang-format
Thanks, format should be fixed now.
"Test coverage" runs successfully on my local x86 server. I'm not sure why the CI job fails. It looks like it's only for x86?
@cyb70289 pls update cmake/set_arch_flags.cmake.
@xiegx94 updated
This patch improves sonic JSON decoder performance on Arm SVE2 CPUs. It leverages the SVE2 MATCH instruction (the svmatch intrinsic) to locate multiple tokens in a vector efficiently.
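As a rough illustration of what a byte-wise MATCH computes on a 128-bit vector, the scalar model below sets bit i of the result when byte i of the input equals any byte of a token set. It is a portable sketch for explanation only, not the code in this patch; MatchAnyMask and ToChunk are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using Chunk = std::array<std::uint8_t, 16>;

// Copy up to 16 bytes of `s` into a chunk and fill the rest with `pad`.
inline Chunk ToChunk(const std::string& s, char pad) {
  assert(s.size() <= 16);
  Chunk c;
  c.fill(static_cast<std::uint8_t>(pad));
  for (std::size_t i = 0; i < s.size(); ++i) {
    c[i] = static_cast<std::uint8_t>(s[i]);
  }
  return c;
}

// Scalar model of a 128-bit byte-wise match: bit i is set when chunk[i]
// equals any byte in `tokens`. The hardware instruction produces the
// equivalent predicate in one operation instead of one compare per token.
inline std::uint16_t MatchAnyMask(const Chunk& chunk, const Chunk& tokens) {
  std::uint16_t mask = 0;
  for (std::size_t i = 0; i < chunk.size(); ++i) {
    for (std::uint8_t t : tokens) {
      if (chunk[i] == t) {
        mask = static_cast<std::uint16_t>(mask | (1u << i));
        break;
      }
    }
  }
  return mask;
}
```

With NEON-style code, searching for several tokens typically takes one compare per token plus ORs to combine the results, which is the work the single match step replaces.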
Enable this feature by specifying the CMake option "-DENABLE_SVE2_128=ON". Please note that the binary can only run on hardware that supports SVE2 with a 128-bit vector length, such as Neoverse-N2. Otherwise, the behaviour is undefined.
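Because running on unsupported hardware is undefined, an application may want to check the host before taking the SVE2 path. The sketch below is one possible guard, assuming Linux on AArch64 with kernel headers that define HWCAP2_SVE2 and PR_SVE_GET_VL; on any other platform it conservatively reports false. CanRunSve2_128 and HostCanRunSve2_128 are hypothetical names and are not provided by this patch.

```cpp
#include <cassert>

#if defined(__linux__) && defined(__aarch64__)
#include <asm/hwcap.h>
#include <sys/auxv.h>
#include <sys/prctl.h>
#endif

// Pure decision: the build is usable only with SVE2 and a 16-byte
// (128-bit) vector length.
inline bool CanRunSve2_128(bool has_sve2, int vector_bytes) {
  assert(vector_bytes >= 0);
  return has_sve2 && vector_bytes == 16;
}

// Query the host. Returns false when the platform or headers do not expose
// the needed information.
inline bool HostCanRunSve2_128() {
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP2_SVE2) && \
    defined(PR_SVE_GET_VL)
  const bool has_sve2 = (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
  const int vl = prctl(PR_SVE_GET_VL);  // negative on failure
  const int vector_bytes = vl < 0 ? 0 : (vl & PR_SVE_VL_LEN_MASK);
  return CanRunSve2_128(has_sve2, vector_bytes);
#else
  return false;
#endif
}
```

A caller could abort early or fall back to a generic build when HostCanRunSve2_128() returns false.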
As shown in the table below, the sonic decoder benchmarks show a clear performance uplift when tested on a Bluewhale server. No side effects were observed for other benchmarks.