Closed xiegx94 closed 1 year ago
This PR provides two ways to support multi-arch dispatch: dispatch at compile time (static dispatch) and dispatch at runtime (dynamic dispatch). Dynamic dispatch is implemented with gcc/clang function multiversioning, which prevents the dispatched functions from being inlined at compile time, so performance is worse than with static dispatch.
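As a rough illustration of the dynamic path, here is a minimal sketch of gcc/clang function multiversioning. The names `demo::count_quotes` and `DEMO_HAS_MULTIVERSION` are hypothetical and not part of sonic; the guard assumes multiversioning is only usable on x86-64 with a GNU-compatible compiler and glibc, and falls back to one plain function elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>  // also pulls in libc feature macros such as __GLIBC__

// Assumption: multiversioning needs x86-64, a GNU-compatible compiler and
// ifunc support from glibc. Elsewhere the sketch uses a single function.
#if defined(__x86_64__) && defined(__GNUC__) && defined(__GLIBC__)
#define DEMO_HAS_MULTIVERSION 1
#else
#define DEMO_HAS_MULTIVERSION 0
#endif

namespace demo {

// Shared scalar loop; each versioned wrapper below compiles it for its target.
static inline std::size_t count_impl(const char* s, std::size_t n) {
  std::size_t c = 0;
  for (std::size_t i = 0; i < n; ++i) c += (s[i] == '"');
  return c;
}

#if DEMO_HAS_MULTIVERSION
// Baseline version, chosen when the CPU supports nothing better.
__attribute__((target("default")))
std::size_t count_quotes(const char* s, std::size_t n) {
  return count_impl(s, n);
}

// Version compiled with AVX2 enabled, chosen at load time on capable CPUs.
__attribute__((target("avx2")))
std::size_t count_quotes(const char* s, std::size_t n) {
  return count_impl(s, n);
}
#else
std::size_t count_quotes(const char* s, std::size_t n) {
  return count_impl(s, n);
}
#endif

// Callers use one name; the version is resolved at runtime, not inlined.
inline void example() {
  const char text[] = "{\"a\":\"b\"}";
  assert(count_quotes(text, sizeof(text) - 1) == 4);
}

}  // namespace demo
```

Because the call to `count_quotes` goes through a runtime-resolved symbol, the compiler cannot inline it into hot loops, which is the cost the description refers to.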
├── avx2
│ ├── base.h
│ ├── itoa.h
│ ├── quote.h
│ ├── simd.h
│ ├── skip.h
│ ├── str2int.h
│ └── unicode.h
├── common
│ ├── quote_common.h
│ ├── quote_tables.h
│ ├── skip_common.h
│ ├── unicode_common.h
│ └── x86_common
│   ├── itoa.h
│   ├── quote.inc.h
│   └── skip.inc.h
├── neon
│ ├── base.h
│ ├── itoa.h
│ ├── quote.h
│ ├── simd.h
│ ├── skip.h
│ ├── str2int.h
│ └── unicode.h
├── simd_base.h
├── simd_dispatch.h
├── simd_itoa.h
├── simd_quote.h
├── simd_skip.h
├── simd_str2int.h
├── sonic_cpu_feature.h
├── sse
│ ├── base.h
│ ├── itoa.h
│ ├── quote.h
│ ├── simd.h
│ ├── skip.h
│ ├── str2int.h
│ └── unicode.h
├── target_macro.h
└── x86_ifuncs
  ├── base.h
  ├── ifunc_macro.h
  ├── itoa.h
  ├── quote.h
  ├── skip.h
  └── str2int.h
If you want to add a new SIMD function called `foo`, follow the steps below:
namespace sonic_json {
namespace internal {
namespace avx2 {
void foo() { return; }
}  // namespace avx2
}  // namespace internal
}  // namespace sonic_json
2. Provide dynamic dispatch functions for x86 (or other platforms):
```c++
namespace sonic_json {
namespace internal {
__attribute__((target(HASWELL))) inline void foo() { return avx2::foo(); }
__attribute__((target(WESTMERE))) inline void foo() { return sse::foo(); }
__attribute__((target("default"))) inline void foo() { return sse::foo(); }
}  // namespace internal
}  // namespace sonic_json
```
If `foo` lives in a new header file `foo.h`, provide that file for every arch directory and for `x86_ifuncs`, then add a new file `simd_foo.h` in the `arch` folder:
#pragma once
namespace sonic_json { namespace internal {
SONIC_USING_ARCH_FUNC(foo);
} }
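The static-dispatch side of these steps can be sketched in a single file. Everything here is a hypothetical stand-in (`demo`, `DEMO_USING_ARCH_FUNC`, and the string-returning `foo`); it only assumes that the real `SONIC_USING_ARCH_FUNC` expands to a using-declaration for the selected arch namespace, as the static-dispatch macro shown in this PR suggests.

```cpp
#include <cassert>
#include <string>

namespace demo {
namespace internal {

// Stand-ins for the per-arch headers (avx2/foo.h, sse/foo.h).
namespace avx2 {
inline std::string foo() { return "avx2"; }
}  // namespace avx2
namespace sse {
inline std::string foo() { return "sse"; }
}  // namespace sse

// Stand-in for simd_dispatch.h: pick one arch namespace at compile time.
#if defined(__AVX2__)
#define DEMO_USING_ARCH_FUNC(func) using avx2::func
#else
#define DEMO_USING_ARCH_FUNC(func) using sse::func
#endif

// Stand-in for simd_foo.h: re-export the selected implementation.
DEMO_USING_ARCH_FUNC(foo);

// Callers see a single internal::foo, which the compiler can inline.
inline void example() {
  const std::string arch = foo();
  assert(arch == "avx2" || arch == "sse");
}

}  // namespace internal
}  // namespace demo
```

Since the choice is fixed by the preprocessor, `internal::foo` is an ordinary inline function in the final build, which is why static dispatch avoids the inlining penalty.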
# How to add a new architecture
If there is a new architecture named `Y86`, do the following:
1. Write a new rule in `sonic_cpu_feature.h` to detect the `Y86` macro (provided by gcc/clang):
```c++
#if defined(__Y86__)
#define SONIC_HAVE_Y86
#endif
```
2. Add the matching `Y86` dispatch macros to `simd_dispatch.h`:
#if defined(SONIC_STATIC_DISPATCH)
#if defined(SONIC_HAVE_Y86)
#define SONIC_USING_ARCH_FUNC(func) using Y86::func
#define INCLUDE_ARCH_FILE(file) SONIC_STRINGIFY(Y86/file)
#endif
#elif defined(SONIC_DYNAMIC_DISPATCH)
#if defined(SONIC_HAVE_Y86)
#define SONIC_USING_ARCH_FUNC(func)
#define INCLUDE_ARCH_FILE(file) SONIC_STRINGIFY(y86_ifuncs/file)
#endif
#endif
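To see what an `INCLUDE_ARCH_FILE`-style macro produces, here is a small sketch. The `DEMO_` macros and the lowercase `y86` directory are hypothetical; the sketch assumes `SONIC_STRINGIFY` is a conventional two-level stringify macro and that the result is meant to be used in an `#include` directive, neither of which is shown in this PR.

```cpp
#include <cassert>
#include <cstring>

// Two-level stringify so that the argument is macro-expanded before `#`.
#define DEMO_STRINGIFY2(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY2(x)

// Builds a quoted path inside the (hypothetical) y86 directory.
#define DEMO_INCLUDE_ARCH_FILE(file) DEMO_STRINGIFY(y86/file)

namespace demo {

// Returns the string the macro expands to, so it can be inspected.
inline const char* foo_header_path() { return DEMO_INCLUDE_ARCH_FILE(foo.h); }

inline void example() {
  assert(std::strcmp(foo_header_path(), "y86/foo.h") == 0);
}

}  // namespace demo
```

One caveat of this pattern: if a directory name happens to be a predefined macro on some compiler, the two-level expansion would rewrite it, so directory names need to be chosen with that in mind.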
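sonic's multi-architecture design supports both selecting specific instructions at compile time and selecting suitable instructions at runtime based on the platform it runs on. Both are supported because choosing at runtime prevents functions and interfaces that use SIMD from being inlined at compile time, which causes some performance degradation.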
`avx2`, `sse`, `neon`: SIMD implementations for each specific architecture; `common`: shared implementation code; `x86_ifuncs`: dynamic dispatch code for the x86 platform.
namespace sonic_json {
namespace internal {
namespace avx2 {
void foo() { return; }
}  // namespace avx2
}  // namespace internal
}  // namespace sonic_json
5. Add x86 dynamic dispatch support under `x86_ifuncs`:
```c++
namespace sonic_json {
namespace internal {
__attribute__((target(HASWELL))) inline void foo() { return avx2::foo(); }
__attribute__((target(WESTMERE))) inline void foo() { return sse::foo(); }
__attribute__((target("default"))) inline void foo() { return sse::foo(); }
}  // namespace internal
}  // namespace sonic_json
```
#pragma once
namespace sonic_json { namespace internal {
SONIC_USING_ARCH_FUNC(foo);
} }
# How to add a new architecture
Suppose there is a new architecture called `Y86` and you need to add SIMD support for it in sonic:
1. Detect the `Y86` macro in `sonic_cpu_feature.h`:
```c++
#if defined(__Y86__)
#define SONIC_HAVE_Y86
#endif
```
#if defined(SONIC_STATIC_DISPATCH)
#if defined(SONIC_HAVE_Y86)
#define SONIC_USING_ARCH_FUNC(func) using Y86::func
#define INCLUDE_ARCH_FILE(file) SONIC_STRINGIFY(Y86/file)
#endif
#elif defined(SONIC_DYNAMIC_DISPATCH)
#if defined(SONIC_HAVE_Y86)
#define SONIC_USING_ARCH_FUNC(func)
#define INCLUDE_ARCH_FILE(file) SONIC_STRINGIFY(y86_ifuncs/file)
#endif
#endif
Merging #56 (9980dc1) into master (80cdba0) will increase coverage by 0.84%. The diff coverage is 91.61%.
@@ Coverage Diff @@
## master #56 +/- ##
==========================================
+ Coverage 95.04% 95.88% +0.84%
==========================================
Files 22 21 -1
Lines 2785 2431 -354
==========================================
- Hits 2647 2331 -316
+ Misses 138 100 -38
Impacted Files | Coverage Δ |
---|---|
include/sonic/allocator.h | 90.43% <ø> (ø) |
include/sonic/dom/dynamicnode.h | 96.08% <ø> (ø) |
include/sonic/dom/serialize.h | 93.39% <ø> (ø) |
include/sonic/internal/arch/avx2/base.h | 100.00% <ø> (ø) |
include/sonic/internal/ftoa.h | 97.34% <ø> (ø) |
include/sonic/internal/itoa.h | 100.00% <ø> (ø) |
include/sonic/internal/arch/simd_skip.h | 89.23% <89.23%> (ø) |
include/sonic/dom/handler.h | 99.04% <100.00%> (ø) |
include/sonic/dom/parser.h | 94.23% <100.00%> (ø) |
include/sonic/internal/arch/avx2/simd.h | 100.00% <100.00%> (ø) |
... and 4 more | |
... and 3 files with indirect coverage changes
Performance

test case | master(haswell) | sse | haswell | dynamic dispatch |
---|---|---|---|---|
book/Decode_SonicDyn | 980 ns | 813 ns | 849 ns | 1128 ns |
gsoc-2018/Decode_SonicDyn | 1406878 ns | 1470898 ns | 1339296 ns | 1752588 ns |
fgo/Decode_SonicDyn | 129952490 ns | 112165070 ns | 117719364 ns | 150338769 ns |
lottie/Decode_SonicDyn | 948184 ns | 805143 ns | 842756 ns | 1187414 ns |
canada/Decode_SonicDyn | 4068896 ns | 3756878 ns | 3789520 ns | 4085432 ns |
github_events/Decode_SonicDyn | 42468 ns | 39368 ns | 39716 ns | 54755 ns |
otfcc/Decode_SonicDyn | 321242929 ns | 292141676 ns | 320360184 ns | 377427578 ns |
poet/Decode_SonicDyn | 1611831 ns | 1572923 ns | 1534339 ns | 1743444 ns |
citm_catalog/Decode_SonicDyn | 1212610 ns | 1137476 ns | 1217241 ns | 1439325 ns |
twitter/Decode_SonicDyn | 194191 ns | 181451 ns | 185673 ns | 260165 ns |
twitterescaped/Decode_SonicDyn | 572412 ns | 492546 ns | 555098 ns | 671564 ns |
book/Encode_SonicDyn | 598 ns | 619 ns | 631 ns | 616 ns |
gsoc-2018/Encode_SonicDyn | 702591 ns | 796307 ns | 680203 ns | 672574 ns |
fgo/Encode_SonicDyn | 75789720 ns | 75432301 ns | 76930374 ns | 75568517 ns |
lottie/Encode_SonicDyn | 846441 ns | 858753 ns | 839591 ns | 871623 ns |
canada/Encode_SonicDyn | 6078378 ns | 6152427 ns | 6009102 ns | 6074922 ns |
github_events/Encode_SonicDyn | 21617 ns | 22432 ns | 21035 ns | 20380 ns |
otfcc/Encode_SonicDyn | 222879330 ns | 155490041 ns | 159626245 ns | 160975575 ns |
poet/Encode_SonicDyn | 864328 ns | 840780 ns | 720925 ns | 721330 ns |
citm_catalog/Encode_SonicDyn | 600309 ns | 533967 ns | 560394 ns | 554867 ns |
twitter/Encode_SonicDyn | 94764 ns | 97690 ns | 93807 ns | 89528 ns |
twitterescaped/Encode_SonicDyn | 281830 ns | 284284 ns | 263186 ns | 269042 ns |
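
One way to read these timings as relative numbers is to compute the change of each configuration against the master column. The helper below is a minimal sketch; `demo::relative_change` is a hypothetical name, and the only inputs used are values already listed in the table.

```cpp
#include <cassert>
#include <cmath>

namespace demo {

// Relative change of a candidate timing versus a baseline timing.
// Positive means the candidate is slower, negative means it is faster.
inline double relative_change(double baseline_ns, double candidate_ns) {
  return (candidate_ns - baseline_ns) / baseline_ns;
}

inline void example() {
  // book/Decode_SonicDyn: master(haswell) = 980 ns, dynamic dispatch = 1128 ns.
  const double dyn = relative_change(980.0, 1128.0);
  assert(dyn > 0.0);  // dynamic dispatch is slower on this case
  // book/Decode_SonicDyn: master(haswell) = 980 ns, sse = 813 ns.
  const double sse = relative_change(980.0, 813.0);
  assert(sse < 0.0);  // static sse build is faster on this case
}

}  // namespace demo
```

For `book/Decode_SonicDyn` this gives roughly +15% for dynamic dispatch and roughly -17% for the static sse build, both relative to master.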
Performance

test case | sse | haswell | dynamic dispatch |
---|---|---|---|
book/Decode_SonicDyn | 813 ns | 849 ns | 1128 ns |
gsoc-2018/Decode_SonicDyn | 1470898 ns | 1339296 ns | 1752588 ns |
fgo/Decode_SonicDyn | 112165070 ns | 117719364 ns | 150338769 ns |
lottie/Decode_SonicDyn | 805143 ns | 842756 ns | 1187414 ns |
canada/Decode_SonicDyn | 3756878 ns | 3789520 ns | 4085432 ns |
github_events/Decode_SonicDyn | 39368 ns | 39716 ns | 54755 ns |
otfcc/Decode_SonicDyn | 292141676 ns | 320360184 ns | 377427578 ns |
poet/Decode_SonicDyn | 1572923 ns | 1534339 ns | 1743444 ns |
citm_catalog/Decode_SonicDyn | 1137476 ns | 1217241 ns | 1439325 ns |
twitter/Decode_SonicDyn | 181451 ns | 185673 ns | 260165 ns |
twitterescaped/Decode_SonicDyn | 492546 ns | 555098 ns | 671564 ns |
book/Encode_SonicDyn | 619 ns | 631 ns | 616 ns |
gsoc-2018/Encode_SonicDyn | 796307 ns | 680203 ns | 672574 ns |
fgo/Encode_SonicDyn | 75432301 ns | 76930374 ns | 75568517 ns |
lottie/Encode_SonicDyn | 858753 ns | 839591 ns | 871623 ns |
canada/Encode_SonicDyn | 6152427 ns | 6009102 ns | 6074922 ns |
github_events/Encode_SonicDyn | 22432 ns | 21035 ns | 20380 ns |
otfcc/Encode_SonicDyn | 155490041 ns | 159626245 ns | 160975575 ns |
poet/Encode_SonicDyn | 840780 ns | 720925 ns | 721330 ns |
citm_catalog/Encode_SonicDyn | 533967 ns | 560394 ns | 554867 ns |
twitter/Encode_SonicDyn | 97690 ns | 93807 ns | 89528 ns |
twitterescaped/Encode_SonicDyn | 284284 ns | 263186 ns | 269042 ns |
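It would be best to post the relative performance data of the current branch versus the master branch separately for static mode and dynamic mode; that should make things clearer.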
Updated.
Main changes