Closed SForeKeeper closed 2 years ago
@SForeKeeper, thanks for reporting the issue!
Could you please specify the exact stride size you used to generate the attached benchmark result?
Nice catch, @SForeKeeper! This feature is called "auto-config": we want to detect the hardware at compile time and pass the proper configuration to the lowering passes. We are still at a very early stage and, as you can see, it only supports Intel processor detection for now. Feel free to send a PR to support Apple Silicon (Arm Neon).
Can we close this issue now, @SForeKeeper?
Describe the bug In benchmarks/ImageProcessing/CMakeLists.txt, the build detects SIMD extensions using the commands in cmake/check-simd.cmake to set the splitting size. Only X86 SIMD extensions are supported and there is no default value, so on other targets an empty string can be passed to the subsequent build commands, which leads to a compilation failure.
To Reproduce Build the benchmark on a target other than X86, for instance AArch64.
Expected behavior The SIMD extensions appropriate to each platform should be checked and used to set the splitting size. A default value should be provided, so that even an unknown target still gets some vectorization to speed things up. Otherwise, the build should warn the user that the current target is not supported and raise an error.
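To make the expected behavior above concrete, here is a minimal CMake sketch of a detection chain with an AArch64 branch and a fallback. It is an illustration only: the variable names (HAVE_AVX512, HAVE_AVX2, HAVE_SSE, SPLITTING_SIZE) and every numeric value are assumptions made for this example, not the values the project actually uses.

```cmake
# Hypothetical sketch. HAVE_* flags are assumed to be set by the SIMD
# detection step; SPLITTING_SIZE and all numbers below are placeholders.
if(HAVE_AVX512)
  set(SPLITTING_SIZE 256)
elseif(HAVE_AVX2)
  set(SPLITTING_SIZE 128)
elseif(HAVE_SSE)
  set(SPLITTING_SIZE 64)
elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "^(aarch64|arm64|ARM64)$")
  # AArch64 targets such as Apple Silicon (Neon, 128-bit registers).
  set(SPLITTING_SIZE 64)
else()
  # Unknown target: warn and fall back so the variable is never empty.
  message(WARNING
    "No known SIMD extension detected for '${CMAKE_SYSTEM_PROCESSOR}'; "
    "using a default splitting size.")
  set(SPLITTING_SIZE 16)
endif()
```

If a hard failure is preferred over a silent fallback, the last branch could call message(FATAL_ERROR ...) instead, which stops configuration before an empty value reaches the build commands.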
Code
The correct spelling would be splitting, by the way.
Desktop (please complete the following information):
Additional context The benchmark random3x3KernelAlign on ImageYuTu1024.png showed the following results on an Apple M1 Max. Other benchmarks have similar outcomes and do not vary much with different splitting sizes. (Neon instructions use 128-bit registers.)