buddy-compiler / buddy-benchmark

Benchmark Framework for Buddy Projects
Apache License 2.0
44 stars 34 forks source link

Splitting size not configured correctly for other platforms than X86 in Image Processing Benchmark #35

Closed SForeKeeper closed 2 years ago

SForeKeeper commented 2 years ago

Describe the bug In file benchmarks/ImageProcessing/CMakeLists.txt it detects extensions using commands in cmake/check-simd.cmake to set the splitting size. It only supports X86 SIMD extensions, having no default value, it could pass an empty string to following build commands and leads to compilation failure.

To Reproduce Build the benchmark on targets other than X86. AArch64 for instance.

Expected behavior For different platforms, different SIMD extensions should be checked and used to set the splitting size. A default value should be provided, even a target is unknown it could provide some vectorization to speed things up. Otherwise, warns the user that it doesn't support current target and throws an error.

Code

if (${BUDDY_OPT_STRIP_MINING})
  set(SPLITING_SIZE ${BUDDY_OPT_STRIP_MINING} CACHE STRING "Spliting Size")
elseif(HAVE_AVX512)
  set(SPLITING_SIZE 256 CACHE STRING "Spliting Size")
elseif(HAVE_AVX2)
  set(SPLITING_SIZE 128 CACHE STRING "Spliting Size")
elseif(HAVE_SSE)
  set(SPLITING_SIZE 64 CACHE STRING "Spliting Size")
endif()

The correct spelling would be splitting, by the way.

Desktop (please complete the following information):

Additional context Benchmark random3x3KernelAlign on Image YuTu1024.png have shown following results on Apple M1 Max. Other benchmarks have similar outcomes and don't vary much with different splitting sizes. (Neon instructions use 128-bit registers)

image
meshtag commented 2 years ago

@SForeKeeper , thanks for reporting the issue! Can you please specify the exact stride size used by you to generate the attached benchmark result.

zhanghb97 commented 2 years ago

Nice catch, @SForeKeeper ! This feature is called "auto-config", we want to detect the hardware at compile-time and give the proper configuration to the lowering passes. We are still in very early stages, as you can see, it only supports Intel processor detection now. Feel free to send a PR to support the Apple Silicon (Arm Neon).

meshtag commented 2 years ago

Can we close this issue now @SForeKeeper ?