fraunhoferhhi / vvenc

VVenC, the Fraunhofer Versatile Video Encoder
https://www.hhi.fraunhofer.de/en/departments/vca/technologies-and-solutions/h266-vvc.html
BSD 3-Clause Clear License
966 stars 173 forks source link

Rework setSIMDExtension to compile with partial SIMD enabled #455

Closed georges-arm closed 3 weeks ago

georges-arm commented 3 weeks ago

There are existing options to enable both Arm native and SIMDe-based intrinsics implementations, however if only one of these is enabled then compilation for Arm targets currently fails.

Fix this by adjusting the existing #ifdef guards and adding new ones to cover the previously failing option combinations. Also stop checking REAL_TARGET_ARM since it does not distinguish between which SIMD targets are enabled.

Additionally the existing Neon code in RdCostARM.h depends on being able to call into the existing SIMDe-based x86 kernels. If VVENC_ENABLE_X86_SIMD is disabled in the CMake then these kernels are unavailable and compilation fails. Until native Neon kernels are added for the missing functions, simply disable the optimised RdCost Neon code unless the SIMDe kernels are also available.

K-os commented 3 weeks ago

Thank you for attempting to clean up the compilation mess.

I just looked into it and realized that the ARM-SIMD code cannot work without having TARGET_SIMD_X86 enabled, because it explicitly calls some X86-SIMD implementations (xCalcHAD*_SSE for non-square block sizes). This is something that came in after the first NEON implementations, and I did not know this when I said it was supposed to work in my comment on #453. I checked back with colleagues and we can define that TARGET_SIMD_X86 is a requirement for TARGET_SIMD_ARM.

So, even with your changes applied, the code does not compile when TARGET_SIMD_ARM is enabled, but TARGET_SIMD_X86 is not. Maybe we should explicitly catch that in the configure step.

georges-arm commented 3 weeks ago

Thank you for attempting to clean up the compilation mess.

I just looked into it and realized that the ARM-SIMD code cannot work without having TARGET_SIMD_X86 enabled, because it explicitly calls some X86-SIMD implementations (xCalcHAD*_SSE for non-square block sizes). This is something that came in after the first NEON implementations, and I did not know this when I said it was supposed to work in my comment on #453. I checked back with colleagues and we can define that TARGET_SIMD_X86 is a requirement for TARGET_SIMD_ARM.

So, even with your changes applied, the code does not compile when TARGET_SIMD_ARM is enabled, but TARGET_SIMD_X86 is not. Maybe we should explicitly catch that in the configure step.

Thanks for pointing out the problems with the xCalcHAD functions, I now realise that CMake cache variables meant that my existing build directories wasn't actually picking up my attempts to recompile with the SIMDe kernels disabled.

I think it is still possible to avoid the dependency between the two options: we can simply disable the RdCost optimisations if TARGET_SIMD_X86 is not enabled. This is similar to how we already require AArch64 for these kernels to be used. I've put up a patch to that effect (and swapped the existing __ARM_ARCH >= 8 check for REAL_TARGET_AARCH64 now that we have it). Let me know what you think about this approach!

I also have a patch to remove the AArch64-only requirement for the RdCost kernels such that it also compiles on 32-bit Arm platforms, but I will put that up as a separate PR once this is merged so it doesn't conflict.