Closed georges-arm closed 3 weeks ago
Thank you for attempting to clean up the compilation mess.
I just looked into it and realized that the ARM-SIMD code cannot work without having TARGET_SIMD_X86
enabled, because it explicitly calls some X86-SIMD implementations (xCalcHAD*_SSE
for non-square block sizes). This is something that came in after the first NEON implementations, and I did not know this when I said it was supposed to work in my comment on #453. I checked back with colleagues and we can define that TARGET_SIMD_X86
is a requirement for TARGET_SIMD_ARM
.
So, even with your changes applied, the code does not compile when TARGET_SIMD_ARM
is enabled, but TARGET_SIMD_X86
is not. Maybe we should explicitly catch that in the configure step.
Thank you for attempting to clean up the compilation mess.
I just looked into it and realized that the ARM-SIMD code cannot work without having
TARGET_SIMD_X86
enabled, because it explicitly calls some X86-SIMD implementations (xCalcHAD*_SSE
for non-square block sizes). This is something that came in after the first NEON implementations, and I did not know this when I said it was supposed to work in my comment on #453. I checked back with colleagues and we can define thatTARGET_SIMD_X86
is a requirement forTARGET_SIMD_ARM
.So, even with your changes applied, the code does not compile when
TARGET_SIMD_ARM
is enabled, butTARGET_SIMD_X86
is not. Maybe we should explicitly catch that in the configure step.
Thanks for pointing out the problems with the xCalcHAD
functions, I now realise that CMake cache variables meant that my existing build directories wasn't actually picking up my attempts to recompile with the SIMDe kernels disabled.
I think it is still possible to avoid the dependency between the two options: we can simply disable the RdCost optimisations if TARGET_SIMD_X86
is not enabled. This is similar to how we already require AArch64 for these kernels to be used. I've put up a patch to that effect (and swapped the existing __ARM_ARCH >= 8
check for REAL_TARGET_AARCH64
now that we have it). Let me know what you think about this approach!
I also have a patch to remove the AArch64-only requirement for the RdCost kernels such that it also compiles on 32-bit Arm platforms, but I will put that up as a separate PR once this is merged so it doesn't conflict.
There are existing options to enable both Arm native and SIMDe-based intrinsics implementations, however if only one of these is enabled then compilation for Arm targets currently fails.
Fix this by adjusting the existing
#ifdef
guards and adding new ones to cover the previously failing option combinations. Also stop checkingREAL_TARGET_ARM
since it does not distinguish between which SIMD targets are enabled.Additionally the existing Neon code in RdCostARM.h depends on being able to call into the existing SIMDe-based x86 kernels. If
VVENC_ENABLE_X86_SIMD
is disabled in the CMake then these kernels are unavailable and compilation fails. Until native Neon kernels are added for the missing functions, simply disable the optimised RdCost Neon code unless the SIMDe kernels are also available.