BLAKE3-team / BLAKE3

the official Rust and C implementations of the BLAKE3 cryptographic hash function
Apache License 2.0
5.06k stars 346 forks source link

Add `-mfpu=neon` if $CC accepts it #358

Closed rui314 closed 11 months ago

rui314 commented 11 months ago

This commit fixes a build issue on Raspberry Pi 4A, which uses an arm32 userland with an arm64 kernel.

GCC generally requires this flag to use NEON intrinsics on arm32. On arm64, GCC doesn't recognize this flag, so -mfpu=neon won't be added to the command line.

oconnor663 commented 11 months ago

@BurningEnlightenment could you review the CMake change?

BurningEnlightenment commented 11 months ago

No, this is the wrong way to go about this. Not all arm32 CPUs support NEON therefore compiler support doesn't imply that the intended target indeed supports NEON (and since ARM doesn't support runtime CPU feature detection, we can't switch the implementation like we do for x86).

If you, the user, know that your target CPU supports NEON you ought to configure this project with -DBLAKE3_USE_NEON_INTRINSICS=ON -DBLAKE3_CFLAGS_NEON=-mfpu=neon or set these values in your CMake toolchain file.

rui314 commented 11 months ago

In this code path, we've already set BLAKE3_USE_NEON to 1 (please refer to the lines just above this change), so that shouldn't be the case, right? This patch adds the compiler flag only when we decided to use NEON intrinsics.

BurningEnlightenment commented 11 months ago

The condition for non ARMv8 architectures includes the check DEFINED BLAKE3_CFLAGS_NEON (l. 132). The whole if-block is for the case where we are certain that NEON is supported, wanted and we do know how to configure the compiler with it.

EDIT: See #314 for some context on the current design.

BurningEnlightenment commented 11 months ago

@rui314 do you have a use case where using a toolchain / commandline param is infeasible?

rui314 commented 11 months ago

The issue I'm addressing in this pull request is that BLAKE3's C implementation cannot be built out-of-the-box on the default Raspberry Pi OS (32-bit) running on a Raspberry Pi 4A, which is equipped with an ARMv8-A 64-bit processor. So the kernel is 64-bit and the userland is 32-bit. I didn't specify any optional parameters when using CMake (I simply ran cmake ../c followed by make), but the default build fails with the following error:

In file included from /home/ruiu/blake3/c/blake3_neon.c:3:
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h: In function 'add_128':
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h:651:1: error: inlining failed in call to 'always_inline' 'vaddq_u32': target specific option mismatch
  651 | vaddq_u32 (uint32x4_t __a, uint32x4_t __b)
      | ^~~~~~~~~
/home/ruiu/blake3/c/blake3_neon.c:24:10: note: called from here
   24 |   return vaddq_u32(a, b);
      |          ^~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/blake3.dir/build.make:118: CMakeFiles/blake3.dir/blake3_neon.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/blake3.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

I encountered the same error even with cmake -DBLAKE3_USE_NEON_INTRINSICS=0 ../c.

I thought that the correct solution is to add the appropriate option to make GCC recognize NEON intrinsics when they are enabled in CMake. Unlike ARM64 GCC, ARM32 GCC needs -mfpu=neon to recognize NEON intrinsics.

BurningEnlightenment commented 11 months ago

Ah, I see. The CMAKE_SYSTEM_PROCESSOR check currently bypasses all other checks. So we may need to restructure the condition a bit and provide a default flag configuration for the ARMv8 32bit case. Can you provide the CMakeCache.txt generated in the cmake ../c case? (I dislike check_c_compiler_flag and try to avoid it due to the fact that they can lead to false positive detections with other compilers)

rui314 commented 11 months ago

Here you are: https://gist.github.com/rui314/a9f7c9f7f5dda1cc432875c289557cab

BurningEnlightenment commented 11 months ago

@rui314 can you try the fix in #359?

rui314 commented 11 months ago

It worked!

BurningEnlightenment commented 11 months ago

@oconnor663 looks like we arrived at a solution--you may close this and review/merge #359

oconnor663 commented 11 months ago

Merged #359.

BurningEnlightenment commented 11 months ago

@oconnor663 I've reviewed two other CMake PRs which you may merge.

EDIT: Here is the list of reviewed PRs: https://github.com/BLAKE3-team/BLAKE3/pulls?q=is%3Apr+is%3Aopen+review%3Aapproved