BLAKE3-team / BLAKE3

the official Rust and C implementations of the BLAKE3 cryptographic hash function
Apache License 2.0

runtime neon detection #383

Open divinity76 opened 7 months ago

divinity76 commented 7 months ago

Tested on Oracle Cloud's cheapest ARM VPS (VM.Standard.A1.Flex); it seems to work. Related documentation: https://developer.arm.com/documentation/100403/0200/register-descriptions/aarch64-system-registers/id-aa64pfr0-el1--aarch64-processor-feature-register-0--el1

AdvSIMD, [23:20]
    Advanced SIMD. The possible values are:
    0x1 Advanced SIMD, including Half-precision support, is implemented.

seems the ARM marketing department calls it "NEON" and the ARM engineering department calls it "AdvSIMD".

possible alternative for https://github.com/BLAKE3-team/BLAKE3/pull/382

sneves commented 7 months ago

If you go to the generic AArch64 manual instead of the Cortex-A75 one, a value of 0 means that AdvSIMD is present, a value of 1 means AdvSIMD+FP16 is present, and a value of 15 means no AdvSIMD. I was under the impression that AdvSIMD was a required feature of AArch64. Go figure.
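To make the field encoding described above concrete, here is a minimal sketch of decoding the AdvSIMD field, bits [23:20], from a raw ID_AA64PFR0_EL1 value. The enum and function names are hypothetical and do not come from the BLAKE3 sources; values other than 0, 1 and 15 are treated as unknown here, which is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not part of BLAKE3. */
enum advsimd_support {
    ADVSIMD_ABSENT  = 0, /* field == 0xF: Advanced SIMD not implemented */
    ADVSIMD_BASE    = 1, /* field == 0x0: Advanced SIMD implemented */
    ADVSIMD_FP16    = 2, /* field == 0x1: Advanced SIMD plus half-precision */
    ADVSIMD_UNKNOWN = 3  /* any other value: not covered by this sketch */
};

/* Decode the 4-bit AdvSIMD field at bits [23:20] of ID_AA64PFR0_EL1. */
static enum advsimd_support decode_advsimd(uint64_t id_aa64pfr0_el1) {
    unsigned field = (unsigned)((id_aa64pfr0_el1 >> 20) & 0xF);
    switch (field) {
    case 0x0: return ADVSIMD_BASE;
    case 0x1: return ADVSIMD_FP16;
    case 0xF: return ADVSIMD_ABSENT;
    default:  return ADVSIMD_UNKNOWN;
    }
}
```

Because the function takes the register value as a parameter, it can be exercised on any host, for example with `(uint64_t)0xF << 20` for the "no AdvSIMD" case.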

divinity76 commented 7 months ago

Edit: Yes, it should be != 15, thanks! added.

@sneves hmm... does that mean

    uint64_t id_aa64pfr0_el1;
    /* Read AArch64 Processor Feature Register 0. */
    __asm__ ("mrs %0, ID_AA64PFR0_EL1" : "=r" (id_aa64pfr0_el1));
    /* AdvSIMD is the 4-bit field at bits [23:20]; 0xF means not implemented. */
    const uint8_t AdvSIMD = (id_aa64pfr0_el1 >> 20) & 0xF;
    if (AdvSIMD != 0xF) {
      features = ARM_NEON;
    } else {
      features = 0;
    }

is better?
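For readers who want to try the check outside the PR, it could be wrapped in a guarded function along these lines. This is only a sketch under stated assumptions: `SKETCH_ARM_NEON` and `sketch_detect_neon` are hypothetical stand-ins for the PR's flag and function, and the register read is restricted to AArch64 Linux with GCC or Clang, because an unprivileged `mrs` of this register relies on the operating system handling the access. Every other target falls back to reporting no NEON.

```c
#include <stdint.h>

/* Hypothetical stand-in for the ARM_NEON feature flag used in the PR. */
#define SKETCH_ARM_NEON 1u

/* Returns SKETCH_ARM_NEON if the AdvSIMD field is anything other than 0xF,
 * and 0 otherwise (including on targets where the read is not attempted). */
static uint32_t sketch_detect_neon(void) {
#if defined(__aarch64__) && defined(__linux__) && \
    (defined(__GNUC__) || defined(__clang__))
    uint64_t id_aa64pfr0_el1;
    /* Assumption: the OS permits or emulates this read from user space. */
    __asm__ ("mrs %0, ID_AA64PFR0_EL1" : "=r" (id_aa64pfr0_el1));
    const unsigned advsimd = (unsigned)((id_aa64pfr0_el1 >> 20) & 0xF);
    return (advsimd != 0xF) ? SKETCH_ARM_NEON : 0u;
#else
    /* Not an AArch64 Linux build: this sketch reports no NEON. */
    return 0u;
#endif
}
```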

BurningEnlightenment commented 7 months ago

As far as I know, standard ARMv8 implementations are required to support NEON, so apart from some rare special-purpose CPUs, NEON is guaranteed to be available. I strongly doubt that such a special-purpose CPU would be used to run run-of-the-mill binaries; the people who use them would compile with NEON=off anyway. To me this looks a lot like checking for SSE2 on x86-64.
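The compile-time alternative implied here could look roughly like the sketch below. It assumes the compiler defines the ACLE macro `__ARM_NEON` (or the older `__ARM_NEON__`) when NEON code generation is enabled, and it additionally treats any AArch64 target as having NEON, following the argument above. The function name is hypothetical, and this is not necessarily how BLAKE3 itself decides.

```c
/* Hypothetical helper: returns 1 if this build may assume NEON without
 * any runtime check, 0 otherwise. */
static int sketch_neon_assumed_at_compile_time(void) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* The compiler was told it may emit NEON instructions. */
    return 1;
#elif defined(__aarch64__) || defined(_M_ARM64)
    /* Assumption from the discussion: standard AArch64 implies NEON. */
    return 1;
#else
    return 0;
#endif
}
```

A build that needs to support a NEON-less AArch64 part would then opt out through a build option rather than through runtime detection.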

jblazquez commented 5 months ago

Note that the code in this PR will need minor changes now that #389 has been merged.

Details here: https://github.com/BLAKE3-team/BLAKE3/pull/389#issuecomment-2041255514