Closed codahale closed 1 year ago
Welp, didn’t realize that vreinterpretq_u64_u32 was nightly-only. I guess feel free to revive this when/if stable SIMD lands. ☹️
no idea how one runs GitHub Actions on an aarch64/NEON platform. QEMU?
As it happens, we have another PR open to add ARM support to the keccak crate, which covers a bunch of this: https://github.com/RustCrypto/sponges/pull/23
You can use cross. Here's an example: https://github.com/RustCrypto/block-ciphers/blob/4334b85/.github/workflows/aes.yml#L222-L249
Unfortunately there's no M1-specific solution until this lands: https://github.com/github/roadmap/issues/528
Welp, didn’t realize that vreinterpretq_u64_u32 was nightly-only.
FWIW we'd be fine with a nightly-only feature. We have similar nightly-only ARM features in the aes and polyval crates, which wrap ARMv8 hardware intrinsics supported by M1s.
I guess feel free to revive this when/if stable SIMD lands.
Sure, looking forward to the eventual stabilization of core::simd and the day we can have a portable SIMD implementation of ChaCha.
FWIW, this code compiles and passes tests on my M1 Air using stable-aarch64-apple-darwin 1.64.0. I don’t know what to make of that. Any suggestions?
Also regarding this specifically:
... vreinterpretq_u64_u32 was nightly-only.
There are various stable workarounds, such as using transmute, pointer casts, or core::slice::from_raw_parts.
I'm guessing that's not the only nightly-only NEON intrinsics support you need, though.
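To make the transmute and pointer-cast workarounds concrete, here is a minimal sketch. It uses plain arrays as stand-ins for the NEON vector types so it builds on any target; the function names are hypothetical and not taken from the PR. The aarch64-only function at the end assumes that the `uint32x4_t` and `uint64x2_t` types are available on the toolchain in use.

```rust
use core::mem::transmute;

/// Reinterpret four 32-bit lanes as two 64-bit lanes with `transmute`.
/// (Hypothetical helper; plain arrays stand in for NEON vectors.)
pub fn reinterpret_transmute(v: [u32; 4]) -> [u64; 2] {
    // SAFETY: both types are exactly 16 bytes with no padding, and every
    // bit pattern is a valid u64.
    unsafe { transmute::<[u32; 4], [u64; 2]>(v) }
}

/// The same reinterpretation done with a pointer cast.
pub fn reinterpret_ptr_cast(v: [u32; 4]) -> [u64; 2] {
    // SAFETY: `v` provides 16 readable bytes. `read_unaligned` is used
    // because a [u32; 4] is only guaranteed 4-byte alignment.
    unsafe { (v.as_ptr() as *const [u64; 2]).read_unaligned() }
}

/// On aarch64 the same idea applies to the vector types themselves.
#[cfg(target_arch = "aarch64")]
#[allow(dead_code)]
pub fn reinterpret_neon(
    v: core::arch::aarch64::uint32x4_t,
) -> core::arch::aarch64::uint64x2_t {
    // SAFETY: both vector types are 128 bits wide.
    unsafe { transmute(v) }
}
```

Both array helpers return the same bytes; the lane values you observe depend on the target's endianness.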
this code compiles and passes tests on my M1 Air using stable-aarch64-apple-darwin 1.64.0
Maybe all the intrinsics you need are stabilized on 1.64?
If that's the case, you can just feature-gate support to prevent MSRV breakages.
It'd be good to know the actual MSRV of the feature.
Ok, looks like vreinterpretq_u64_u32 landed in 1.61. I re-added the neon feature to handle the MSRV issue and added target_feature gates for the backend.
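For readers unfamiliar with this kind of gating, a rough sketch of the pattern follows. It assumes a Cargo feature named `neon` as mentioned above; the module layout and the `backend_name` function are hypothetical and are not the PR's actual code.

```rust
// Selected only when the opt-in `neon` Cargo feature is enabled AND the
// target is aarch64 with NEON available at compile time.
#[cfg(all(feature = "neon", target_arch = "aarch64", target_feature = "neon"))]
mod backend {
    pub const NAME: &str = "neon";
}

// Everything else falls back to the portable implementation, so older
// compilers and other targets never see the NEON intrinsics.
#[cfg(not(all(feature = "neon", target_arch = "aarch64", target_feature = "neon")))]
mod backend {
    pub const NAME: &str = "portable";
}

/// Report which backend was compiled in (hypothetical helper).
pub fn backend_name() -> &'static str {
    backend::NAME
}
```

Because the NEON module is compiled only when the feature is requested, users on a toolchain older than the one that stabilized the intrinsics are unaffected unless they opt in.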
Enabled CI (using the existing but commented-out cross matrix item) and updated the README.
I think this is good to go. Let me know if anything else needs changing.
Cool, will try to review this week sometime
This otherwise seems like a fairly straightforward port of chacha_simd, and I'm fine to merge it if we can figure out how it should be gated.
Going to merge this and follow up on how it should be gated
Could you make a new release with this PR?
Unfortunately, I haven’t added the gating I suggested, which should be implemented prior to a release.
I ported the NEON implementation of ChaCha from Crypto++ (public domain) to Rust with aarch64 intrinsics for a significant performance boost.
Observed performance changes on an Apple M1 Air:
Closes #287.
I’m not entirely certain I’ve got the flag/feature/cpuid token stuff right, and would appreciate any guidance about that. At this point the best I’ve got is that it very definitely works on my machine, an M1 Air. Also, no idea how one runs GitHub Actions on an aarch64/NEON platform. QEMU?
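As a point of comparison for the detection question, a minimal runtime-detection sketch using only the standard library is shown below. It swaps in `std::arch::is_aarch64_feature_detected!` rather than whatever token mechanism the PR itself uses, and the function names are hypothetical.

```rust
/// Runtime check for NEON on aarch64 targets.
#[cfg(target_arch = "aarch64")]
pub fn neon_available() -> bool {
    std::arch::is_aarch64_feature_detected!("neon")
}

/// On every other architecture there is no NEON backend to select.
#[cfg(not(target_arch = "aarch64"))]
pub fn neon_available() -> bool {
    false
}

/// Pick a backend label at runtime (hypothetical dispatch helper).
pub fn pick_backend() -> &'static str {
    if neon_available() {
        "neon"
    } else {
        "portable"
    }
}
```

A real implementation would cache the detection result and dispatch to functions annotated with the matching `target_feature`; this sketch only shows the detection step.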