Closed eirnym closed 7 months ago
Please understand that the current implementation only supports AVX2 and SSE2. Since there is no NEON implementation, it is impossible to enable one by default.
As for defaults in general: NEON cannot be assumed to be available everywhere, but I believe all Mac chips support it, so in theory I could assume it, but only on macOS.
The problem is that when I started this library, NEON support in Rust's std was lacking, and I'm not sure whether the gaps have been filled enough to implement it. I will try to take another look later.
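To illustrate the point about defaults, here is a minimal sketch of a compile-time check. It is not the crate's actual selection logic: `simd_backend` is a hypothetical helper that only reports which path a build could pick from the target features the compiler has enabled.

```rust
/// Hypothetical helper (not part of xxhash-rust): reports which SIMD path
/// could be selected from the target features enabled at compile time.
fn simd_backend() -> &'static str {
    if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        "neon"
    } else if cfg!(all(
        any(target_arch = "x86", target_arch = "x86_64"),
        target_feature = "avx2"
    )) {
        "avx2"
    } else if cfg!(all(
        any(target_arch = "x86", target_arch = "x86_64"),
        target_feature = "sse2"
    )) {
        "sse2"
    } else {
        // No known SIMD feature enabled: fall back to portable code.
        "scalar"
    }
}

fn main() {
    println!("selected backend: {}", simd_backend());
}
```

On x86_64 targets SSE2 is part of the baseline, so this reports sse2 unless AVX2 is enabled explicitly, for example with -C target-feature=+avx2.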
Most of the features supported by LLVM have been implemented. The remaining unsupported features have not been implemented in LLVM, as far as I understood the thread.
The documentation also describes many NEON instructions, some of them available since Rust 1.59.0:
https://doc.rust-lang.org/core/arch/arm/index.html https://doc.rust-lang.org/core/arch/aarch64/index.html
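As a rough idea of what using these intrinsics looks like, here is a small sketch with a scalar fallback. It is only an illustration and has nothing to do with how xxh3 itself is vectorized. `sum4` is a hypothetical function, while `vld1q_u32` and `vaddvq_u32` come from `core::arch::aarch64`.

```rust
/// Sums four u32 lanes with NEON when it is enabled at compile time.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
fn sum4(values: [u32; 4]) -> u32 {
    use core::arch::aarch64::{vaddvq_u32, vld1q_u32};
    // SAFETY: `values` holds exactly the four u32 that `vld1q_u32` reads,
    // and NEON is guaranteed by the cfg attribute above.
    unsafe { vaddvq_u32(vld1q_u32(values.as_ptr())) }
}

/// Scalar fallback for every other target; wraps on overflow like the NEON add.
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
fn sum4(values: [u32; 4]) -> u32 {
    values.iter().fold(0u32, |acc, v| acc.wrapping_add(*v))
}

fn main() {
    println!("{}", sum4([1, 2, 3, 4]));
}
```

Keeping both versions behind the same function name means callers do not need to know which path was compiled in.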
@eirnym Can you please give me the output of rustc --print cfg on your M1 laptop?
I'm curious whether NEON is enabled by default on Mac.
If so, you can try testing my branch: https://github.com/DoumanAsh/xxhash-rust/pull/35
I have a macOS M2 laptop:
$ rustc --print cfg
debug_assertions
panic="unwind"
target_arch="aarch64"
target_endian="little"
target_env=""
target_family="unix"
target_feature="aes"
target_feature="crc"
target_feature="dit"
target_feature="dotprod"
target_feature="dpb"
target_feature="dpb2"
target_feature="fcma"
target_feature="fhm"
target_feature="flagm"
target_feature="fp16"
target_feature="frintts"
target_feature="jsconv"
target_feature="lor"
target_feature="lse"
target_feature="neon"
target_feature="paca"
target_feature="pacg"
target_feature="pan"
target_feature="pmuv3"
target_feature="ras"
target_feature="rcpc"
target_feature="rcpc2"
target_feature="rdm"
target_feature="sb"
target_feature="sha2"
target_feature="sha3"
target_feature="ssbs"
target_feature="vh"
target_has_atomic="128"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="ptr"
target_os="macos"
target_pointer_width="64"
target_vendor="apple"
unix
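The target_feature="neon" line in this output is what a `cfg` check sees at compile time. A minimal sketch comparing that with runtime detection follows. Both function names are hypothetical, and `std::arch::is_aarch64_feature_detected!` is the standard library macro for runtime checks on aarch64.

```rust
/// True when the binary was compiled with NEON enabled, which corresponds to
/// the `target_feature="neon"` line printed by `rustc --print cfg`.
fn neon_compiled_in() -> bool {
    cfg!(all(target_arch = "aarch64", target_feature = "neon"))
}

/// True when the running CPU reports NEON support.
#[cfg(target_arch = "aarch64")]
fn neon_detected_at_runtime() -> bool {
    std::arch::is_aarch64_feature_detected!("neon")
}

/// On non-aarch64 targets there is nothing to detect.
#[cfg(not(target_arch = "aarch64"))]
fn neon_detected_at_runtime() -> bool {
    false
}

fn main() {
    println!("compile time: {}", neon_compiled_in());
    println!("runtime:      {}", neon_detected_at_runtime());
}
```

When the feature is already enabled at compile time, as in the output above, the compile-time check alone is enough and no runtime dispatch is needed.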
my test:
Cargo.toml:
[package]
name = "public-id"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
base64 = "0.21.7"
uuid = { version = "1.7.0", features = ["v4", "v7", "v8"] }
#xxhash-rust = { version = "0.8.8", features = ["xxh3"] }
xxhash-rust = { git="https://github.com/DoumanAsh/xxhash-rust.git", branch="neon", features = ["xxh3"] }
src/main.rs:
use base64::{engine::general_purpose::URL_SAFE, Engine as _};
fn main() {
    let v: u64 = xxhash_rust::xxh3::xxh3_64(uuid::Uuid::new_v4().as_bytes());
    let b64 = URL_SAFE.encode(v.to_le_bytes());
    println!("Hello, world! {}", b64);
}
Both apps (with and without NEON optimizations) are compiled with --release; Cargo.lock is removed and fd xxhash-rust . -x rm -rf is run in ~/.cargo.
hyperfine output:
$ hyperfine --warmup 1000 -N -u microsecond './public-id-neon-optimizations' ./public-id-no-optimizations
Benchmark 1: ./public-id-neon-optimizations
Time (mean ± σ): 728.7 µs ± 16.9 µs [User: 356.3 µs, System: 186.6 µs]
Range (min … max): 697.4 µs … 1069.2 µs 4060 runs
Benchmark 2: ./public-id-no-optimizations
Time (mean ± σ): 724.8 µs ± 15.2 µs [User: 355.4 µs, System: 184.2 µs]
Range (min … max): 692.9 µs … 920.6 µs 4129 runs
Summary
./public-id-no-optimizations ran
1.01 ± 0.03 times faster than ./public-id-neon-optimizations
Well, it is good that Mac has NEON enabled by default. I will merge and release a new version later.
Stats for 256 MB of random data:
hyperfine --warmup 1000 -N -u microsecond './public-id-neon-optimizations' ./public-id-no-optimizations
Benchmark 1: ./public-id-neon-optimizations
Time (mean ± σ): 66061.1 µs ± 1809.4 µs [User: 14959.7 µs, System: 50626.2 µs]
Range (min … max): 63642.4 µs … 73034.2 µs 44 runs
Benchmark 2: ./public-id-no-optimizations
Time (mean ± σ): 75613.7 µs ± 7321.5 µs [User: 22832.6 µs, System: 51530.6 µs]
Range (min … max): 70870.0 µs … 115103.5 µs 41 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Summary
./public-id-neon-optimizations ran
1.14 ± 0.12 times faster than ./public-id-no-optimizations
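The summary ratio can be reproduced from the reported means. In this small sketch `speedup` is a hypothetical helper and the constants are copied from the hyperfine output above. The user-time ratio is included because most of the wall-clock time in both runs is system time, which is presumably not spent hashing (an assumption, since the 256 MB test program is not shown).

```rust
/// Hypothetical helper: how many times faster the optimized run is.
fn speedup(baseline_us: f64, optimized_us: f64) -> f64 {
    baseline_us / optimized_us
}

fn main() {
    // Mean wall-clock times in microseconds, copied from the hyperfine output.
    let wall = speedup(75613.7, 66061.1);
    // Mean user times in microseconds, copied from the same output.
    let user = speedup(22832.6, 14959.7);
    println!("wall-clock speedup: {:.2}", wall); // prints 1.14
    println!("user-time speedup:  {:.2}", user); // prints 1.53
}
```

Given the outlier warning and the ± 0.12 spread, these numbers are only a rough indication.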
Release 0.8.9 with Neon
Could you please enable optimizations for MacBooks by default, as you did for x86_64 CPUs?