Double-precision improvements on Arm

ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for AVX2, AVX-512, NEON, SVE, & SVE2 📐

https://ashvardanian.com/posts/simsimd-faster-scipy/

Apache License 2.0

988 stars, 59 forks

Double-precision improvements on Arm #198

Closed by ashvardanian 1 month ago

ashvardanian commented 1 month ago

The new f64 kernels gain little from NEON vectorization itself, since a 128-bit register holds only two f64 lanes. Instead, they leverage the reciprocal square root (rsqrt) approximations that SimSIMD already uses for lower-precision inputs.
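
To illustrate the general idea, here is a minimal Python sketch of an rsqrt-based cosine distance. It is not the SimSIMD kernel. The names `rsqrt_estimate`, `rsqrt_refined`, and `cosine_distance` are hypothetical, the coarse hardware estimate is emulated by truncating the mantissa to 8 bits, and the three Newton-Raphson steps are an assumption chosen to bring that estimate close to full f64 precision.

```python
import math
import struct


def rsqrt_estimate(x: float) -> float:
    """Emulate a coarse hardware rsqrt estimate (assumption: ~8 good bits)."""
    exact = 1.0 / math.sqrt(x)
    bits = struct.unpack("<Q", struct.pack("<d", exact))[0]
    # Clear the low 44 of the 52 mantissa bits, keeping sign, exponent,
    # and the top 8 mantissa bits.
    bits &= ~((1 << 44) - 1)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def rsqrt_refined(x: float, steps: int = 3) -> float:
    """Refine the coarse estimate with Newton-Raphson iterations."""
    y = rsqrt_estimate(x)
    for _ in range(steps):
        # Each step roughly squares the relative error of y.
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def cosine_distance(a, b) -> float:
    """Cosine distance without a division or an exact square root.

    Assumes both vectors are non-zero and of equal length.
    """
    ab = sum(x * y for x, y in zip(a, b))
    a2 = sum(x * x for x in a)
    b2 = sum(y * y for y in b)
    return 1.0 - ab * rsqrt_refined(a2) * rsqrt_refined(b2)
```

Because each Newton-Raphson step roughly squares the relative error, a few multiply-add steps can stand in for the exact square root and the division, which is why such a shortcut can pay off even when only two f64 lanes fit in a register.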