Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for AVX2, AVX-512, NEON, SVE, & SVE2 📐
To improve the accuracy of rsqrt reciprocal square roots, this patch adds a Newton-Raphson iteration after the native instruction. The benchmarks below measure the 1536-dimensional cosine distance computation, first on x86 and then on Arm. On smaller vectors, the double-precision multiplication may result in a 30% performance reduction.
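For readers unfamiliar with the technique, here is a minimal scalar sketch of a Newton-Raphson refinement applied to an approximate reciprocal square root. It is not the code from this patch. The function names (rsqrt_estimate, rsqrt_refine, rsqrt_residual) are hypothetical, and the well-known bit-trick estimate only stands in for the low-precision native instruction so the example compiles on any platform.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a low-precision native rsqrt instruction:
 * the classic bit-trick estimate, accurate to a few percent. */
static float rsqrt_estimate(float x) {
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float as an integer */
    bits = 0x5f3759dfu - (bits >> 1); /* rough estimate of 1/sqrt(x) */
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* One Newton-Raphson step for f(y) = 1/(y*y) - x, which gives
 * y_next = y * (1.5 - 0.5 * x * y * y). */
static float rsqrt_refine(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

/* Residual |x*y*y - 1|: zero when y is exactly 1/sqrt(x). */
static float rsqrt_residual(float x, float y) {
    float r = x * y * y - 1.0f;
    return r < 0.0f ? -r : r;
}
```

Calling rsqrt_refine once on the output of rsqrt_estimate shrinks the residual substantially, and a second call shrinks it again, because each step roughly squares the relative error. The price is a few extra multiplications per result, which is the kind of overhead the benchmarks are meant to quantify.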
Benchmarks
x86: Intel Sapphire Rapids
Baseline
With 1 Iteration
Arm: AWS Graviton 3
Baseline
With 2 Iterations