Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for AVX2, AVX-512, NEON, SVE, & SVE2 📐
The new Jaccard distance implementation raises throughput from 22 to 30 GB/s on Graviton 4. Hamming reaches 31 GB/s.
The same trick was also applied to the SVE kernels, but they turn out to be slower with the 128-bit registers on Graviton 4.
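For readers unfamiliar with the two metrics on bit vectors, here is a minimal pure-Python reference sketch of what such kernels compute. It is not the library's SIMD code and says nothing about the optimization mentioned above. The function names are hypothetical, and returning 0.0 for two all-zero vectors is an assumption made for this sketch only.

```python
def popcount(value: int) -> int:
    """Count the set bits in a non-negative integer."""
    return bin(value).count("1")


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions where the two packed bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    x = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return popcount(x)


def jaccard_distance(a: bytes, b: bytes) -> float:
    """1 - |A and B| / |A or B| over the set bits of two packed bit vectors."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    ia = int.from_bytes(a, "little")
    ib = int.from_bytes(b, "little")
    union = popcount(ia | ib)
    if union == 0:
        # Assumption for this sketch: two empty sets are treated as identical.
        return 0.0
    intersection = popcount(ia & ib)
    return 1.0 - intersection / union


a = bytes([0b11110000])
b = bytes([0b00111100])
print(hamming_distance(a, b))   # differing bits
print(jaccard_distance(a, b))   # 1 - intersection / union
```

Both metrics reduce to bitwise operations followed by population counts, which is why their throughput is reported in GB/s of input scanned.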