Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, C, and Swift, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐
Both Intel and Apple now offer specialized AMX extensions for tiled matrix multiplication. Both are tricky to use but may yield substantial performance improvements, potentially even for single-vector dot products and cosine distances.
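To make the single-vector case concrete, here is a minimal pure-Python sketch of one way a dot product can be phrased as a tiled matrix product. It is an illustration only, not this library's implementation or an actual AMX kernel. The function names `dot_via_tiles` and `cosine_distance`, the `rows=16` tile height, and the zero-norm convention are all assumptions made for the example.

```python
import math

def dot_via_tiles(a, b, rows=16):
    """Dot product computed as the trace of a (rows x k) @ (k x rows) product.

    `rows` is a hypothetical tile height chosen for illustration.
    """
    assert len(a) == len(b)
    # Pad with zeros so both vectors split evenly into `rows` chunks of length k.
    k = -(-len(a) // rows)  # ceiling division
    pad = rows * k - len(a)
    a = list(a) + [0.0] * pad
    b = list(b) + [0.0] * pad
    # Row i of A holds chunk i of `a`; row i of B holds chunk i of `b`.
    A = [a[i * k:(i + 1) * k] for i in range(rows)]
    B = [b[i * k:(i + 1) * k] for i in range(rows)]
    # Full product C = A @ B^T, emulating what a matrix unit would produce.
    C = [[sum(x * y for x, y in zip(A[i], B[j])) for j in range(rows)]
         for i in range(rows)]
    # Only the diagonal contributes to the dot product.
    return sum(C[i][i] for i in range(rows))

def cosine_distance(a, b):
    """Cosine distance built from three tiled dot products."""
    ab = dot_via_tiles(a, b)
    aa = dot_via_tiles(a, a)
    bb = dot_via_tiles(b, b)
    if aa == 0 or bb == 0:
        return 1.0  # convention assumed for this sketch when a norm is zero
    return 1.0 - ab / math.sqrt(aa * bb)
```

In this naive mapping only the diagonal of the product is used, so most of the multiply-accumulate work is discarded. Whether the hardware throughput outweighs that waste is exactly the open question here.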
Resources: