Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, C, and Swift, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐
All existing metrics imply dense vector representations. Dealing with very high-dimensional vectors, sparse representations may provide huge space-efficiency gains.
The only operation that needs to be implemented for Jaccard, Hamming, Inner Product, L2, and Cosine is a float-weighted vectorized set-intersection. We may expect the following kinds of vectors:
u16 - high priority
u32 - high priority
u16f16 - medium priority
u32f16 - medium priority
u32f32 - low priority?
The last may not be practically useful. AVX-512 backend (Intel Ice Lake and newer and AMD Genoa) and SVE (AWS Graviton, Nvidia Grace, Microsoft Cobalt) will see the biggest gains. Together with a serial backend, multiplied by 4-5 input types, and 5 distance functions, this may result in over 100 new kernels.
Any thoughts and recommendations? Someone else looking for this functionality?
All existing metrics imply dense vector representations. Dealing with very high-dimensional vectors, sparse representations may provide huge space-efficiency gains.
The only operation that needs to be implemented for Jaccard, Hamming, Inner Product, L2, and Cosine is a float-weighted vectorized set-intersection. We may expect the following kinds of vectors:
The last may not be practically useful. AVX-512 backend (Intel Ice Lake and newer and AMD Genoa) and SVE (AWS Graviton, Nvidia Grace, Microsoft Cobalt) will see the biggest gains. Together with a serial backend, multiplied by 4-5 input types, and 5 distance functions, this may result in over 100 new kernels.
Any thoughts and recommendations? Someone else looking for this functionality?