Sparse Distances

All existing metrics assume dense vector representations. When dealing with very high-dimensional vectors, sparse representations may provide large space-efficiency gains.

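For Jaccard, Hamming, Inner Product, L2, and Cosine, the only new operation that needs to be implemented is a vectorized, optionally float-weighted, set intersection. The remaining terms, such as per-vector norms, are cheap to compute separately. We may expect the following input types (index type, optionally followed by weight type):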

u16 - high priority
u32 - high priority
u16f16 - medium priority
u32f16 - medium priority
u32f32 - low priority?

The last one may not be practically useful. The AVX-512 backend (Intel Ice Lake and newer, AMD Genoa) and the SVE backend (AWS Graviton 3 and newer, Nvidia Grace, Microsoft Cobalt) should see the biggest gains. Together with a serial backend, multiplied by 4-5 input types and 5 distance functions, this may result in over 100 new kernels.

Any thoughts and recommendations? Is anyone else looking for this functionality?

ashvardanian / SimSIMD

Sparse Distances #100