It looks like the L2 distance computation with AVX is broken. If the dimensionality is not divisible by 16 and the remainder is less than 8, it will read coordinates past the end of a vector. There are other corner cases how the remaining coordinates (not processed with SIMD) are not treated correctly. A simple way to fix this is to pad each vector, but the arguably better approach is to process the residuals without SIMD.
I'll also open a pull request with the fixed implementation for AVX-256, in case you accept pull requests. The SSE version is also broken, but it looks like you took that one out already.
Hi,
It looks like the L2 distance computation with AVX is broken. If the dimensionality is not divisible by 16 and the remainder is less than 8, it will read coordinates past the end of a vector. There are other corner cases how the remaining coordinates (not processed with SIMD) are not treated correctly. A simple way to fix this is to pad each vector, but the arguably better approach is to process the residuals without SIMD.
Here's a minimal example to reproduce the bug, alongside a fixed implementation https://gist.github.com/larsgottesbueren/5816a5cea312cfbf45577a3da317996f
I'll also open a pull request with the fixed implementation for AVX-256, in case you accept pull requests. The SSE version is also broken, but it looks like you took that one out already.
Best, Lars