ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐
https://ashvardanian.com/posts/simsimd-faster-scipy/
Apache License 2.0
984 stars, 59 forks

Bug: Cosine distance may be negative #195

Closed: fancidev closed this issue 1 month ago

fancidev commented 1 month ago

Describe the bug

For highly collinear input vectors, the computed cosine distance may be negative due to numerical error.

Steps to reproduce

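```python
import simsimd
import numpy as np

# Two nearly collinear float64 vectors (v is roughly 2.18 * u)
u = np.array([-0.30039746, -0.13594460, 0.58292344])
v = np.array([-0.65563949, -0.29700866, 1.27146813])
# Reported to print a negative value with SimSIMD v5.4.3 on Arm (macOS)
print(simsimd.cosine(u, v))
```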
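As a sanity check, the exact distance for these inputs can be approximated with Python's standard `decimal` module at 50 significant digits, starting from the exact binary values of the float64 inputs. By the Cauchy-Schwarz inequality the true result cannot be negative, and since `u` and `v` are not exactly proportional it should come out as a tiny positive number, so a negative output can only come from rounding error. The function name `cosine_distance_decimal` is hypothetical.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant digits, far beyond float64's ~16

def cosine_distance_decimal(u, v) -> Decimal:
    """High-precision reference for 1 - u.v / sqrt(|u|^2 * |v|^2).

    Decimal(x) converts each float exactly, so the only rounding happens
    at 50 digits inside the Decimal arithmetic.
    """
    du = [Decimal(x) for x in u]
    dv = [Decimal(x) for x in v]
    dot = sum(a * b for a, b in zip(du, dv))
    uu = sum(a * a for a in du)
    vv = sum(b * b for b in dv)
    return 1 - dot / (uu * vv).sqrt()

u = [-0.30039746, -0.13594460, 0.58292344]
v = [-0.65563949, -0.29700866, 1.27146813]
print(cosine_distance_decimal(u, v))
```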

Expected behavior

The cosine distance should lie between 0 and 2. However, I am not sure whether a negative result is a material issue in practice.
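For reference, the bound follows from the usual definition of cosine distance (assumed here to be the one SimSIMD computes) together with the Cauchy-Schwarz inequality. The lower bound 0 is reached only by vectors pointing in the same direction, so nearly collinear pairs sit right at the edge where a small rounding error can push the result below zero:

$$
d(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert},
\qquad
\lvert u \cdot v \rvert \le \lVert u \rVert \, \lVert v \rVert
\;\Longrightarrow\;
0 \le d(u, v) \le 2
$$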

Issue #43 describes a similar numerical inaccuracy in the cosine distance, probably caused by the use of RSQRT.
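To illustrate how an approximate reciprocal square root can cause this, the sketch below emulates a reduced-precision RSQRT estimate by rounding `1/sqrt(x)` to a chosen number of mantissa bits, optionally followed by Newton-Raphson refinement steps `y = y * (1.5 - 0.5 * x * y * y)`. The bit widths and helper names are hypothetical and do not model SimSIMD's actual kernels. For exactly collinear inputs the true distance is 0, but a relative error of about `eps` in each estimate shifts the similarity by roughly `2 * eps`, which is enough to flip the sign of the distance.

```python
import math

def approx_rsqrt(x: float, bits: int) -> float:
    """Hypothetical stand-in for a hardware RSQRT estimate: 1/sqrt(x) rounded
    to `bits` mantissa bits, giving a relative error of about 2**-bits at most."""
    mantissa, exponent = math.frexp(1.0 / math.sqrt(x))  # mantissa in [0.5, 1)
    return math.ldexp(round(mantissa * 2**bits) / 2**bits, exponent)

def newton_step(x: float, y: float) -> float:
    """One Newton-Raphson refinement of y ~ 1/sqrt(x); roughly squares the relative error."""
    return y * (1.5 - 0.5 * x * y * y)

def cosine_distance_rsqrt(u, v, bits: int, steps: int = 0) -> float:
    """Cosine distance 1 - u.v * rsqrt(|u|^2) * rsqrt(|v|^2) using the emulated estimate."""
    dot = math.fsum(a * b for a, b in zip(u, v))
    uu = math.fsum(a * a for a in u)
    vv = math.fsum(b * b for b in v)
    ru, rv = approx_rsqrt(uu, bits), approx_rsqrt(vv, bits)
    for _ in range(steps):
        ru, rv = newton_step(uu, ru), newton_step(vv, rv)
    return 1.0 - dot * ru * rv

u = [-0.30039746, -0.13594460, 0.58292344]
v = [2.0 * a for a in u]  # exactly collinear, so the true distance is 0
for bits, steps in [(8, 0), (12, 0), (8, 1), (8, 2)]:
    print(bits, steps, cosine_distance_rsqrt(u, v, bits, steps))
```

Each Newton-Raphson step roughly squares the relative error, so the magnitude of the printed distances shrinks as `steps` grows; with no refinement, the sign of the result depends on whether the rounded estimate lands above or below the true value.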

A possible workaround is to clip the result to $[0,2]$, but that may hurt performance: the overhead would be more noticeable for short vectors and increasingly negligible for longer ones.
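A minimal sketch of that workaround, assuming the clamp is applied to each scalar result after the distance has been computed: it costs two comparisons per result regardless of vector length, which is why its relative overhead fades for long vectors. The helper names are hypothetical, and `cosine_distance` here is a plain float64 reference rather than SimSIMD's kernel; in user code the same clamp could presumably wrap the float returned by `simsimd.cosine(u, v)`.

```python
import math

def clamp_cosine_distance(d: float) -> float:
    """Clamp a computed cosine distance into its valid range [0, 2].

    NaN passes through unchanged, because Python's max/min keep their first
    argument when every comparison against NaN is False.
    """
    return min(max(d, 0.0), 2.0)

def cosine_distance(u, v) -> float:
    """Float64 reference cosine distance with the clamp applied.

    Zero vectors are not handled (the division raises ZeroDivisionError).
    """
    dot = math.fsum(a * b for a, b in zip(u, v))
    norms = math.sqrt(math.fsum(a * a for a in u) * math.fsum(b * b for b in v))
    return clamp_cosine_distance(1.0 - dot / norms)

print(clamp_cosine_distance(-1e-7))  # → 0.0
```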

SimSIMD version

v5.4.3

Operating System

macOS Sonoma

Hardware architecture

Arm

Which interface are you using?

Python bindings


ashvardanian commented 1 month ago

Thank you, @fancidev, good catch! We've resolved many of the related issues in v5.4.0, but clipping will be a good addition to avoid negative values.