ashvardanian / SimSIMD

Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, C, and Swift, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐
https://ashvardanian.com/posts/simsimd-faster-scipy/
Apache License 2.0

Inconsistency with scipy #43

Closed: guyrosin closed this issue 7 months ago

guyrosin commented 7 months ago

Results sometimes differ from scipy.spatial.distance (I checked with both float64 and float32). For example:

import numpy as np
import simsimd
from scipy.spatial import distance

a1 = np.array([0.10, 0.62])
a2 = np.array([0.16, 0.69])

print(simsimd.cosine(a1, a2))
print(distance.cosine(a1, a2))

print(simsimd.sqeuclidean(a1, a2))
print(distance.sqeuclidean(a1, a2))

Output:

0.0023124534636735916
0.0023073006911024097

0.008500000461935997
0.008499999999999994

ashvardanian commented 7 months ago

Indeed, @guyrosin, the inconsistency exists. It comes naturally with the change in the order of arithmetic operations.

Any floating-point arithmetic accumulates error, and the order of the operations affects its sign and magnitude. For 99.9% of apps it shouldn't be a big deal, but if you are calculating ballistic trajectories, you are better off using advanced math libraries with 128-bit floats and error-reduction techniques.
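A minimal sketch of the effect described above, not SimSIMD's actual kernel: it regroups a three-term sum, then adds the same values in two different orders and compares each against `math.fsum`, which returns an accurately rounded sum. The random test data and the names `forward`, `backward`, and `exact` are assumptions made for illustration.

```python
import math

import numpy as np

# Floating-point addition is not associative: regrouping changes the rounding.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)

# Hypothetical test data (not from the issue): sum the same values
# front-to-back and back-to-front.
rng = np.random.default_rng(0)
values = rng.random(100_000)

forward = 0.0
for v in values:
    forward += float(v)

backward = 0.0
for v in values[::-1]:
    backward += float(v)

# math.fsum tracks the rounding error of every addition, so it serves
# as the reference here.
exact = math.fsum(values)
print("forward error: ", abs(forward - exact))
print("backward error:", abs(backward - exact))
```

A SIMD kernel typically keeps several partial sums in vector lanes and combines them at the end, which is yet another ordering, so small disagreements with a scalar left-to-right loop are expected.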

guyrosin commented 7 months ago

Got it, thanks for the explanation @ashvardanian! (I expected the differences to be smaller)
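To put the differences reported above in numbers, here is a small sketch that computes the relative gap for each pair of outputs quoted in this issue. The values are copied from the output above; the helper name `relative_gap` and the tolerance `1e-6` are assumptions chosen for illustration.

```python
import math

# Outputs quoted in the issue: (simsimd, scipy).
cosine_pair = (0.0023124534636735916, 0.0023073006911024097)
sqeuclidean_pair = (0.008500000461935997, 0.008499999999999994)


def relative_gap(measured: float, reference: float) -> float:
    """Absolute difference scaled by the reference value."""
    return abs(measured - reference) / abs(reference)


rel_cos = relative_gap(*cosine_pair)
rel_sq = relative_gap(*sqeuclidean_pair)
print(f"cosine relative gap:      {rel_cos:.2e}")
print(f"sqeuclidean relative gap: {rel_sq:.2e}")

# Compare with an explicit tolerance instead of exact equality.
print(math.isclose(*sqeuclidean_pair, rel_tol=1e-6))
print(math.isclose(*cosine_pair, rel_tol=1e-6))
```

By this measure the squared Euclidean results agree to roughly seven significant digits, while the cosine results agree to only about three, so tests that compare the two libraries need a tolerance chosen per metric.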