ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐
https://ashvardanian.com/posts/simsimd-faster-scipy/
Apache License 2.0

Precision reduced between v4 and v5 #153

Closed. cbornet closed this 6 days ago

cbornet commented 1 month ago

Is this expected? On my machine, the code

import numpy as np
import simsimd as simd
X = np.array([1.0, 2.0, 3.0], dtype=np.float32)
print(simd.cosine(X, X))

prints 5.960464477539063e-08 on v4.4.0 and 0.00036910176277160645 on v5.0.0.

ashvardanian commented 1 month ago

It may be coming from the reciprocal square-root optimizations in the last step of the cosine computation. Frankly, we don't have a good analysis tool to detect such regressions between versions. Do you have any ideas on how to track them and control the acceptable error?
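One possible way to track such regressions is a small harness that compares a kernel against a float64 reference over random inputs and reports the worst absolute error. This is only a sketch: `reference_cosine`, `max_abs_error`, `float32_cosine`, and `ACCEPTABLE_ERROR` are hypothetical names introduced here, not part of SimSIMD, and the stand-in kernel is plain NumPy in float32.

```python
import numpy as np

def reference_cosine(a, b):
    """Cosine distance computed in float64, used as the ground truth."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def max_abs_error(kernel, dims=(3, 16, 256), trials=100, seed=0):
    """Largest |kernel - reference| over random float32 vector pairs."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for n in dims:
        for _ in range(trials):
            a = rng.standard_normal(n).astype(np.float32)
            b = rng.standard_normal(n).astype(np.float32)
            worst = max(worst, abs(float(kernel(a, b)) - reference_cosine(a, b)))
    return worst

def float32_cosine(a, b):
    """Stand-in kernel for this sketch: plain NumPy in float32."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

ACCEPTABLE_ERROR = 1e-4  # hypothetical budget; choose per metric and dtype
error = max_abs_error(float32_cosine)
print(f"max abs error: {error:.3e}, within budget: {error <= ACCEPTABLE_ERROR}")
```

With SimSIMD installed, passing `simsimd.cosine` as `kernel` and running the harness on each release would give one number per version to compare.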

cbornet commented 1 month ago

For me, it was detected by a test in LangChain that uses numpy.allclose().

ashvardanian commented 1 month ago

Is it checked without passing atol/rtol arguments?

cbornet commented 1 month ago

Yes, with defaults.
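For reference, numpy.allclose(a, b) checks |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08. The thread does not show the LangChain assertion itself, so the sketch below assumes that the test compares a similarity (1 - distance) against 1.0. It uses the two values reported above.

```python
import numpy as np

# Distances reported in this thread for simsimd.cosine(X, X).
v4_distance = 5.960464477539063e-08
v5_distance = 0.00036910176277160645

# Assumption: the failing test compares a similarity (1 - distance) with 1.0.
# With b = 1.0 the default tolerance is atol + rtol * |b| = 1e-08 + 1e-05.
v4_similarity_ok = np.allclose(1.0 - v4_distance, 1.0)
v5_similarity_ok = np.allclose(1.0 - v5_distance, 1.0)

# Comparing the raw distance with 0.0 leaves only atol = 1e-08 as tolerance.
v4_distance_ok = np.allclose(v4_distance, 0.0)
v5_distance_ok = np.allclose(v5_distance, 0.0)

print(v4_similarity_ok, v5_similarity_ok)  # True False
print(v4_distance_ok, v5_distance_ok)      # False False
```

Under that assumption the v4 result passes and the v5 result fails. An explicit tolerance such as atol=1e-3 would accept both, and whether that is acceptable depends on the application.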

cbornet commented 1 month ago

So do we have to accept the precision loss, or will there be changes to improve it?

ashvardanian commented 4 weeks ago

@cbornet can you please also share the output of this:

python -c "import simsimd; print(simsimd.get_capabilities())"

If it prints haswell, the issue is likely coming from here: the rsqrt. But I am not sure if it's a good idea to replace it with a more accurate solution. If so, we would need to change it in every cosine distance kernel in spatial.h.
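To illustrate why an approximate rsqrt matters, here is a minimal pure-Python sketch. It is not the code in spatial.h. `coarse_rsqrt` is a hypothetical stand-in that truncates the float32 mantissa of 1/sqrt(x) to 11 bits, not a model of any real instruction, and `refine_rsqrt` applies one Newton-Raphson step, a common way to recover precision. Arithmetic is done in double precision for simplicity.

```python
import math
import struct

def coarse_rsqrt(x: float, kept_bits: int = 11) -> float:
    """Hypothetical low-precision 1/sqrt(x): the float32 value with its
    mantissa truncated to `kept_bits` bits (not a real hardware model)."""
    exact = 1.0 / math.sqrt(x)
    (bits,) = struct.unpack("<I", struct.pack("<f", exact))
    bits &= ~((1 << (23 - kept_bits)) - 1)  # zero the low mantissa bits
    (approx,) = struct.unpack("<f", struct.pack("<I", bits))
    return approx

def refine_rsqrt(x: float, y: float) -> float:
    """One Newton-Raphson step for f(y) = 1 / y**2 - x."""
    return y * (1.5 - 0.5 * x * y * y)

def cosine_distance(a, b, rsqrt):
    """1 - ab * rsqrt(a2) * rsqrt(b2), with a pluggable rsqrt."""
    ab = sum(p * q for p, q in zip(a, b))
    a2 = sum(p * p for p in a)
    b2 = sum(q * q for q in b)
    return 1.0 - ab * rsqrt(a2) * rsqrt(b2)

X = [1.0, 2.0, 3.0]
coarse = cosine_distance(X, X, coarse_rsqrt)
refined = cosine_distance(X, X, lambda v: refine_rsqrt(v, coarse_rsqrt(v)))
print(f"coarse: {coarse:.3e}, refined: {refined:.3e}")
```

In this sketch the coarse estimate gives a distance on the order of 1e-4 for X against itself, while one refinement step brings it down to roughly 1e-7. The trade-off is a few extra multiplications per kernel call.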

cbornet commented 4 weeks ago

Sure. Here is the result:

{'serial': True, 'neon': False, 'sve': False, 'neon_f16': False, 'sve_f16': False, 'neon_bf16': False, 'sve_bf16': False, 'neon_i8': False, 'sve_i8': False, 'haswell': True, 'skylake': False, 'ice': False, 'genoa': False, 'sapphire': False}

So haswell is on.

ashvardanian commented 4 weeks ago

That makes a lot of sense, thank you @cbornet!

ashvardanian commented 4 weeks ago

@MattPD suggested several reading materials on Twitter.

I love that some of those materials explicitly target GPUs. I was thinking about putting together a mixed-precision math hackathon, and all those links look like excellent preparation materials. Still, I need to figure out when I can properly allocate time for them. I've previously tried to fit such precision-calibration issues in between other daily tasks, which was clearly not enough to grok them.

Help and recommendations are more than welcome!