It may be coming from the reciprocal square-root optimizations in the last step of cosine computation. Frankly, we don't have a good analysis tool to detect those regressions between versions. Do you have any ideas on how to track those and control acceptable error?
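One low-tech way to do that — a minimal sketch, not the project's actual test suite, and assuming the Python bindings expose `simsimd.cosine` — is to fuzz each kernel against a float64 NumPy reference and assert the worst relative error stays under an explicit budget:

```python
import numpy as np
import simsimd  # assumption: the bindings expose `simsimd.cosine`

def reference_cosine(a, b):
    # float64 reference: 1 - a.b / (|a| * |b|)
    a, b = a.astype(np.float64), b.astype(np.float64)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(42)
worst = 0.0
for _ in range(1000):
    a = rng.standard_normal(1536).astype(np.float32)
    b = rng.standard_normal(1536).astype(np.float32)
    expected = reference_cosine(a, b)
    got = float(simsimd.cosine(a, b))
    worst = max(worst, abs(got - expected) / max(abs(expected), 1e-12))

print(f"worst relative error: {worst:.2e}")
assert worst < 1e-4, "precision regression: error budget exceeded"  # budget is illustrative
```

Running a check like this in CI for every release would turn a silent precision change into a visible jump in the `worst` number.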
For me that was detected by a test in LangChain that uses `numpy.allclose()`
Is it checked without passing `atol`/`rtol` arguments?
Yes, with defaults.
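For context, `numpy.allclose(a, b)` checks `|a - b| <= atol + rtol * |b|` with defaults `rtol=1e-05` and `atol=1e-08`, so an absolute drift of ~3.7e-4 on values near 1.0 fails it. Passing explicit tolerances makes the accepted budget visible:

```python
import numpy as np

# defaults rtol=1e-05, atol=1e-08 -> tolerance of roughly 1e-5 for values near 1.0
print(np.allclose(1.0, 1.0 + 3.7e-4))             # False with defaults
# an explicit absolute budget that admits rsqrt-level error
print(np.allclose(1.0, 1.0 + 3.7e-4, atol=1e-3))  # True
```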
So do we have to accept the precision loss, or will there be changes to improve it?
@cbornet can you please also share the output of this:

```
python -c "import simsimd; print(simsimd.get_capabilities())"
```
If it prints `haswell`, the issue is likely coming from here - the `rsqrt`. But I am not sure if it's a good idea to replace it with a more accurate solution. If so, we need to change it in every cosine distance kernel in `spatial.h`.
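For scale: Intel documents the maximum relative error of the hardware `rsqrt` approximation as 1.5 * 2^-12 ≈ 3.66e-4, which matches the ~3.7e-4 delta reported below. A single Newton-Raphson refinement step squares that error away at the cost of a few extra multiply-adds — here is a minimal sketch of the arithmetic in NumPy (the actual fix would live in the C kernels in `spatial.h`):

```python
import numpy as np

x = np.float64(2.0)
exact = 1.0 / np.sqrt(x)

# emulate a hardware rsqrt guess carrying the documented ~1.5 * 2**-12 relative error
y0 = exact * (1.0 + 1.5 * 2.0**-12)

# one Newton-Raphson step for f(y) = 1/y**2 - x: y1 = y0 * (1.5 - 0.5 * x * y0**2)
y1 = y0 * (1.5 - 0.5 * x * y0 * y0)

print(abs(y0 - exact) / exact)  # ~3.7e-4: raw approximation error
print(abs(y1 - exact) / exact)  # ~2.0e-7: error shrinks quadratically
```

The refined ~2e-7 residual is near f32 round-off scale (2^-24 ≈ 6e-8), which lines up with the v4.4.0 number reported below.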
Sure. Here is the result:

```
{'serial': True, 'neon': False, 'sve': False, 'neon_f16': False, 'sve_f16': False, 'neon_bf16': False, 'sve_bf16': False, 'neon_i8': False, 'sve_i8': False, 'haswell': True, 'skylake': False, 'ice': False, 'genoa': False, 'sapphire': False}
```

with `haswell` on.
That makes a lot of sense, thank you @cbornet!
@MattPD suggested several reading materials on Twitter:
I love that some of those materials explicitly target GPUs. I was thinking about putting together a mixed-precision math hackathon; all those links look like excellent preparation materials. Still, I need to figure out how to properly allocate time for them. I've previously tried to mix such precision-calibration work with other intra-day tasks, which was clearly not enough to grok it.
Help and recommendations are more than welcome!
Is it expected? On my machine, the code prints `5.960464477539063e-08` on v4.4.0 and `0.00036910176277160645` on v5.0.0.