It may be coming from the reciprocal square-root optimizations in the last step of cosine computation. Frankly, we don't have a good analysis tool to detect those regressions between versions. Do you have any ideas on how to track those and control acceptable error?
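One low-tech way to do that — a minimal sketch, not the project's actual test suite, and assuming the Python bindings expose `simsimd.cosine` — is to fuzz each kernel against a float64 NumPy reference and assert the worst relative error stays under an explicit budget:

```python
import numpy as np
import simsimd  # assumption: the bindings expose `simsimd.cosine`

def reference_cosine(a, b):
    # float64 reference: 1 - a.b / (|a| * |b|)
    a, b = a.astype(np.float64), b.astype(np.float64)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(42)
worst = 0.0
for _ in range(1000):
    a = rng.standard_normal(1536).astype(np.float32)
    b = rng.standard_normal(1536).astype(np.float32)
    expected = reference_cosine(a, b)
    got = float(simsimd.cosine(a, b))
    worst = max(worst, abs(got - expected) / max(abs(expected), 1e-12))

print(f"worst relative error: {worst:.2e}")
assert worst < 1e-4, "precision regression: error budget exceeded"  # budget is illustrative
```

Running a check like this in CI for every release would turn a silent precision change into a visible jump in the `worst` number.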
For me that was detected by a test in LangChain that uses `numpy.allclose()`
Is it checked without passing `atol`/`rtol` arguments?
Yes, with defaults.
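For context, `numpy.allclose(a, b)` checks `|a - b| <= atol + rtol * |b|` with defaults `rtol=1e-05` and `atol=1e-08`, so an absolute drift of ~3.7e-4 on values near 1.0 fails it. Passing explicit tolerances makes the accepted budget visible:

```python
import numpy as np

# defaults rtol=1e-05, atol=1e-08 -> tolerance of roughly 1e-5 for values near 1.0
print(np.allclose(1.0, 1.0 + 3.7e-4))             # False with defaults
# an explicit absolute budget that admits rsqrt-level error
print(np.allclose(1.0, 1.0 + 3.7e-4, atol=1e-3))  # True
```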
So do we have to accept the precision loss, or will there be changes to improve it?
@cbornet can you please also share the output of this:

```
python -c "import simsimd; print(simsimd.get_capabilities())"
```
If it prints `haswell`, the issue is likely coming from here - the `rsqrt`. But I am not sure if it's a good idea to replace it with a more accurate solution. If so, we need to change it in every cosine distance kernel in `spatial.h`.
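For scale: Intel documents the maximum relative error of the hardware `rsqrt` approximation as 1.5 * 2^-12 ≈ 3.66e-4, which matches the ~3.7e-4 delta reported below. A single Newton-Raphson refinement step squares that error away at the cost of a few extra multiply-adds — here is a minimal sketch of the arithmetic in NumPy (the actual fix would live in the C kernels in `spatial.h`):

```python
import numpy as np

x = np.float64(2.0)
exact = 1.0 / np.sqrt(x)

# emulate a hardware rsqrt guess carrying the documented ~1.5 * 2**-12 relative error
y0 = exact * (1.0 + 1.5 * 2.0**-12)

# one Newton-Raphson step for f(y) = 1/y**2 - x: y1 = y0 * (1.5 - 0.5 * x * y0**2)
y1 = y0 * (1.5 - 0.5 * x * y0 * y0)

print(abs(y0 - exact) / exact)  # ~3.7e-4: raw approximation error
print(abs(y1 - exact) / exact)  # ~2.0e-7: error shrinks quadratically
```

The refined ~2e-7 residual is near f32 round-off scale (2^-24 ≈ 6e-8), which lines up with the v4.4.0 number reported below.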
Sure. Here is the result:

```
{'serial': True, 'neon': False, 'sve': False, 'neon_f16': False, 'sve_f16': False, 'neon_bf16': False, 'sve_bf16': False, 'neon_i8': False, 'sve_i8': False, 'haswell': True, 'skylake': False, 'ice': False, 'genoa': False, 'sapphire': False}
```

with `haswell` on.
That makes a lot of sense, thank you @cbornet!
@MattPD suggested several reading materials on Twitter:
I love that some of those materials explicitly target GPUs. I was thinking about putting together a mixed-precision math hackathon; all those links look like excellent preparation materials. Still, I need to figure out how to properly allocate time for them. I've previously tried to mix such precision-calibration work with other intra-day tasks, which was clearly not enough to grok it.
Help and recommendations are more than welcome!
Is it expected? On my machine, the code prints `5.960464477539063e-08` on v4.4.0 and `0.00036910176277160645` on v5.0.0.