Closed GoWind closed 4 days ago
Here is the numerical error report:
+---+---------------+------+---------+---------------------+---------------------+---------------------+---------------------+---------------------+-----------------+
| | Metric | NDim | DType | Baseline Error | SimSIMD Error | Accurate Duration | Baseline Duration | SimSIMD Duration | SimSIMD Speedup |
+---+---------------+------+---------+---------------------+---------------------+---------------------+---------------------+---------------------+-----------------+
| 0 | jensenshannon | 11 | float16 | 3.03e-04 ± 2.79e-04 | 8.61e-02 ± 1.53e-02 | 1.91e+04 ± 1.58e+03 | 1.93e+04 ± 1.36e+03 | 1.16e+03 ± 4.91e+02 | 17.32x ± 2.11x |
| 1 | jensenshannon | 11 | float32 | 4.47e-08 ± 4.07e-08 | 7.44e-02 ± 3.15e-02 | 1.95e+04 ± 2.48e+03 | 1.76e+04 ± 3.68e+03 | 1.18e+03 ± 8.24e+02 | 16.39x ± 3.24x |
| 2 | jensenshannon | 97 | float16 | 1.08e-04 ± 8.37e-05 | 9.11e-02 ± 4.34e-03 | 2.14e+04 ± 1.82e+03 | 2.33e+04 ± 1.58e+03 | 1.41e+03 ± 3.29e+02 | 16.72x ± 1.60x |
| 3 | jensenshannon | 97 | float32 | 3.07e-08 ± 2.18e-08 | 7.99e-02 ± 3.03e-02 | 2.15e+04 ± 1.93e+03 | 1.88e+04 ± 1.17e+03 | 1.20e+03 ± 5.35e+02 | 16.31x ± 1.89x |
| 4 | jensenshannon | 1536 | float16 | 8.90e-05 ± 6.03e-05 | 9.22e-02 ± 9.60e-04 | 5.09e+04 ± 2.73e+03 | 9.97e+04 ± 3.40e+03 | 6.08e+03 ± 5.49e+02 | 16.47x ± 0.89x |
| 5 | jensenshannon | 1536 | float32 | 2.40e-08 ± 1.81e-08 | 8.07e-02 ± 3.03e-02 | 5.07e+04 ± 3.08e+03 | 4.74e+04 ± 1.22e+04 | 3.56e+03 ± 7.67e+02 | 13.51x ± 2.89x |
+---+---------------+------+---------+---------------------+---------------------+---------------------+---------------------+---------------------+-----------------+
Hi, @GoWind! It seems like you didn't take care of the x86 code. It's also not recommended to use the SIMSIMD_SQRT
in the vectorized code. There are native approximations for those.
Hi @ashvardanian , I am working on a follow up PR for the intel versions, will submit them in a day or so.
Regd, the SIMSIMD_SQRT
marcro, i see that there is a *result = _simsimd_sqrt_f64_neon(sum)
that returns an f64_t (which simsimd_distance_t
is a typedef for). Can I use them in the js_f32_neon and js_16_neon implementations because *result is of type simsimd_distance_t anyway ?
Can I use them in the js_f32_neon and js_16_neon implementations because *result is of type simsimd_distance_t anyway ?
Yes, @GoWind. I've already patched that part and will probably merge x86 patches soon as well. Would you like to look into #187 or the simpler #202, #203, or #228?
@ashvardanian , glad to :) #187 is marked as closed, does it still need a fix ?
I will try to fix 202 or 203 as well and then see how I can contribute to the kernel fn documentation for C :)
My mistake, meant #184 🤗
will try to get a PR for that as well !
The Jensen Shannom measure implementation in SimSIMD is different from the implementation in scipy. It turns out that a scaling factory of 0.5 was missed and this fix seems to match the example provided in scipy documentation.
Matches the examples provided in the scipy documentation