ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐
https://ashvardanian.com/posts/simsimd-faster-scipy/
Apache License 2.0
988 stars, 59 forks

Fix: Refs #207 Fix implementation of Jensen Shannon measure #233

Closed by GoWind 4 days ago

GoWind commented 6 days ago

The Jensen-Shannon measure implementation in SimSIMD differs from the implementation in scipy. It turns out that a scaling factor of 0.5 was missed; with this fix, the output seems to match the example provided in the scipy documentation.

Matches the examples provided in the scipy documentation
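
To illustrate where the 0.5 factor enters, here is a minimal pure-Python sketch of the Jensen-Shannon distance as scipy defines it: the square root of half the sum of the two KL divergences against the midpoint distribution. The function names `kl_divergence` and `jensen_shannon_distance` are hypothetical helpers for this illustration only. They are not SimSIMD's API and not the code changed in this PR.

```python
import math


def kl_divergence(p, m):
    """Kullback-Leibler divergence KL(p || m) using the natural logarithm.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    total = 0.0
    for pi, mi in zip(p, m):
        if pi > 0.0:
            total += pi * math.log(pi / mi)
    return total


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance: sqrt(0.5 * (KL(p || m) + KL(q || m)))."""
    # Normalize both inputs so they sum to 1, as scipy does.
    sum_p, sum_q = sum(p), sum(q)
    p = [x / sum_p for x in p]
    q = [x / sum_q for x in q]
    # Midpoint distribution m = (p + q) / 2.
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    # The 0.5 scaling factor below is the one discussed in this PR.
    divergence = 0.5 * (kl_divergence(p, m) + kl_divergence(q, m))
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(divergence, 0.0))


print(jensen_shannon_distance([1.0, 0.0], [0.5, 0.5]))  # ≈ 0.4645
```

Dropping the 0.5 factor inflates every result by a factor of sqrt(2), so this input would give roughly 0.657 instead of 0.4645.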

ashvardanian commented 4 days ago

Here is the numerical error report:

+---+---------------+------+---------+---------------------+---------------------+---------------------+---------------------+---------------------+-----------------+
|   |    Metric     | NDim |  DType  |   Baseline Error    |    SimSIMD Error    |  Accurate Duration  |  Baseline Duration  |  SimSIMD Duration   | SimSIMD Speedup |
+---+---------------+------+---------+---------------------+---------------------+---------------------+---------------------+---------------------+-----------------+
| 0 | jensenshannon |  11  | float16 | 3.03e-04 ± 2.79e-04 | 8.61e-02 ± 1.53e-02 | 1.91e+04 ± 1.58e+03 | 1.93e+04 ± 1.36e+03 | 1.16e+03 ± 4.91e+02 | 17.32x ± 2.11x  |
| 1 | jensenshannon |  11  | float32 | 4.47e-08 ± 4.07e-08 | 7.44e-02 ± 3.15e-02 | 1.95e+04 ± 2.48e+03 | 1.76e+04 ± 3.68e+03 | 1.18e+03 ± 8.24e+02 | 16.39x ± 3.24x  |
| 2 | jensenshannon |  97  | float16 | 1.08e-04 ± 8.37e-05 | 9.11e-02 ± 4.34e-03 | 2.14e+04 ± 1.82e+03 | 2.33e+04 ± 1.58e+03 | 1.41e+03 ± 3.29e+02 | 16.72x ± 1.60x  |
| 3 | jensenshannon |  97  | float32 | 3.07e-08 ± 2.18e-08 | 7.99e-02 ± 3.03e-02 | 2.15e+04 ± 1.93e+03 | 1.88e+04 ± 1.17e+03 | 1.20e+03 ± 5.35e+02 | 16.31x ± 1.89x  |
| 4 | jensenshannon | 1536 | float16 | 8.90e-05 ± 6.03e-05 | 9.22e-02 ± 9.60e-04 | 5.09e+04 ± 2.73e+03 | 9.97e+04 ± 3.40e+03 | 6.08e+03 ± 5.49e+02 | 16.47x ± 0.89x  |
| 5 | jensenshannon | 1536 | float32 | 2.40e-08 ± 1.81e-08 | 8.07e-02 ± 3.03e-02 | 5.07e+04 ± 3.08e+03 | 4.74e+04 ± 1.22e+04 | 3.56e+03 ± 7.67e+02 | 13.51x ± 2.89x  |
+---+---------------+------+---------+---------------------+---------------------+---------------------+---------------------+---------------------+-----------------+
ashvardanian commented 4 days ago

Hi, @GoWind! It seems like you didn't take care of the x86 code. It's also not recommended to use the SIMSIMD_SQRT in the vectorized code. There are native approximations for those.

GoWind commented 4 days ago

Hi @ashvardanian, I am working on a follow-up PR for the Intel versions and will submit it in a day or so.

Regarding the SIMSIMD_SQRT macro, I see that there is a *result = _simsimd_sqrt_f64_neon(sum) that returns an f64_t (which simsimd_distance_t is a typedef for). Can I use them in the js_f32_neon and js_16_neon implementations because *result is of type simsimd_distance_t anyway?

ashvardanian commented 4 days ago

> Can I use them in the js_f32_neon and js_16_neon implementations because *result is of type simsimd_distance_t anyway?

Yes, @GoWind. I've already patched that part and will probably merge x86 patches soon as well. Would you like to look into #187 or the simpler #202, #203, or #228?

GoWind commented 4 days ago

@ashvardanian, glad to :) #187 is marked as closed; does it still need a fix?

I will try to fix #202 or #203 as well, and then see how I can contribute to the kernel function documentation for C :)

ashvardanian commented 4 days ago

My mistake, meant #184 🤗

GoWind commented 4 days ago

Will try to get a PR for that as well!