simsimd_avx512_i8_cos under vnni

ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐

https://ashvardanian.com/posts/simsimd-faster-scipy/

Apache License 2.0

999 stars 59 forks source link

simsimd_avx512_i8_cos under vnni #91

Closed my-vegetable-has-exploded closed 8 months ago

my-vegetable-has-exploded commented 9 months ago

Thanks for your great work!

When i reading codes in simsimd_avx512_i8_cos, https://github.com/ashvardanian/SimSIMD/blob/main/include/simsimd/spatial.h#L1120 I am a little confusing about that both input a and b is signed, but _mm512_dpbusd_epi32 is product of ZeroExtend16 and SignedExtend16 , I think there maybe some problem when vector a contain negative number like -1.

I think may it need to add more codes like this. https://github.com/my-vegetable-has-exploded/dot-bench/blob/main/src/lib.rs#L54-L57

ashvardanian commented 9 months ago

Great catch, thank you too, @my-vegetable-has-exploded! Would you have time to submit a patch?

my-vegetable-has-exploded commented 9 months ago

Great catch, thank you too, @my-vegetable-has-exploded! Would you have time to submit a patch?

I may need some time to familiarize myself with the project. Maybe several days later.

ashvardanian commented 8 months ago

@my-vegetable-has-exploded, much appreciated! The CONTRIBUTING.md contents may help. The very first command should be enough to compile 🤗

my-vegetable-has-exploded commented 8 months ago

@my-vegetable-has-exploded, much appreciated! The CONTRIBUTING.md contents may help. The very first command should be enough to compile 🤗

Sorry, I meet some problems. The situation of my codes is computing dot product of int8 after quantization, I can promise that all numbers fall in [-127,127] in this situation. So it is okay to convert -x*y to x*-y and use dpbusd (product of unsigned and signed) to compute the result for me. But if without this gurantee, 0 - (-128) = 128 would overflow.

ashvardanian commented 8 months ago

@my-vegetable-has-exploded not sure about what you mean, but your Rust snippet looked reasonable. I think adjusting the C variant in this repo to work the same way is the way to go 🤷‍♂️

my-vegetable-has-exploded commented 8 months ago

@my-vegetable-has-exploded not sure about what you mean, but your Rust snippet looked reasonable. I think adjusting the C variant in this repo to work the same way is the way to go 🤷‍♂️

For example, we have a=[1,-2,3,-4] and b=[11,12,13,14]. We would convert it to a'=[1,2,3,4], b'=[11,-12,13,-14]. But if b=[-128,-128,-128,-128], than b'=[128,128,128,128], 128 would overflow in 8bit presentation.

ashvardanian commented 8 months ago

@my-vegetable-has-exploded I think the best route is to upcast to 16-but and use dpwssd.

ashvardanian commented 8 months ago

:tada: This issue has been resolved in version 3.9.0 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket: