ashvardanian / SimSIMD

Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, C, and Swift, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐
https://ashvardanian.com/posts/simsimd-faster-scipy/
Apache License 2.0
797 stars 42 forks source link

simsimd_avx512_i8_cos under vnni #91

Closed my-vegetable-has-exploded closed 4 months ago

my-vegetable-has-exploded commented 4 months ago

Thanks for your great work!

When i reading codes in simsimd_avx512_i8_cos, https://github.com/ashvardanian/SimSIMD/blob/main/include/simsimd/spatial.h#L1120 I am a little confusing about that both input a and b is signed, but _mm512_dpbusd_epi32 is product of ZeroExtend16 and SignedExtend16 , I think there maybe some problem when vector a contain negative number like -1.

I think may it need to add more codes like this. https://github.com/my-vegetable-has-exploded/dot-bench/blob/main/src/lib.rs#L54-L57

ashvardanian commented 4 months ago

Great catch, thank you too, @my-vegetable-has-exploded! Would you have time to submit a patch?

my-vegetable-has-exploded commented 4 months ago

Great catch, thank you too, @my-vegetable-has-exploded! Would you have time to submit a patch?

I may need some time to familiarize myself with the project. Maybe several days later.

ashvardanian commented 4 months ago

@my-vegetable-has-exploded, much appreciated! The CONTRIBUTING.md contents may help. The very first command should be enough to compile 🤗

my-vegetable-has-exploded commented 4 months ago

@my-vegetable-has-exploded, much appreciated! The CONTRIBUTING.md contents may help. The very first command should be enough to compile 🤗

Sorry, I meet some problems. The situation of my codes is computing dot product of int8 after quantization, I can promise that all numbers fall in [-127,127] in this situation. So it is okay to convert -x*y to x*-y and use dpbusd (product of unsigned and signed) to compute the result for me. But if without this gurantee, 0 - (-128) = 128 would overflow.

ashvardanian commented 4 months ago

@my-vegetable-has-exploded not sure about what you mean, but your Rust snippet looked reasonable. I think adjusting the C variant in this repo to work the same way is the way to go 🤷‍♂️

my-vegetable-has-exploded commented 4 months ago

@my-vegetable-has-exploded not sure about what you mean, but your Rust snippet looked reasonable. I think adjusting the C variant in this repo to work the same way is the way to go 🤷‍♂️

For example, we have a=[1,-2,3,-4] and b=[11,12,13,14]. We would convert it to a'=[1,2,3,4], b'=[11,-12,13,-14]. But if b=[-128,-128,-128,-128], than b'=[128,128,128,128], 128 would overflow in 8bit presentation.

ashvardanian commented 4 months ago

@my-vegetable-has-exploded I think the best route is to upcast to 16-but and use dpwssd.

ashvardanian commented 4 months ago

:tada: This issue has been resolved in version 3.9.0 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket: