ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐
https://ashvardanian.com/posts/simsimd-faster-scipy/
Apache License 2.0
913 stars, 51 forks

AVX2 popcount implementation #69

Closed: Charlyo closed this issue 7 months ago

Charlyo commented 8 months ago

Right now, for x86, it seems there are only serial and AVX-512 popcount-based Hamming implementations.

Could you also implement an AVX2-based one? One can be found here.

Thank you very much.
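For context, a serial popcount-based Hamming distance over packed bit vectors can be sketched roughly as follows. This is a minimal illustration, not SimSIMD's actual code: the names `popcount_u8` and `hamming_b8_serial` are hypothetical, and the per-byte popcount uses Kernighan's bit-clearing loop purely for clarity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a SimSIMD function: counts the set bits in one
 * byte by clearing the lowest set bit until none remain. */
unsigned popcount_u8(uint8_t x) {
    unsigned count = 0;
    while (x) {
        x &= (uint8_t)(x - 1); /* drops the lowest set bit */
        ++count;
    }
    return count;
}

/* Hypothetical name: Hamming distance between two packed bit vectors of
 * n_bytes bytes each. XOR marks the differing bits, popcount tallies them. */
size_t hamming_b8_serial(const uint8_t *a, const uint8_t *b, size_t n_bytes) {
    assert(n_bytes == 0 || (a != NULL && b != NULL));
    size_t distance = 0;
    for (size_t i = 0; i != n_bytes; ++i)
        distance += popcount_u8((uint8_t)(a[i] ^ b[i]));
    return distance;
}
```

A SIMD variant keeps the same XOR-then-popcount structure and only changes how many bits are processed per step.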

ashvardanian commented 8 months ago

Yes, that can be done. I am a bit overloaded the next week. Any chance you can open a PR for this?

Looking at the replies in the thread, Wojciech's variant looks great. We just need to adjust the style, remove loop unrolling, and add references to the original.

Charlyo commented 8 months ago

I'm not proficient in C or C++. I'd rather let someone more experienced do the job (if that's OK).

jianshu93 commented 8 months ago

Hello Both,

I believe libpopcnt.h has all AVX implementations of popcount: https://github.com/kimwalisch/libpopcnt

There is no need to implement an additional one. However, I think including it in this library could be useful.

Jianshu

ashvardanian commented 8 months ago

It's a good idea to add popcount, and libpopcnt looks nice, but we only need one routine: the AVX2 Harley-Seal transform. It would be easier to add those few lines of C code than to add the first dependency and update all of CI. Coincidentally, ClickHouse and other users have expressed interest in bit-level operations, so I'm definitely open to PRs 🤗
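To illustrate the Harley-Seal idea mentioned above, here is a portable scalar sketch built on carry-save adders (CSA). It is not the AVX2 routine itself and not what shipped in the library: it uses plain `uint64_t` words and a simplified 8-word block, and the names `popcount_u64`, `csa_u64`, and `hamming_harley_seal_u64` are hypothetical. An AVX2 version would presumably swap `uint64_t` for `__m256i` and use a vectorized per-register popcount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable SWAR popcount of one 64-bit word (hypothetical helper). */
uint64_t popcount_u64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (x * 0x0101010101010101ULL) >> 56;
}

/* Carry-save adder: per bit position, a + b + c == 2 * high + low. */
void csa_u64(uint64_t *high, uint64_t *low, uint64_t a, uint64_t b, uint64_t c) {
    uint64_t u = a ^ b;
    *high = (a & b) | (u & c);
    *low = u ^ c;
}

/* Hamming distance over 64-bit words. Eight XOR-ed words are compressed
 * through CSAs, so only one full popcount is needed per block. */
uint64_t hamming_harley_seal_u64(const uint64_t *a, const uint64_t *b, size_t n_words) {
    assert(n_words == 0 || (a != NULL && b != NULL));
    uint64_t total = 0, ones = 0, twos = 0, fours = 0;
    uint64_t eights, twos_a, twos_b, fours_a, fours_b;
    size_t i = 0;
    for (; i + 8 <= n_words; i += 8) {
        csa_u64(&twos_a, &ones, ones, a[i + 0] ^ b[i + 0], a[i + 1] ^ b[i + 1]);
        csa_u64(&twos_b, &ones, ones, a[i + 2] ^ b[i + 2], a[i + 3] ^ b[i + 3]);
        csa_u64(&fours_a, &twos, twos, twos_a, twos_b);
        csa_u64(&twos_a, &ones, ones, a[i + 4] ^ b[i + 4], a[i + 5] ^ b[i + 5]);
        csa_u64(&twos_b, &ones, ones, a[i + 6] ^ b[i + 6], a[i + 7] ^ b[i + 7]);
        csa_u64(&fours_b, &twos, twos, twos_a, twos_b);
        csa_u64(&eights, &fours, fours, fours_a, fours_b);
        total += popcount_u64(eights); /* each set bit here stands for 8 */
    }
    /* Weight the accumulators by the number of bits each one represents. */
    total = 8 * total + 4 * popcount_u64(fours) + 2 * popcount_u64(twos) + popcount_u64(ones);
    /* Tail: words that did not fill a whole block. */
    for (; i < n_words; ++i)
        total += popcount_u64(a[i] ^ b[i]);
    return total;
}
```

The appeal of this scheme is that the CSA steps are cheap bitwise operations, so the comparatively expensive popcount runs once per block rather than once per word.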

ashvardanian commented 7 months ago

:tada: This issue has been resolved in version 3.9.0 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket: