ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐
https://ashvardanian.com/posts/simsimd-faster-scipy/
Apache License 2.0

feat: #184 Expose intersections rust #238

Closed GoWind closed 1 day ago

GoWind commented 2 days ago

Expose the intersect method, which returns the size of the intersection of two integer vectors. It is similar to NumPy's intersect1d, but returns a count rather than the common elements.
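For readers unfamiliar with the operation, here is a minimal scalar sketch of counting an intersection in plain Rust. It is not the code from this PR and not SimSIMD's API: the function name `intersection_count`, the `u16` element type, and the requirement that both inputs are sorted and free of duplicates are assumptions made for illustration.

```rust
use std::cmp::Ordering;

/// Counts the elements shared by two slices.
/// Hypothetical helper for illustration only, not SimSIMD's API.
/// Assumption: both slices are sorted ascending and contain no duplicates.
fn intersection_count(a: &[u16], b: &[u16]) -> usize {
    let (mut i, mut j, mut count) = (0, 0, 0);
    // Two-pointer merge: advance whichever side holds the smaller value.
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                // Same value on both sides: count it and advance both.
                count += 1;
                i += 1;
                j += 1;
            }
        }
    }
    count
}
```

Calling `intersection_count(&[1, 3, 5, 7, 9], &[3, 4, 5, 6, 7])` returns 3, since 3, 5, and 7 appear in both slices. A scalar loop like this is the kind of baseline a SIMD version would be measured against.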

ashvardanian commented 1 day ago

Looks good! I think we should also add tests and examples to the docstrings.

Also, a while ago I wrote a small toolkit to compare the operations I frequently use in StringZilla against the memchr crate. It would be great to extend it with a TF-IDF implementation in vanilla Rust vs. one accelerated with those operations in SimSIMD. This is more of a creative challenge than some of the other issues in this repo, but let me know if you'd like to try it.

GoWind commented 1 day ago

> Looks good! I think we should also add tests and examples to the docstrings.

Will update it. Is the trait name Sparse okay, or do you have a better one in mind?

> Also, a while ago I wrote a small toolkit to compare the operations I frequently use in StringZilla against the memchr crate. It would be great to extend it with a TF-IDF implementation in vanilla Rust vs. one accelerated with those operations in SimSIMD. This is more of a creative challenge than some of the other issues in this repo, but let me know if you'd like to try it.

Would love to! I was looking for real problems and datasets to test and improve my SIMD knowledge. This looks like a great exercise. TF-IDF is the term frequency-inverse document frequency measure, right?

ashvardanian commented 1 day ago

Yes and yes!
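As a point of reference for the challenge discussed above, a vanilla Rust TF-IDF baseline might look roughly like the sketch below. It is not taken from the toolkit mentioned in the thread. The function name `tf_idf`, the pre-tokenized input, and the specific weighting (term count divided by document length, multiplied by the natural log of N over document frequency) are assumptions; several other TF-IDF variants are in common use.

```rust
use std::collections::{HashMap, HashSet};

/// One common TF-IDF variant: tf = count / doc_len, idf = ln(N / df).
/// Hypothetical helper for illustration; input is assumed to be pre-tokenized.
fn tf_idf(docs: &[Vec<&str>]) -> Vec<HashMap<String, f64>> {
    let n = docs.len() as f64;

    // Document frequency: how many documents contain each term at least once.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in docs {
        let unique: HashSet<&str> = doc.iter().copied().collect();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    docs.iter()
        .map(|doc| {
            // Raw term counts within this document.
            let mut counts: HashMap<&str, usize> = HashMap::new();
            for &term in doc {
                *counts.entry(term).or_insert(0) += 1;
            }
            // Combine term frequency with inverse document frequency.
            counts
                .into_iter()
                .map(|(term, c)| {
                    let tf = c as f64 / doc.len() as f64;
                    let idf = (n / df[term] as f64).ln();
                    (term.to_string(), tf * idf)
                })
                .collect()
        })
        .collect()
}
```

With this weighting, a term that occurs in every document scores zero, because ln(N / N) = 0. The document-frequency pass is essentially a set-membership problem, which is one place where an accelerated intersection over sorted term IDs could plausibly replace the hash-based version.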