Closed GoWind closed 1 day ago
Looks good! I think we should also add tests and examples to the docstrings.
Also, a while ago I wrote a small toolkit to compare the operations I frequently use in StringZilla against the `memchr` crate. It would be great to extend it with a TF-IDF implementation in vanilla Rust vs. one accelerated with those operations in SimSIMD. This is more of a creative challenge than some of the other issues in this repo, but let me know if you'd like to try it.
> Looks good! I think we should also add tests and examples to the docstrings.

Will update it. Is the trait name `Sparse` okay, or do you have a better one in mind?
> Also, a while ago I wrote a small toolkit to compare the operations I frequently use in StringZilla against memchr crate. Would be great to extend with a TF-IDF implementation in vanilla Rust vs the one accelerated with those operations in SimSIMD. This is more of a creative challenge, than some of the other issues in this repo, but let me know if you'd like to try such a challenge?

Would love to! I was looking for actual problems + datasets to test and improve my SIMD knowledge. This looks like a great exercise. TF-IDF is the term frequency - inverse document frequency measure, right?
Yes and yes!
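For reference, a vanilla-Rust baseline for the TF-IDF comparison could look roughly like the sketch below. It is not taken from the toolkit or from SimSIMD: the function name `tf_idf`, the pre-tokenized input, and the weighting variant (term count divided by document length, times the unsmoothed natural log of `N / df`) are all assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical baseline: per-document TF-IDF scores for pre-tokenized documents.
/// Assumed weighting: tf = count / doc_len, idf = ln(N / df), no smoothing.
pub fn tf_idf(docs: &[Vec<&str>]) -> Vec<HashMap<String, f64>> {
    let n = docs.len() as f64;

    // Document frequency: in how many documents each term appears at least once.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in docs {
        let unique: HashSet<&str> = doc.iter().copied().collect();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    docs.iter()
        .map(|doc| {
            // Raw term counts within this document.
            let mut counts: HashMap<&str, usize> = HashMap::new();
            for &term in doc {
                *counts.entry(term).or_insert(0) += 1;
            }
            let len = doc.len() as f64;
            counts
                .into_iter()
                .map(|(term, count)| {
                    let tf = count as f64 / len;
                    let idf = (n / df[term] as f64).ln();
                    (term.to_string(), tf * idf)
                })
                .collect()
        })
        .collect()
}
```

With this variant, a term that occurs in every document gets a score of zero, so only discriminative terms carry weight.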
Expose the `intersect` method, similar to numpy's `intersect1d`, that gives the count of intersection between 2 integer vectors.
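As a rough illustration of what such a method computes, here is a scalar sketch in plain Rust. The trait name `Sparse` comes from the discussion above, but the signature, the `usize` return type, and the requirement that both inputs are sorted and free of duplicates are assumptions for this example, not the signature exposed by the PR.

```rust
use std::cmp::Ordering;

/// Hypothetical shape of the trait; the real signature may differ.
pub trait Sparse: Sized {
    /// Counts the elements common to `a` and `b`.
    /// Assumes both slices are sorted ascending and contain no duplicates.
    fn intersect(a: &[Self], b: &[Self]) -> usize;
}

impl Sparse for u32 {
    fn intersect(a: &[u32], b: &[u32]) -> usize {
        let (mut i, mut j, mut count) = (0, 0, 0);
        // Two-pointer merge: advance whichever side holds the smaller value.
        while i < a.len() && j < b.len() {
            match a[i].cmp(&b[j]) {
                Ordering::Less => i += 1,
                Ordering::Greater => j += 1,
                Ordering::Equal => {
                    count += 1;
                    i += 1;
                    j += 1;
                }
            }
        }
        count
    }
}
```

Note that NumPy's `intersect1d` returns the common elements themselves; this sketch returns only their count, as described above.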