ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc. 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0
2.05k stars 66 forks

feat: port randomize and sz_generate to Rust #104

Closed grouville closed 6 months ago

grouville commented 6 months ago

Context

This PR tries to bring sz_generate and randomize to Rust.

Design of the function implementation

As seen in the toml, the crate relies only on core::ffi, with no allocation, and is no_std.

Constraints encountered

In both your C and C++ implementations, you have two definitions:

This draft is based only on the in-place implementation, as it lets me avoid the allocation question: since you are in no_std, std::vec::Vec is not available, so allocating the returned string / vec would rely on the alloc crate? Do you have an opinion on that, or did I miss something?
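To make the allocation question concrete, here is a minimal sketch of how an allocating variant could sit on top of the in-place one using only the alloc crate. The names fill_random and generate_owned are hypothetical and are not StringZilla's API; the random source is passed in as a closure purely for illustration.

```rust
// Sketch only: `fill_random` and `generate_owned` are hypothetical names,
// not functions from the StringZilla crate.
extern crate alloc;
use alloc::vec::Vec;

/// In-place variant: writes into a caller-provided buffer, so it needs no allocator.
/// `next` is any source of random 64-bit values (an assumption for this sketch).
pub fn fill_random(alphabet: &[u8], buffer: &mut [u8], mut next: impl FnMut() -> u64) {
    if alphabet.is_empty() {
        return; // nothing to sample from, leave the buffer untouched
    }
    for byte in buffer.iter_mut() {
        // Simple modulo mapping; slightly biased unless the alphabet size is a power of two.
        let index = (next() % alphabet.len() as u64) as usize;
        *byte = alphabet[index];
    }
}

/// Allocating variant: needs `alloc` (and therefore a global allocator), but not `std`.
pub fn generate_owned(alphabet: &[u8], length: usize, next: impl FnMut() -> u64) -> Vec<u8> {
    let mut buffer = alloc::vec![0u8; length];
    fill_random(alphabet, &mut buffer, next);
    buffer
}
```

With this split, the in-place function stays usable without any allocator, and the owned variant could be placed behind an optional Cargo feature so that no_std users without an allocator are unaffected.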

Also, I encountered a mutability issue around the traits. To ship the PR as a draft, I rely on this MutableStringZilla trait, but the aim would be to unify them. I tried, but some of the distance functions fail, so I need to dig a bit more.

My tests currently rely on a Vec, not a String, as String doesn't implement AsMut<N>. To test with a String, we would need to rely on the implementation returning a string or a vec.
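A small sketch of the Vec versus String point, assuming the bound in question is AsMut<[u8]> (the thread does not spell the parameter out). The function overwrite_with is a hypothetical stand-in for an in-place API, not part of the crate.

```rust
/// Hypothetical stand-in for an in-place API bounded on `AsMut<[u8]>`.
pub fn overwrite_with<T: AsMut<[u8]>>(target: &mut T, fill: u8) {
    for byte in target.as_mut().iter_mut() {
        *byte = fill;
    }
}

/// `String` does not implement `AsMut<[u8]>`, so a test can round-trip through
/// `Vec<u8>` and validate UTF-8 on the way back instead of mutating in place.
pub fn overwrite_string(text: String, fill: u8) -> Result<String, std::string::FromUtf8Error> {
    let mut bytes = text.into_bytes();
    overwrite_with(&mut bytes, fill);
    String::from_utf8(bytes)
}
```

Vec<u8> and byte arrays satisfy the bound directly, while a String has to give up its bytes first because arbitrary byte writes could break its UTF-8 invariant.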

Current issue

I am having an issue with the linker, which doesn't find the

ld: Undefined symbols:
            _sz_generate, referenced from:

I believe this is because sz_generate is implemented solely in the .h, so the FFI can't find a compiled function implementation? I've been blocked on it tonight; any guidance would be helpful 🙏 Instead of relying on sz_generate itself, should I rely on the underlying functions and implement it directly in Rust?
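For the "implement it directly in Rust" option, a rough sketch could look like the following. It keeps a C-style callback plus an opaque state pointer so the shape stays close to an FFI call, but nothing has to be resolved at link time. The RandomCallback type, generate_in_rust, and xorshift64 are assumptions made for illustration and do not mirror the actual sz_generate signature.

```rust
use core::ffi::c_void;

/// Hypothetical callback shape: returns the next random value given opaque state.
pub type RandomCallback = extern "C" fn(state: *mut c_void) -> u64;

/// Pure-Rust fallback: fills `text` with symbols from `alphabet`.
/// No C symbol is needed, so a header-only C definition cannot cause a link error.
pub fn generate_in_rust(
    alphabet: &[u8],
    text: &mut [u8],
    callback: RandomCallback,
    state: *mut c_void,
) {
    if alphabet.is_empty() {
        return; // nothing to sample from
    }
    for byte in text.iter_mut() {
        let value = callback(state);
        *byte = alphabet[(value % alphabet.len() as u64) as usize];
    }
}

/// Example callback: xorshift64 over a `u64` that the state pointer refers to.
pub extern "C" fn xorshift64(state: *mut c_void) -> u64 {
    // SAFETY: the caller must pass a valid, exclusive pointer to a non-zero u64.
    let s = unsafe { &mut *(state as *mut u64) };
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    *s
}
```

The other route is to have the C side export a real, non-inline symbol that the Rust extern declaration can bind to.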

Nonetheless, please tell me if I am totally not getting it? 🙏 The aim is to iterate on the PR until the implementation

grouville commented 6 months ago

Sorry for the dump as a draft 🙏 I preferred to share the work and dump the current thought process instead of remaining blocked on my own 😇

ashvardanian commented 6 months ago

It's very good, @grouville, thanks a lot! I also prefer such a workflow 🤗

I will replicate the generator in the C code to solve the missing symbol issue, and it should work fine after that 😉

ashvardanian commented 6 months ago

Closing this PR, already merged without the GitHub GUI 😄 Thanks again, @grouville!