ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc. 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0

search for string without loading entire file into memory? #112

Closed: tooptoop4 closed this issue 4 months ago

tooptoop4 commented 4 months ago

Is there a CLI version to check whether the ASCII NUL character exists in a file? It could stop searching after finding the first occurrence, ideally without reading the entire file into memory.

ashvardanian commented 4 months ago

That should be very easy to implement, @tooptoop4. Is there a similar utility in Linux?

The implementation would look something like:

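import sys
from stringzilla import File, Str
path = sys.argv[1]  # file to scan, taken from the command line
print(Str(File(path)).contains(chr(0)))  # True if a NUL character is present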

This will memory-map the file instead of loading it into RAM and scan only until the first occurrence. Feel free to suggest a better implementation and open a pull request 🤗
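For comparison, here is a minimal sketch of the same memory-map-and-scan idea using only Python's standard library (`mmap`), not StringZilla. The function name `file_has_byte` and the exit-code convention are illustrative assumptions, not part of either library.

```python
import mmap
import os
import sys


def file_has_byte(path: str, needle: bytes = b"\x00") -> bool:
    """Return True if `needle` occurs anywhere in the file at `path`.

    The file is memory-mapped, so the OS pages it in on demand instead of
    reading it all into a Python buffer; find() stops at the first match.
    """
    # mmap cannot map a zero-length file, so handle that case up front.
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle) != -1


if __name__ == "__main__" and len(sys.argv) == 2:
    # Exit status follows the grep convention: 0 if found, 1 if not.
    sys.exit(0 if file_has_byte(sys.argv[1]) else 1)
```

Run as `python has_nul.py some-file.bin` and check `$?`. Note that memory-mapping needs a regular file; pipes and stdin would need a chunked-read loop instead.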