ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0
2.05k stars, 66 forks

Fix: python slices of splits used incorrect offsets. #67

Closed: kmapb closed 7 months ago

kmapb commented 7 months ago

The 32-bit and 64-bit offset logic assumed that a Strs produced by slicing a previous Strs started at offset 0. str.split().slice([1:]), for instance, produced impossible end offsets. This caused some SEGVs and negative-length Python string constructions.

Deduplicate the 32-bit and 64-bit logic, and fix the bug in the single copy of the logic.

Add a test for 32-bit and 64-bit indices. The previous version of lib.c caused this test to SEGV or to produce runtime errors about constructing Python strings of negative length, depending on memory conditions.
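
For readers who want to see the class of bug described above, here is a minimal Python sketch. It is a simplified model, not the code in lib.c: `OffsetView`, `split`, `slice_from`, `get_assuming_zero`, and the `base` field are hypothetical names invented for illustration, and the real Strs layout may differ. The model stores one end offset per element relative to a shared parent buffer, so a view sliced from the middle has a first element that does not start at offset 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OffsetView:
    """Hypothetical stand-in for a Strs-like view; not StringZilla's real layout."""

    buffer: str       # shared parent text
    ends: List[int]   # end offset of each element, relative to `buffer`
    sep_len: int      # length of the separator between elements
    base: int = 0     # start offset of this view's first element

    def __len__(self) -> int:
        return len(self.ends)

    def get(self, i: int) -> str:
        # Corrected logic: the first element starts at `base`, not at 0.
        start = self.base if i == 0 else self.ends[i - 1] + self.sep_len
        return self.buffer[start:self.ends[i]]

    def get_assuming_zero(self, i: int) -> str:
        # Flawed logic: assumes every view begins at offset 0 of the buffer.
        start = 0 if i == 0 else self.ends[i - 1] + self.sep_len
        return self.buffer[start:self.ends[i]]

    def slice_from(self, first: int) -> "OffsetView":
        # Assumes 0 <= first <= len(self); carries the true start forward.
        base = self.base if first == 0 else self.ends[first - 1] + self.sep_len
        return OffsetView(self.buffer, self.ends[first:], self.sep_len, base)


def split(text: str, sep: str) -> OffsetView:
    """Split `text` on `sep`, recording only end offsets into `text`."""
    if not sep:
        raise ValueError("empty separator")
    ends: List[int] = []
    pos = 0
    while True:
        hit = text.find(sep, pos)
        if hit == -1:
            ends.append(len(text))
            break
        ends.append(hit)
        pos = hit + len(sep)
    return OffsetView(text, ends, len(sep))


parts = split("a,b,c", ",")
tail = parts.slice_from(1)           # analogous to taking [1:] of a split
print(tail.get(0))                   # b
print(tail.get_assuming_zero(0))     # a,b  (wrong: reads from offset 0)
```

In this model the flawed accessor only returns the wrong text. In C, the same mismatch between a view's real start and an assumed start of 0 can yield an end offset smaller than the start, which is consistent with the negative-length and SEGV symptoms reported here. Keeping a single accessor that takes the offset width as a parameter, rather than separate 32-bit and 64-bit copies, means a fix like this only has to be made once.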

kmapb commented 7 months ago

Against current main-dev

ashvardanian commented 7 months ago

Didn't realize the PR contains a patch until today, sorry, will merge soon 🤗