ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0
2.05k stars 66 forks source link

Fix: some bugs in assignment, initialization, ... #63

Closed kmapb closed 8 months ago

kmapb commented 8 months ago

Introducing some C++ unit tests uncovered a few bugs:

This change lightly refactors to make more of the above testable, tests them, and fixes the problems that surfaced. Code inspection also shows some memory leaks and other issues, but Rome wasn't built in a day and the patch is getting big as-is.

kmapb commented 8 months ago

Will look into failing pytest ...

ashvardanian commented 8 months ago

Hi, @kmapb! Just saw that you wanted to look into the Python tests. They are already passing, I've just pushed the code. Now I'm in the processes of merging our changes. Will push them with a working GitHub CI.

Next up - checking which C++ interfaces are missing to start integrating into Chromium or other major C++ project, as planned 🤗

kmapb commented 8 months ago

Looks like pytest has been broken for a while. After fixing the typing and pytest annotation issues, we're failing one of the asserts in sz_size_log2i in the guts of the sz_sort_introsort. We're about 8 frames deep in the hardware stack in an explicit recursion here. Let's see if I can figure out what went wrong; please hold the review for now in case it's something substantial.

kmapb commented 8 months ago

Ah, yes, we call it with zero, which is undefined behavior. (0 has no logarithm in any base).

kmapb commented 8 months ago

OK, now the majority of the test_fuzzy_substring cases fail. Repushed with fixes for sort, retrying.

kmapb commented 8 months ago

It's trying to execute a function named choice introduced in c9aad691. No such function is present in that revision of the file, nor in HEAD. Assuming I need to reintroduce from random import choice

ashvardanian commented 8 months ago

Yes, that's the random choice function. I am surprised the Python tests are failing for you. Maybe I need to shuffle better 🤔

image

ashvardanian commented 8 months ago

Can I already sync our versions on main-dev or do you want to push something else in this PR, @kmapb ?

kmapb commented 8 months ago

I think I'm done now! Just squashed enough changes to get pytest running again.

kmapb commented 8 months ago

Phew! Let's see what the automated tests think. If they're green, I think we're ready for review here.