ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0

Initial support for Big Endian #75

Open SammyVimes opened 7 months ago

SammyVimes commented 7 months ago

Hi! I noticed that there is no support for Big Endian, so I decided to start working on it. So far I have done the following:

(also fixed a little bug in the little endian version of sz_rfind_1char_swar).
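To illustrate why a SWAR reverse search is sensitive to byte order, here is a minimal sketch of the general technique. It is not the actual sz_rfind_1char_swar from the library: the name example_rfind_byte_swar and the macro EXAMPLE_BIG_ENDIAN are hypothetical, and the code assumes a GCC- or Clang-compatible compiler for __BYTE_ORDER__, __builtin_clzll and __builtin_ctzll.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macro for this sketch; assumes little endian if the compiler gives no hint. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define EXAMPLE_BIG_ENDIAN 1
#else
#define EXAMPLE_BIG_ENDIAN 0
#endif

/* Returns a pointer to the last occurrence of `needle` in haystack[0..length), or NULL. */
static const char *example_rfind_byte_swar(const char *haystack, size_t length, char needle) {
    assert(haystack != NULL || length == 0);
    const uint64_t low7 = 0x7F7F7F7F7F7F7F7Full;
    const uint64_t pattern = 0x0101010101010101ull * (unsigned char)needle;
    const char *end = haystack + length;

    /* Walk backwards, eight bytes at a time. */
    while ((size_t)(end - haystack) >= 8) {
        uint64_t word;
        memcpy(&word, end - 8, 8); /* alignment-safe load */
        uint64_t diff = word ^ pattern; /* a matching byte becomes 0x00 */
        /* Exact zero-byte test: 0x80 in every byte of `diff` that is zero, 0x00 elsewhere. */
        uint64_t matches = ~(((diff & low7) + low7) | diff | low7);
        if (matches) {
#if EXAMPLE_BIG_ENDIAN
            /* The highest address sits in the least significant byte. */
            size_t offset = 7 - (size_t)__builtin_ctzll(matches) / 8;
#else
            /* The highest address sits in the most significant byte. */
            size_t offset = (size_t)(63 - __builtin_clzll(matches)) / 8;
#endif
            return end - 8 + offset;
        }
        end -= 8;
    }

    /* Fewer than eight bytes remain at the front: check them one by one. */
    while (end != haystack) {
        --end;
        if (*end == needle) return end;
    }
    return NULL;
}
```

The only endian-dependent step is translating the match mask into a byte offset. Swapping the two branches silently returns the first match in the word instead of the last, which is the kind of bug that only shows up on the other architecture. For the string "big endian, little endian" and the needle 'e', the function returns a pointer to offset 19, the same result as strrchr.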

I also took the liberty of adding a googletest dependency (of course I can remove it, I just prefer using it) and supporting the Linux version of qsort_r in test.cpp.

Although I am using macOS, I was able to test on a big endian machine using QEMU and Docker. So I think the next step for this pull request is adding a workflow with a similar setup (or maybe GitHub has BE machines, I don't know yet).

With all those if (IS_LITTLE_ENDIAN) checks the code looks funky, I know. I will try to do something about it.

ashvardanian commented 7 months ago

Hi, @SammyVimes! Thank you for the PR! Big Endian support is a great thing to have, but we need to make a few changes to proceed with the PR.

  1. The main-dev branch is miles ahead, and is being prepared for a major release. Please use it as a reference branch.
  2. It is probably wiser to use macros for big/little endian checks.
  3. Let's avoid Google Test and other third-party utilities. They are very useful in the general case, but careful use of asserts makes more sense here, allowing us to conditionally log more info about the scope and thus simplify debugging.
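
As an illustration of points 2 and 3, a compile-time endianness check might look roughly like the sketch below. The names EXAMPLE_IS_BIG_ENDIAN, example_runtime_is_big_endian, example_check_endianness_macro and example_load_u64_le are hypothetical and not taken from the library. The sketch assumes a GCC- or Clang-compatible compiler that predefines __BYTE_ORDER__ and provides __builtin_bswap64; other compilers fall back to assuming little endian.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macro: resolved at compile time, so the unused branch costs nothing. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define EXAMPLE_IS_BIG_ENDIAN 1
#else
#define EXAMPLE_IS_BIG_ENDIAN 0 /* assumption: little endian when the compiler gives no hint */
#endif

/* Runtime probe, used only to cross-check the macro in tests. */
static int example_runtime_is_big_endian(void) {
    uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 0; /* on big endian the lowest address holds the most significant byte */
}

/* A plain assert instead of a test framework: fails loudly if the macro is wrong for this host. */
static void example_check_endianness_macro(void) {
    assert(EXAMPLE_IS_BIG_ENDIAN == example_runtime_is_big_endian());
}

/* Loads 8 bytes so that the byte at the lowest address lands in the least significant bits,
   regardless of the host byte order. */
static uint64_t example_load_u64_le(const void *ptr) {
    uint64_t word;
    memcpy(&word, ptr, sizeof(word));
#if EXAMPLE_IS_BIG_ENDIAN
    word = __builtin_bswap64(word);
#endif
    return word;
}
```

With a helper like example_load_u64_le, the SWAR routines can keep a single code path that reasons in little-endian terms, and the byte swap is compiled in only on big-endian targets.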

SammyVimes commented 7 months ago

Hi, @ashvardanian! Sure, will change the PR accordingly. I really don’t know how I managed to miss the main-dev branch 😅

ashvardanian commented 6 months ago

Hi @SammyVimes! Any chance you've made progress on the new version? I'm installing Docker QEMU images now to test on 32-bit and big-endian architectures and generalize StringZilla further. I can continue your efforts if you have anything you can push 🤗

SammyVimes commented 2 months ago

@ashvardanian sorry, I was completely swamped by work. If big endian support is still required, I will happily pick up where I left off