ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc. 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0
1.92k stars, 64 forks

Fixed build issues with the shared libraries #136

Closed: ashbob999 closed this 3 months ago

ashbob999 commented 3 months ago

This PR fixes many build issues related to the shared libraries.

CMake fixes/changes:

Code changes:

TODO:

Questions:

ashvardanian commented 3 months ago

Thank you for the patches, @ashbob999! You may find LibSee useful for tracking such things. I haven't properly released it yet, but it may still be handy 🤗

Currently we target sandybridge, which gives us SSE4.1/POPCNT (but no BMI), so the serial code is not actually serial: it can still emit POPCNT. But changing the target to an earlier CPU would mean losing POPCNT. So would these functions have to be dynamically dispatched as well?

Not sure. For 32-bit and 64-bit integers, a SWAR popcount is probably 4-6 cycles, right? If so, we can probably replace the intrinsic with SWAR for those.
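For reference, a minimal sketch of the classic SWAR popcount for 64-bit and 32-bit words, in plain C with no special instructions. The function names are hypothetical and not part of StringZilla's API. Each version is roughly a dozen simple ALU operations plus one multiply, so whether it fits the 4-6 cycle estimate depends on the CPU and on whether latency or throughput matters more.

```c
#include <stdint.h>

/* Hypothetical helper: SWAR popcount for 64-bit words, no POPCNT required. */
static inline int swar_popcount_u64(uint64_t x) {
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));            /* 2-bit partial sums */
    x = (x & UINT64_C(0x3333333333333333)) +
        ((x >> 2) & UINT64_C(0x3333333333333333));                 /* 4-bit partial sums */
    x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);             /* 8-bit partial sums */
    return (int)((x * UINT64_C(0x0101010101010101)) >> 56);        /* add all bytes into the top byte */
}

/* Hypothetical helper: the same idea for 32-bit words. */
static inline int swar_popcount_u32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (int)((x * 0x01010101u) >> 24);
}
```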

Is there already an issue with tzcnt/lzcnt on processors without BMI, because they will use bsf/bsr instead and those behave slightly differently?

Haven't seen such issues, but I probably lack the right kind of hardware. I am wary of using hardware emulation for tests. Maybe it's wiser to use some older instance types on AWS? How do you catch those?

Should we enable AVX/AVX-512 instructions for 32-bit builds?

Probably no need for that.

Should ARM builds also default to a lower baseline instruction set?

This is a big one. We may want to separate a few more levels of Arm, similar to how it's done for x86. What do you think a good set should look like?

ashbob999 commented 3 months ago

Not sure. For 32-bit and 64-bit integers, a SWAR popcount is probably 4-6 cycles, right? If so, we can probably replace the intrinsic with SWAR for those.

Could do. Thinking about the different ways it's built, we could use the SWAR versions only in the serial functions; that way the AVX versions can still use the natively supported instructions. Obviously this might mean that we under-utilize certain targets, but it would hopefully allow the dynamic libraries to run on any x86 hardware (unless other unsupported instructions are used). If we use them only in the serial functions, we would also have to make sure this doesn't severely affect their performance. Do you have examples of the 32/64-bit SWAR versions of popcount? Another thing to note is that Clang/GCC already fall back to a non-native version when the arch is older than sandybridge (although I don't know how efficient these are, or what MSVC does).
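A minimal sketch of that split, assuming GCC or Clang on x86: the serial path needs no special instructions, while a separate function is compiled with __attribute__((target("popcnt"))) and chosen at run time with __builtin_cpu_supports, so the rest of the library can keep a lower baseline -march. The names count_bits, count_bits_serial, count_bits_popcnt, and EXAMPLE_HAVE_POPCNT_PATH are hypothetical. The serial path uses Kernighan's bit-clearing loop for brevity; a SWAR popcount would be a drop-in replacement.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; a sketch, not StringZilla's API. */

/* Portable path: each iteration clears the lowest set bit, so no special
   instructions are needed (a SWAR popcount could be used instead). */
static size_t count_bits_serial(const uint64_t *words, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; ++i)
        for (uint64_t x = words[i]; x != 0; x &= x - 1) ++total;
    return total;
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define EXAMPLE_HAVE_POPCNT_PATH 1
/* Only this function is compiled with POPCNT enabled; the rest of the
   translation unit keeps whatever baseline -march the build uses. */
__attribute__((target("popcnt")))
static size_t count_bits_popcnt(const uint64_t *words, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) total += (size_t)__builtin_popcountll(words[i]);
    return total;
}
#endif

/* Runtime dispatch: take the POPCNT path only if the CPU reports support. */
static size_t count_bits(const uint64_t *words, size_t n) {
#if defined(EXAMPLE_HAVE_POPCNT_PATH)
    if (__builtin_cpu_supports("popcnt")) return count_bits_popcnt(words, n);
#endif
    return count_bits_serial(words, n);
}
```

With MSVC or on non-x86 targets the target attribute path is not compiled, so this sketch simply falls back to the serial path there.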

Haven't seen such issues, but I probably lack the right kind of hardware. I am wary of using hardware emulation for tests. Maybe it's wiser to use some older instance types on AWS? How do you catch those?

I agree. I guess some testing where we manually replace tzcnt/lzcnt with bsf/bsr could highlight any problems (whether the difference shows up would also depend on where they are used).
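To make the difference concrete: on CPUs without BMI1, the TZCNT encoding executes as BSF, which gives the same result for every non-zero input but leaves the destination undefined for zero (TZCNT returns the operand width). LZCNT is a separate CPUID feature, and without it the encoding executes as BSR, which returns the index of the highest set bit, i.e. 63 - lzcnt for a non-zero 64-bit input, so those results differ for every input. The sketch below models both behaviors in portable C so a differential test can flag affected call sites without old hardware. All names are hypothetical, and the zero-input case of BSF/BSR is modeled as leaving the destination unchanged, which is what AMD documents (Intel calls it undefined).

```c
#include <stdint.h>

/* Reference TZCNT/LZCNT semantics: defined for zero, returning 64. */
static inline unsigned tzcnt64_ref(uint64_t x) {
    unsigned n = 0;
    while (n < 64 && ((x >> n) & 1u) == 0) ++n;          /* n < 64 checked before shifting */
    return n;
}
static inline unsigned lzcnt64_ref(uint64_t x) {
    unsigned n = 0;
    while (n < 64 && ((x >> (63 - n)) & 1u) == 0) ++n;
    return n;
}

/* Hypothetical model of legacy BSF/BSR. 'prev_dst' stands for the old value
   of the destination register, returned unchanged for zero input. */
static inline unsigned legacy_bsf64(uint64_t x, unsigned prev_dst) {
    return x ? tzcnt64_ref(x) : prev_dst;       /* matches TZCNT for x != 0 */
}
static inline unsigned legacy_bsr64(uint64_t x, unsigned prev_dst) {
    return x ? 63u - lzcnt64_ref(x) : prev_dst; /* a bit index, not a count */
}

/* Count how many results in a sample would change if code expecting
   TZCNT/LZCNT silently ran BSF/BSR instead. */
static unsigned count_mismatches(const uint64_t *samples, unsigned n) {
    unsigned bad = 0;
    for (unsigned i = 0; i < n; ++i) {
        uint64_t x = samples[i];
        if (legacy_bsf64(x, 0) != tzcnt64_ref(x)) ++bad;
        if (legacy_bsr64(x, 0) != lzcnt64_ref(x)) ++bad;
    }
    return bad;
}
```

In a real test the same sampling loop would call the library's own trailing/leading-zero helpers instead of the reference functions.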

Should we enable AVX/AVX-512 instructions for 32-bit builds?

Probably no need for that.

Okay, so should this be reflected in the CMake configuration by adding checks for 32/64-bit targets?
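CMake can check the pointer size through CMAKE_SIZEOF_VOID_P, but a header-side guard keeps the rule in place whatever build system is used. A minimal sketch, assuming the usual GCC/Clang/MSVC predefined macros; EXAMPLE_X86_64, EXAMPLE_USE_X86_AVX512, and example_avx512_backend_compiled are hypothetical names, and a user can still override the default by defining EXAMPLE_USE_X86_AVX512 explicitly.

```c
/* Detect 64-bit x86, excluding 32-bit x86, the x32 ABI (__ILP32__),
   and MSVC's ARM64EC mode, which also defines _M_X64. */
#if (defined(__x86_64__) && !defined(__ILP32__)) || \
    (defined(_M_X64) && !defined(_M_ARM64EC))
#define EXAMPLE_X86_64 1
#else
#define EXAMPLE_X86_64 0
#endif

/* Hypothetical opt-in macro: AVX-512 backend defaults to on only for 64-bit x86. */
#if !defined(EXAMPLE_USE_X86_AVX512)
#define EXAMPLE_USE_X86_AVX512 EXAMPLE_X86_64
#endif

/* Report whether the (hypothetical) AVX-512 backend would be compiled in. */
static inline int example_avx512_backend_compiled(void) {
    return EXAMPLE_USE_X86_AVX512;
}
```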

This is a big one. We may want to separate a few more levels of Arm, similar to how it's done for x86. What do you think a good set should look like?

I don't know much about the different ARM targets and what each one supports. When I run the ARM neon_serial test on my phone, it still reports NEON support, so armv8-a might still enable SIMD.
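One way to see what a given -march actually enables is to inspect the Arm C Language Extensions (ACLE) feature macros at compile time. GCC documents +simd as on by default for all AArch64 -march values, which would explain why plain armv8-a still reports NEON. A minimal sketch: __ARM_NEON, __ARM_FEATURE_SVE, and __ARM_FEATURE_SVE2 are ACLE macros, while the function name and the level numbering are hypothetical, just one possible way to tier Arm builds.

```c
/* Hypothetical compile-time report of the Arm SIMD baseline this translation
   unit was built for. 0 means no Arm SIMD enabled (or not an Arm target). */
static inline int example_arm_simd_level(void) {
#if defined(__ARM_FEATURE_SVE2)
    return 3; /* SVE2 */
#elif defined(__ARM_FEATURE_SVE)
    return 2; /* SVE */
#elif defined(__ARM_NEON)
    return 1; /* Advanced SIMD (NEON) */
#else
    return 0;
#endif
}
```

This only reflects compile-time flags; a dynamically dispatched build would also need a runtime check, for example getauxval(AT_HWCAP) on Linux.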

ashvardanian commented 3 months ago

Hey @ashbob999! Have you noticed that the Cross Compilation (arm64, aarch64-linux-gnu) job fails? So does Build Python 310 for windows-latest for 64-bit Arm, but for a different reason. I was planning to work on the library on Saturday. Any chance you have a patch we can test and merge?

ashvardanian commented 3 months ago

Epic! I think it's time to merge!