ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0
2.05k stars 66 forks source link

Replace GFNI with nibble-based approach for character-set-search #76

Closed ashvardanian closed 4 months ago

ashvardanian commented 7 months ago

Current AVX-512 implementation relies on rare Galois Field extensions, that are not very fast to compute. @WojciechMula has a great article on a nibble-based approach for "SIMD byte lookup" with SSE. That same approach should work well with Arm NEON, AVX2, and AVX-512, and should be considered in the future.

ashvardanian commented 5 months ago

Solved in 54b5603f2042233baa367abe22b41dd1100457d9, but it may be possible to further reduce the latency for frequent patterns. For that we need a more efficient way to initialize the off and even columns of the bitmap, analogous to the LOAD2 instruction on Arm:

filter_even_vec.zmm = _mm512_broadcast_i32x4(_mm256_castsi256_si128(_mm256_maskz_compress_epi8(0x55555555, filter_ymm)));
filter_odd_vec.zmm = _mm512_broadcast_i32x4(_mm256_castsi256_si128(_mm256_maskz_compress_epi8(0xaaaaaaaa, filter_ymm)));