Replace GFNI with nibble-based approach for character-set-search

ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc 🦖

Apache License 2.0

2.05k stars, 66 forks

Solved in 54b5603f2042233baa367abe22b41dd1100457d9, but it may be possible to reduce the latency further for frequent patterns. That requires a more efficient way to initialize the odd and even columns of the bitmap, analogous to the LD2 (de-interleaving load) instruction on Arm:

filter_even_vec.zmm = _mm512_broadcast_i32x4(_mm256_castsi256_si128(_mm256_maskz_compress_epi8(0x55555555, filter_ymm)));
filter_odd_vec.zmm = _mm512_broadcast_i32x4(_mm256_castsi256_si128(_mm256_maskz_compress_epi8(0xaaaaaaaa, filter_ymm)));
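
For reference, here is a minimal scalar sketch of what the two compress calls above are expected to produce, assuming `filter_ymm` holds a 32-byte bitmap. The mask `0x55555555` keeps the even-indexed bytes and `0xaaaaaaaa` keeps the odd-indexed ones, each packed into 16 contiguous bytes before being broadcast to all four 128-bit lanes. The function name `split_even_odd_serial` is hypothetical and not part of StringZilla; it only models the data movement, not the latency.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the even/odd de-interleave (not StringZilla API).
 * even[i] receives filter[2*i], matching compress with mask 0x55555555.
 * odd[i] receives filter[2*i + 1], matching compress with mask 0xaaaaaaaa. */
static void split_even_odd_serial(const uint8_t filter[32], uint8_t even[16], uint8_t odd[16]) {
    for (size_t i = 0; i < 16; ++i) {
        even[i] = filter[2 * i];
        odd[i] = filter[2 * i + 1];
    }
}
```

A model like this could serve as a correctness baseline when trying alternative initialization sequences on x86.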

Replace GFNI with nibble-based approach for character-set-search #76