ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0
1.92k stars 64 forks source link

sz_capabilities might be incorrect for AVX512 #137

Closed ashbob999 closed 3 months ago

ashbob999 commented 3 months ago

The sz_capabilities function seems to return incorrect values for certain AVX512 instruction sets.

lscpu gives:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       36 bits physical, 48 bits virtual
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz
Stepping:            3
CPU MHz:             2496.000
CPU max MHz:         2496.0000
BogoMIPS:            4992.00
Hypervisor vendor:   Windows Subsystem for Linux
Virtualization type: container
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse
                        sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm pni pclmulqdq dtes64 est tm2 ssse3 fma cx16 xtpr pdcm
                        pcid sse4_1 sse4_2 movbe popcnt aes xsave osxsave avx f16c rdrand hypervisor lahf_lm abm
                        3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap
                        clflushopt ibrs ibpb stibp ssbd

But the python stringzilla.__capabilities__ gives:

'serial,avx2,avx512vbmi,gfni,'

If I understand cpuid correctly, info1 refers to calling cpuid with eax==1 (From https://en.wikipedia.org/wiki/CPUID). Then the following code in sz_capabilites is incorrect. https://github.com/ashvardanian/StringZilla/blob/1a4b05d8d6b7bc617b8e857d36b5dc1869685a7d/c/lib.c#L78-L83

So the info1s should instead be info7s. As currently:

ashvardanian commented 3 months ago

Yes, @ashbob999! You may be right. If you have a patch, will be happy to test on my end as well 🤗

ashbob999 commented 3 months ago

I haven't made a patch yet, but it should be simply enough to do.

Just replace the 3 info1s with info7s.

I can create a PR with the patch, and test it myself, but it won't be till the weekend.

ashvardanian commented 3 months ago

Yes, that fixes the issue! It would be great if we had similar CPU-capability checks for Arm.

ashvardanian commented 3 months ago

:tada: This issue has been resolved in version 3.8.2 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket: