aklomp / base64

Fast Base64 stream encoder/decoder in C99, with SIMD acceleration
BSD 2-Clause "Simplified" License
865 stars 162 forks source link

bug: codec_choose_x86 does not check for OS AVX512 support #113

Closed mayeut closed 2 months ago

mayeut commented 1 year ago

The check for OS support has not been updated & still only checks for AVX support. AVX512 added masks, 512-bit register size & 16 new registers. There are 3 corresponding bits that need to be checked in XCR0.

aklomp commented 1 year ago

I don't have any hardware that supports AVX512, and CI also does not support it. The only way I can run the codec is under an emulator (Intel SDE). Otherwise I would have found this, and probably also added an optimized inline asm implementation.

Will fix.

mayeut commented 1 year ago

I use SDE as well (in CI also) to check for some "check SIMD capability" code. Having the hardware wouldn't have helped in this case. I don't know of a tool that can help for this one. I had hoped SDE could clear some XCR0 bits on demand to emulate lack of OS support but it seems it can't.

lucshi commented 1 year ago

The check for OS support has not been updated & still only checks for AVX support. AVX512 added masks, 512-bit register size & 16 new registers. There are 3 corresponding bits that need to be checked in XCR0.

Hi Matthieu, May I know which OS configuration may break this detection? I ever tested this code on HW (Icelake avx512 + ubuntu) and saw the detection was OK.

mgaudet commented 2 months ago

A failure can be reproduced by booting linux on a machine with AVX-512 disabled explicitly.

I have reproduced this on an AMD Ryzen Threadripper PRO 7975WX. For compatibility with https://pernos.co/, I've explicitly disabled AVX-512 by adding clearcpuid=304 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub

As you can see, AVX512 isn't listed in the /proc/cpuinfo:

flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi umip pku ospke rdpid overflow_recov succor smca fsrm flush_l1d debug_swap
bugs        : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso

I see a sigill even running make inside of test right now:

base64/test$ make 
rm -f benchmark test_base64 *.o
cc -std=c99 -O3 -Wall -Wextra -pedantic -DBASE64_STATIC_DEFINE -o codec_supported.o -c codec_supported.c
cc -std=c99 -O3 -Wall -Wextra -pedantic -DBASE64_STATIC_DEFINE -o test_base64 test_base64.c codec_supported.o ../lib/libbase64.o
cc -std=c99 -O3 -Wall -Wextra -pedantic -DBASE64_STATIC_DEFINE -o benchmark benchmark.c codec_supported.o ../lib/libbase64.o -lrt
./test_base64
Codec AVX2:
  skipping
Codec NEON32:
  skipping
Codec NEON64:
  skipping
Codec plain:
  all tests passed.
Codec SSSE3:
  skipping
Codec SSE41:
  skipping
Codec SSE42:
  skipping
Codec AVX:
  skipping
Codec AVX512:
make: *** [Makefile:19: test] Illegal instruction (core dumped)

See Also: https://github.com/nodejs/node/issues/53426 https://github.com/microsoft/vscode/issues/214630

lucshi commented 2 months ago

A failure can be reproduced by booting linux on a machine with AVX-512 disabled explicitly.

I have reproduced this on an AMD Ryzen Threadripper PRO 7975WX. For compatibility with https://pernos.co/, I've explicitly disabled AVX-512 by adding clearcpuid=304 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub

As you can see, AVX512 isn't listed in the /proc/cpuinfo:

flags     : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi umip pku ospke rdpid overflow_recov succor smca fsrm flush_l1d debug_swap
bugs      : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso

I see a sigill even running make inside of test right now:

base64/test$ make 
rm -f benchmark test_base64 *.o
cc -std=c99 -O3 -Wall -Wextra -pedantic -DBASE64_STATIC_DEFINE -o codec_supported.o -c codec_supported.c
cc -std=c99 -O3 -Wall -Wextra -pedantic -DBASE64_STATIC_DEFINE -o test_base64 test_base64.c codec_supported.o ../lib/libbase64.o
cc -std=c99 -O3 -Wall -Wextra -pedantic -DBASE64_STATIC_DEFINE -o benchmark benchmark.c codec_supported.o ../lib/libbase64.o -lrt
./test_base64
Codec AVX2:
  skipping
Codec NEON32:
  skipping
Codec NEON64:
  skipping
Codec plain:
  all tests passed.
Codec SSSE3:
  skipping
Codec SSE41:
  skipping
Codec SSE42:
  skipping
Codec AVX:
  skipping
Codec AVX512:
make: *** [Makefile:19: test] Illegal instruction (core dumped)

See Also: nodejs/node#53426 microsoft/vscode#214630

I think the AVX checks by cpuid (current mechnism) can not be affected by kernel configuration but BIOs configuration. That's why even if /proc shows no AVX the cpuid can still see the AVX in CPU.