Closed jmfriedt closed 2 years ago
This is VOLK's interaction with cpu_features:
static int i_can_has_sse4_1 (void) {
#if defined(CPU_FEATURES_ARCH_X86)
    if (GetX86Info().features.sse4_1 == 0){ return 0; }
#endif
    return 1;
}
I would expect the output of list_cpu_features to contain sse,sse2,sse3,sse4_1,sse4_2,ssse3, but in your example it lacks sse3 (not to be confused with ssse3).
What are volk-config-info --avail-machines and --all-machines reporting?
If VOLK determines SSE4 to be available, it should look like this:
generic;sse2_64_mmx;sse3_64_mmx;ssse3_64_mmx;sse4_1_64_mmx;sse4_2_64_mmx;
Otherwise, it might stop at sse2_*.
What's the result of cat /proc/cpuinfo | grep flags?
Thank you for your reply. Here are the outputs
$ cat /proc/cpuinfo | grep flags
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer rdrand lahf_lm 3dnowprefetch epb pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida arat
vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
Seeing sse4_1 and sse4_2 in these flags made me believe these SIMD instructions would be supported. From VOLK:
$ ./apps/volk-config-info --all-machines
generic;sse2_64_mmx;sse3_64_mmx;ssse3_64_mmx;sse4_a_64_mmx;sse4_1_64_mmx;sse4_2_64_mmx;avx_64_mmx;avx2_64_mmx;avx512f_64_mmx;avx512cd_64_mmx
$ ./apps/volk-config-info --avail-machines
generic;sse2_64_mmx;
This CPU is advertised as supporting all SSE extensions, if I am to believe https://www.cpu-world.com/CPUs/Celeron/Intel-Celeron%20J1900.html
It is quite surprising that ssse3 seems to be available while sse3 is reported missing. VOLK, at least, requires SSE3 to be available before it considers SSE4. Since SSE3 is reported missing, it seems that all kernels beyond SSE2 become unavailable. This is a very interesting peculiarity of that particular CPU. There might be an issue somewhere.
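The effect of the SSE3 requirement described above can be illustrated with a small model. This is only a sketch under assumptions, not VOLK's actual implementation: the feature bits, the `machine` table, and `avail_machines` are hypothetical names, the machine names drop the `_64_mmx` suffix, and the table assumes each machine requires every earlier SSE level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits; these are not VOLK's real identifiers. */
enum {
    F_SSE2   = 1 << 0,
    F_SSE3   = 1 << 1,
    F_SSSE3  = 1 << 2,
    F_SSE4_1 = 1 << 3,
    F_SSE4_2 = 1 << 4
};

struct machine {
    const char *name;
    unsigned required; /* every bit set here must have been detected */
};

/* Assumption: each machine needs all lower SSE levels as well. */
static const struct machine MACHINES[] = {
    { "generic", 0 },
    { "sse2",    F_SSE2 },
    { "sse3",    F_SSE2 | F_SSE3 },
    { "ssse3",   F_SSE2 | F_SSE3 | F_SSSE3 },
    { "sse4_1",  F_SSE2 | F_SSE3 | F_SSSE3 | F_SSE4_1 },
    { "sse4_2",  F_SSE2 | F_SSE3 | F_SSSE3 | F_SSE4_1 | F_SSE4_2 },
};

/* Write the usable machine names into out as "a;b;" and return their count. */
static size_t avail_machines(unsigned detected, char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    size_t count = 0;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof MACHINES / sizeof MACHINES[0]; ++i) {
        const struct machine *m = &MACHINES[i];
        if ((detected & m->required) != m->required)
            continue; /* at least one required feature is missing */
        if (strlen(out) + strlen(m->name) + 2 > cap)
            break;    /* no room left for name, ';' and the terminator */
        strcat(out, m->name);
        strcat(out, ";");
        ++count;
    }
    return count;
}
```

With every flag detected except SSE3, this model yields only `generic;sse2;`, which has the same shape as the `--avail-machines` output reported above.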
/proc/cpuinfo does not show an sse3 flag; it reports SSE3 as pni (https://bugzilla.redhat.com/show_bug.cgi?id=491817). inxi, on the other hand, does list sse3:
$ inxi -Fza | grep sse
Flags: ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
I fear this detection https://github.com/google/cpu_features/blob/main/src/impl_x86_linux_or_android.c#L47 is incorrect: following the correction above, it should look for pni instead of sse3. A PR has been posted accordingly to https://github.com/google/cpu_features.
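To make the pni/sse3 point concrete, here is a minimal sketch of whole-token flag matching on a /proc/cpuinfo flags line. It is not the cpu_features code: `has_flag` and `flags_have_sse3` are hypothetical helpers written for illustration, and accepting both spellings is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `name` appears as a whole whitespace-separated token in
 * `flags`. A plain strstr() is not enough: "sse3" is a substring of "ssse3". */
static bool has_flag(const char *flags, const char *name) {
    assert(flags != NULL && name != NULL);
    const size_t len = strlen(name);
    if (len == 0)
        return false;
    const char *p = flags;
    while ((p = strstr(p, name)) != NULL) {
        const bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        const char end = p[len];
        const bool ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';
        if (starts && ends)
            return true;
        p += 1; /* keep scanning after a partial match */
    }
    return false;
}

/* Hypothetical helper: Linux lists SSE3 under its older name "pni"
 * (Prescott New Instructions), so accept either spelling. */
static bool flags_have_sse3(const char *flags) {
    return has_flag(flags, "pni") || has_flag(flags, "sse3");
}
```

On the flags line quoted earlier, `has_flag(flags, "sse3")` is false while `flags_have_sse3(flags)` is true, because the line contains `pni` and `ssse3` but no standalone `sse3` token.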
Fixed in cpu_features with https://github.com/google/cpu_features/commit/40e1c7158ddfbdae477751948750e0121aba55a1
Thanks for the info! I guess all we can do is update the submodule pointer and point people to this issue. After all, I expect others to hit the same issue with a system-installed cpu_features.
I have seen that list_cpu_features has been updated to detect SIMD extensions beyond SSE2 even when the processor lacks AVX support. The current issue concerns a single-board computer [1] built around an Intel(R) Celeron(R) CPU J1900, as stated with
Manually compiling VOLK on this platform and running volk_profile leads to e.g.
which sounds suspicious, since kernels/volk/volk_32u_popcntpuppet_32u.h indicates
Indeed, on a desktop computer fitted with an AVX-supporting CPU, we observe that
Could it be that on an AVX-less CPU (in this case Atom on an embedded board) the SSE4 support is correctly detected but not used in volk_profile method selection?
The function has been included in the dynamic library linking
but does not seem to be called when selecting the optimum method.
Thanks.
[1] http://advdownload.advantech.com/productfile/Downloadfile1/1-T36L2E/MIO-5251_USER_MANUAL_ED-1_FINAL.PDF (Celeron version)