BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0

Ryzen 7950x lacks feature sse2 #5122

Closed ahorek closed 1 year ago

ahorek commented 1 year ago

Describe the bug

Einstein@Home tasks are being rejected with the error:

[version] CPU [' family 25 model 97 stepping 2 '] lacks feature ' sse2 '

28.02.2023 17:45:16 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl avx512vbmi u

As you can see, the list is truncated because the internal buffer isn't large enough, but I'm not sure that's the reason for the original error.

cat /proc/cpuinfo

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 25
model           : 97
model name      : AMD Ryzen 9 7950X 16-Core Processor
stepping        : 2
microcode       : 0xa601203
cpu MHz         : 4499.798
cache size      : 1024 KB
physical id     : 0
siblings        : 32
core id         : 0
cpu cores       : 16
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 8999.57
TLB size        : 3584 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

I manually tested the cpuid feature check, and it also says sse2 IS supported.
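For readers who want to repeat a manual check like this, here is a minimal sketch of one way to query CPUID directly. It assumes GCC or Clang on x86 (it relies on the compiler's `<cpuid.h>` header and `__get_cpuid`), and the function name `cpu_reports_sse2` is made up for illustration; it is not necessarily the check the reporter ran.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction
#endif

// Hypothetical helper: returns true if CPUID leaf 1 reports SSE2,
// which is signalled by bit 26 of EDX.
// On non-x86 builds, or if leaf 1 is unavailable, it returns false.
bool cpu_reports_sse2() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;  // leaf 1 not supported by this CPU
    }
    return (edx & (1u << 26)) != 0;
#else
    return false;
#endif
}
```

Since SSE2 is part of the x86-64 baseline, a 64-bit x86 host should always report it here, which matches the `sse2` entry in the `flags` line above.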

Steps To Reproduce

Add the Einstein@Home project and enable "Multi-Directional Gravitational Wave search on O3 (CPU)".

Expected behavior

The client should receive work. Instead, no work is sent because of a supposedly missing feature sse2, which this CPU does support (no VMs or hacks are involved). Other BOINC projects work just fine.

System Information

Additional context

See the related code:
https://github.com/BOINC/boinc/blob/master/lib/hostinfo.h#L53
https://github.com/BOINC/boinc/blob/master/client/hostinfo_unix.cpp#L530
https://github.com/BOINC/boinc/blob/master/client/hostinfo_unix.cpp#L683

and the original report:
https://einsteinathome.org/cs/content/ryzen-7950x-lacks-feature-sse2

Let me know if there are more logs, info, or tests I can provide. Thanks!

AenBleidd commented 1 year ago

The message [version] CPU [' family 25 model 97 stepping 2 '] lacks feature ' sse2 ' is different from the message that the vanilla code could send: https://github.com/BOINC/boinc/blob/master/sched/plan_class_spec.cpp#L278 That makes me think that E@H runs a customized version of this utility, so it can't be verified or fixed from our side.
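The quoted ' sse2 ' in the scheduler message has a space on each side, which would be consistent with a whole-token search over a space-padded feature list. That reading is an assumption, and the sketch below only illustrates the technique; `has_feature` is a hypothetical helper, not the BOINC or E@H code.

```cpp
#include <string>

// Hypothetical helper (not BOINC's actual code): returns true if
// `feature` appears as a whole space-delimited token in `features`.
bool has_feature(const std::string& features, const std::string& feature) {
    if (feature.empty()) return false;
    // Pad both sides so the first and last tokens can match as well.
    const std::string haystack = " " + features + " ";
    const std::string needle = " " + feature + " ";
    return haystack.find(needle) != std::string::npos;
}
```

A check of this shape would still find sse2 in the truncated list from the report, because sse2 sits near the start of the list. It would miss features cut off at the end, such as umip, of which only the leading "u" survived.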

davidpanderson commented 1 year ago

The vanilla code has a 1024-char buffer for CPU features, and I think it's being exceeded here. I'll change it to std::string.
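To make the difference concrete, here is a small sketch contrasting a 1024-byte buffer with a std::string. The function names are invented for illustration and do not come from the BOINC sources.

```cpp
#include <cstdio>
#include <string>

// Copies `src` into a 1024-byte buffer, as a fixed-size field would,
// and returns what was actually stored. Illustrative only.
std::string store_in_fixed_buffer(const std::string& src) {
    char buf[1024];
    // snprintf writes at most sizeof(buf) - 1 characters plus a NUL,
    // so anything beyond 1023 characters is silently dropped.
    std::snprintf(buf, sizeof(buf), "%s", src.c_str());
    return std::string(buf);
}

// A std::string grows as needed, so the full list is kept.
std::string store_in_string(const std::string& src) {
    return src;
}
```

With a 1500-character input, `store_in_fixed_buffer` keeps only the first 1023 characters, while `store_in_string` keeps all 1500.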

ahorek commented 1 year ago

thanks @davidpanderson for looking into it, I've tried your branch, but it didn't help:

2023-03-03 01:27:35.7167 [PID=2377505]   Request: [USER#xxxxx] [HOST#13128375] [IP xxx.xxx.xxx.226] client 7.23.0
2023-03-03 01:27:35.7336 [PID=2377505] [debug]   have_master:0 have_working: 1 have_db: 1
2023-03-03 01:27:35.7336 [PID=2377505] [debug]   using working prefs
2023-03-03 01:27:35.7336 [PID=2377505] [debug]   have db 1; dbmod 1665796207.000000; global mod 1665796207.000000
2023-03-03 01:27:35.7336 [PID=2377505] [debug]   sending db prefs in reply
2023-03-03 01:27:35.7337 [PID=2377505]    [send] effective_ncpus 32 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2023-03-03 01:27:35.7337 [PID=2377505]    [send] effective_ngpus 0 max_jobs_on_host_gpu 999999
2023-03-03 01:27:35.7337 [PID=2377505]    [send] Not using matchmaker scheduling; Not using EDF sim
2023-03-03 01:27:35.7337 [PID=2377505]    [send] CPU: req 1382400.00 sec, 32.00 instances; est delay 0.00
2023-03-03 01:27:35.7337 [PID=2377505]    [send] work_req_seconds: 1382400.00 secs
2023-03-03 01:27:35.7337 [PID=2377505]    [send] available disk 9.99 GB, work_buf_min 17280
2023-03-03 01:27:35.7338 [PID=2377505]    [send] active_frac 1.000000 on_frac 0.999772 DCF 1.000000
2023-03-03 01:27:35.7345 [PID=2377505]    [mixed] sending non-locality work first (0.9112)
2023-03-03 01:27:35.7731 [PID=2377505]    [send] [HOST#13128375] will accept beta work.  Scanning for beta work.
2023-03-03 01:27:35.7824 [PID=2377505]    [mixed] sending locality work second
2023-03-03 01:27:35.7875 [PID=2377505]    [send] send_old_work() no feasible result older than 336.0 hours
2023-03-03 01:27:35.7933 [PID=2377505]    [send] send_old_work() no feasible result younger than 216.4 hours and older than 168.0 hours
2023-03-03 01:27:35.8139 [PID=2377505]    [version] Checking plan class 'GW-SSE2'
2023-03-03 01:27:35.8149 [PID=2377505]    [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2023-03-03 01:27:35.8149 [PID=2377505]    [version] CPU [' family 25 model 97 stepping 2 '] lacks feature ' sse2 '
2023-03-03 01:27:35.8150 [PID=2377505]    [version] no app version available: APP#59 (einstein_O3MD1) PLATFORM#7 (x86_64-pc-linux-gnu) min_version 0
2023-03-03 01:27:35.8150 [PID=2377505]    [version] no app version available: APP#59 (einstein_O3MD1) PLATFORM#1 (i686-pc-linux-gnu) min_version 0
2023-03-03 01:27:35.8173 [PID=2377505] [debug]   [HOST#13128375] MSG(high) No work sent
2023-03-03 01:27:35.8173 [PID=2377505] [debug]   [HOST#13128375] MSG(high) see scheduler log messages on https://einsteinathome.org/host/13128375/log
2023-03-03 01:27:35.8173 [PID=2377505]    Sending reply to [HOST#13128375]: 0 results, delay req 60.00
2023-03-03 01:27:35.8173 [PID=2377505]    Scheduler ran 0.108 seconds

03.03.2023 2:26:03 |  | Starting BOINC client version 7.23.0 for x86_64-pc-linux-gnu
03.03.2023 2:26:03 |  | This a development version of BOINC and may not function properly
03.03.2023 2:26:03 |  | log flags: file_xfer, sched_ops, task
03.03.2023 2:26:03 |  | Libraries: libcurl/7.87.0 OpenSSL/3.0.8 zlib/1.2.13 zstd/1.5.4 nghttp2/1.52.0
03.03.2023 2:26:03 |  | Data directory: /home/ahorek/boi/boinc/packages/generic/sea/BOINC
03.03.2023 2:26:03 |  | No usable GPUs found
03.03.2023 2:26:03 |  | libc:  version 2.37
03.03.2023 2:26:03 |  | Host name: desktop-b49ep3v.easy.local
03.03.2023 2:26:03 |  | Processor: 32 AuthenticAMD AMD Ryzen 9 7950X 16-Core Processor [Family 25 Model 97 Stepping 2]
03.03.2023 2:26:03 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl avx512vbmi u
03.03.2023 2:26:03 |  | OS: Linux Clear Linux OS: Clear Linux OS [6.2.0-1275.native|libc 2.37]
03.03.2023 2:26:03 |  | Memory: 30.50 GB physical, 64.00 MB virtual
03.03.2023 2:26:03 |  | Disk: 457.24 GB total, 404.86 GB free
03.03.2023 2:26:03 |  | Local time is UTC +1 hours
03.03.2023 2:26:03 |  | Config: GUI RPC allowed from any host
03.03.2023 2:26:03 |  | No general preferences found - using defaults
03.03.2023 2:26:03 |  | Preferences:
03.03.2023 2:26:03 |  | -  When computer is in use
03.03.2023 2:26:03 |  | -     'In use' means mouse/keyboard input in last 3.0 minutes
03.03.2023 2:26:03 |  | -     don't use GPU
03.03.2023 2:26:03 |  | -     Use at most 100% of the CPU time
03.03.2023 2:26:03 |  | -     suspend if non-BOINC CPU load exceeds 25%
03.03.2023 2:26:03 |  | -     max memory usage: 15.25 GB
03.03.2023 2:26:03 |  | -  When computer is not in use
03.03.2023 2:26:03 |  | -     max CPUs used: 32
03.03.2023 2:26:03 |  | -     Use at most 100% of the CPU time
03.03.2023 2:26:03 |  | -     suspend if non-BOINC CPU load exceeds 50%
03.03.2023 2:26:03 |  | -     max memory usage: 27.45 GB
03.03.2023 2:26:03 |  | -     Suspend if no input in last 60.000000 minutes
03.03.2023 2:26:03 |  | -  Suspend if running on batteries
03.03.2023 2:26:03 |  | -  Store at least 0.10 days of work
03.03.2023 2:26:03 |  | -  Store up to an additional 0.50 days of work
03.03.2023 2:26:03 |  | -  max disk usage: 404.76 GB
03.03.2023 2:26:03 |  | -  (to change preferences, visit a project web site or select Preferences in the Manager)
03.03.2023 2:26:03 |  | Setting up project and slot directories
03.03.2023 2:26:03 |  | Checking active tasks
03.03.2023 2:26:03 |  | Setting up GUI RPC socket
03.03.2023 2:26:03 |  | Checking presence of 0 project files
03.03.2023 2:26:03 |  | This computer is not attached to any projects
03.03.2023 2:26:09 |  | Fetching configuration file from https://einstein.phys.uwm.edu/get_project_config.php
03.03.2023 2:26:23 | Einstein@Home | Fetching scheduler list
03.03.2023 2:26:25 | Einstein@Home | Master file download succeeded
03.03.2023 2:26:30 | Einstein@Home | Sending scheduler request: Project initialization.
03.03.2023 2:26:30 | Einstein@Home | Requesting new tasks for CPU
03.03.2023 2:26:34 | Einstein@Home | Scheduler request completed: got 0 new tasks
...
03.03.2023 2:26:36 | Einstein@Home | Started download of einstein_icon.png
03.03.2023 2:26:36 | Einstein@Home | Started download of Android.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Arecibo_full.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Arecibo_platform.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Fermi_grsky.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Fermi_satellite.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of GW_BBH1.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of GW_BBH2.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of LIGO_Hanford.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of LIGO_Livingston.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of LIGO_laser.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of LIGO_optics.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of LIGO_schematic.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of LIGO_seisisol.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of LIGO_vacuum.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Parkes_full.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Pulsars_J2007.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Pulsars_Manhattan.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Pulsars_crab_opt.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Pulsars_crab_xr.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Pulsars_schem1.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Pulsars_schem2.jpg
03.03.2023 2:26:36 | Einstein@Home | Started download of Pulsars_vela.jpg
03.03.2023 2:26:37 | Einstein@Home | Finished download of einstein_icon.png (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of Android.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of Arecibo_full.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of Arecibo_platform.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of Fermi_grsky.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of Fermi_satellite.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of GW_BBH1.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of GW_BBH2.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of LIGO_Hanford.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of LIGO_Livingston.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of LIGO_laser.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of LIGO_optics.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of LIGO_schematic.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of LIGO_seisisol.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of LIGO_vacuum.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of Parkes_full.jpg (0 bytes)
03.03.2023 2:26:37 | Einstein@Home | Finished download of Pulsars_J2007.jpg (0 bytes)
03.03.2023 2:26:38 | Einstein@Home | Finished download of Pulsars_Manhattan.jpg (0 bytes)
03.03.2023 2:26:38 | Einstein@Home | Finished download of Pulsars_crab_opt.jpg (0 bytes)
03.03.2023 2:26:38 | Einstein@Home | Finished download of Pulsars_crab_xr.jpg (0 bytes)
03.03.2023 2:26:38 | Einstein@Home | Finished download of Pulsars_schem1.jpg (0 bytes)
03.03.2023 2:26:38 | Einstein@Home | Finished download of Pulsars_schem2.jpg (0 bytes)
03.03.2023 2:26:38 | Einstein@Home | Finished download of Pulsars_vela.jpg (0 bytes)
03.03.2023 2:27:34 | Einstein@Home | Sending scheduler request: To fetch work.
03.03.2023 2:27:34 | Einstein@Home | Requesting new tasks for CPU
03.03.2023 2:27:36 | Einstein@Home | Scheduler request completed: got 0 new tasks
03.03.2023 2:27:36 | Einstein@Home | No work sent
03.03.2023 2:27:36 | Einstein@Home | see scheduler log messages on https://einsteinathome.org/host/13128375/log

davidpanderson commented 1 year ago

It will work only when Einstein@Home makes this change in their server code.

bema-aei commented 1 year ago

In the current version of the code (master branch), the size of p_features in the host record is still fixed at 1024. That doesn't look right (and probably won't help):

    char p_features[1024];

https://github.com/BOINC/boinc/blob/master/db/boinc_db_types.h#L369
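As long as a field like this stays fixed-size, one option is to at least detect when a value does not fit. The sketch below assumes a stand-in `HostRecord` struct and a made-up `set_features` helper; neither is taken from the BOINC sources, only the field declaration quoted above is.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a host record with a fixed-size field,
// mirroring the `char p_features[1024];` declaration quoted above.
struct HostRecord {
    char p_features[1024];
};

// Stores `features` in the record. Returns true if the whole string fit,
// false if it was truncated to sizeof(p_features) - 1 characters.
bool set_features(HostRecord& host, const std::string& features) {
    // snprintf returns the length the full string would have needed,
    // which lets the caller notice truncation.
    const int needed = std::snprintf(host.p_features, sizeof(host.p_features),
                                     "%s", features.c_str());
    return needed >= 0 &&
           static_cast<std::size_t>(needed) < sizeof(host.p_features);
}
```

A caller could log a warning when `set_features` returns false, so that a truncated feature list does not go unnoticed.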

AenBleidd commented 1 year ago

@davidpanderson, I believe this is a valid point. Could you please fix that as well? Thank you in advance