wvm4 closed 5 months ago
Just found out that, for some reason, turning SMT off has an effect on this. I can't change my laptop's SMT setting in the BIOS, so after a reboot I usually turn it off manually with:
echo off | sudo tee /sys/devices/system/cpu/smt/control
GPU wasn't folding after reboot and restart of the service for some reason, but worked once I turned SMT off and restarted the service. Didn't check the logs before restarting the service though, so I'm not sure if it's even the same bug.
If the laptop goes to sleep, it could enter either a standby or even a hibernation state. In that case, the CPU SMT setting is likely reset, and since the laptop is reloading its saved state, it will not run your boot scripts again.
I'm not sure why SMT would prevent the core from running correctly.
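If the SMT setting really is reset on resume, a small helper could re-apply it. This is only a sketch: the function name ensure_smt_off and the SMT_CONTROL override are made up for illustration, and it assumes the kernel exposes the same sysfs file used in the command above. It needs root to write to the real file.

```shell
#!/bin/sh
# Path to the kernel's SMT control file. The override via the SMT_CONTROL
# variable is an assumption added here so the helper can be tried on a
# scratch file first.
SMT_CONTROL="${SMT_CONTROL:-/sys/devices/system/cpu/smt/control}"

# Hypothetical helper: write "off" only when SMT currently reads "on",
# then print whatever state the control file reports afterwards.
ensure_smt_off() {
    state=$(cat "$SMT_CONTROL") || return 1
    if [ "$state" = "on" ]; then
        echo off > "$SMT_CONTROL" || return 1
    fi
    cat "$SMT_CONTROL"
}
```

Calling ensure_smt_off as root after a resume (for example from a resume hook, if your distribution provides one) and then restarting the service would show whether the reset is what breaks the GPU. States other than "on" are left alone.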
The connection errors are probably unrelated. I see "Folding@home Core Shutdown: INTERRUPTED" many times in your log. This usually means you've paused the client. It also looks like the CUDA drivers are no longer working after the laptop reawakens. This is a known issue. It's a problem with the GPU drivers, not the fah-client.
See https://github.com/FoldingAtHome/fah-issues/issues/1720
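To see how often the INTERRUPTED shutdown mentioned above shows up, something like the following could be run against a copy of the log. The function name count_interrupted is made up for illustration, and the log path is left as an argument because the location depends on the client version and install.

```shell
#!/bin/sh
# Hypothetical helper: count log lines containing the core shutdown
# message quoted in this thread. $1 is the path to a client log file.
count_interrupted() {
    grep -c 'Core Shutdown: INTERRUPTED' "$1"
}
```

A count that jumps after each lid close, rather than after manual pauses, would point at the sleep/resume cycle instead of user action.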
I'm closing this because the problem is out of our control.
Computer hibernation and sleep have been a lottery on the modern PC, be it on Windows or Linux. There seems to be a massive disconnect between the OS sleep mechanism and the various manufacturers' drivers. It is common for awoken systems to even BSOD because of the power state switch or a driver reset. Nvidia on Linux is shaky at best. As Joe mentioned, the log indicates that fahclient cannot initialise either CUDA or OpenCL, which is a clear indication that the wake-up was not clean at the GPU driver level.
Closing laptop lid causes GPU to get stuck in a loop of downloading and uploading work units.
Laptop was folding using both CPU and GPU while on battery power, and properly paused when power was disconnected. The lid was shut and the laptop was used on battery power for a bit. When it was opened and reconnected to AC power, it got stuck in a loop, for the GPU only.
Loop continues in log, sometimes BAD_WORK_UNIT warnings, sometimes failed to connect errors: