FoldingAtHome / fah-client-bastet

Folding@home client, code named Bastet
GNU General Public License v3.0

GPU driver fails after laptop sleep/hibernation causing WU upload/download loop #246

Closed wvm4 closed 5 months ago

wvm4 commented 5 months ago

Closing the laptop lid causes the GPU to get stuck in a loop of downloading and uploading work units.

The laptop was folding using both the CPU and GPU while on battery power, and properly paused when power was disconnected. The lid was then shut and the laptop was used on battery power for a bit. When it was opened and reconnected to AC power, it got stuck in the loop, but only for the GPU.

16:13:51:I1:WU884:********************************************************************************
16:13:51:I1:WU884:Project: 18419 (Run 37, Clone 0, Gen 262)
16:13:51:I1:WU884:Unit: 0x00000000000000000000000000000000
16:13:51:I1:WU884:Digital signatures verified
16:13:51:I1:WU884:Calling: mdrun -c frame262.gro -s frame262.tpr -x frame262.xtc -cpi state.cpt -cpt 5 -nt 7 -ntmpi 1
16:13:51:I1:WU884:Steps: first=-1674967296 total=-1664967296
16:13:51:I1:OUT1:> GET wss://node1.foldingathome.org/ws/client HTTP/1.1
16:13:52:I1:WU883:Attempting to create CUDA context:
16:13:52:I1:WU883: Configuring platform CUDA
16:13:52:I1:OUT1:< HTTP/1.1 101 HTTP_SWITCHING_PROTOCOLS
16:13:52:I1:Logging into node account
16:13:52:E :Exception: Failed to prevent sleep: Permission denied
16:13:55:I1:WU883: Using CUDA and gpu 0
16:13:55:I1:WU883:Completed 0 out of 5000000 steps (0%)
16:13:55:I1:WU883:Checkpoint completed at step 0
16:13:55:I1:WU884:Completed 116231 out of 10000000 steps (1%)
16:14:32:I1:WU883:Caught signal SIGINT(2) on PID 4326
16:14:32:I1:WU883:Exiting, please wait. . .
16:14:32:I1:WU883:Folding@home Core Shutdown: INTERRUPTED
16:14:32:I1:WU884:Caught signal SIGINT(2) on PID 4327
16:14:32:I1:WU884:Exiting, please wait. . .
16:14:32:I1:WU884:Folding@home Core Shutdown: INTERRUPTED
16:14:33:I1:WU884:Core returned INTERRUPTED (102)
16:14:33:I1:WU883:Core returned INTERRUPTED (102)
17:55:44:I1:Account websocket closed: PROTOCOL msg=Failed to read header start
17:55:49:I1:OUT1:> GET wss://node1.foldingathome.org/ws/client HTTP/1.1
17:55:50:I1:OUT1:< HTTP/1.1 101 HTTP_SWITCHING_PROTOCOLS
17:55:50:I1:Logging into node account
19:54:41:I1:Account websocket closed: PROTOCOL msg=Failed to read header start
19:54:46:I1:OUT1:> GET wss://node1.foldingathome.org/ws/client HTTP/1.1
19:54:47:I1:OUT1:< HTTP/1.1 101 HTTP_SWITCHING_PROTOCOLS
19:54:47:I1:Logging into node account
19:54:54:E :Exception: Failed to prevent sleep: Permission denied
19:54:54:I3:WU883:Running FahCore: /var/lib/fah-client/cores/openmm-core-22/fahcore-22-linux-64bit-release-0.0.20/FahCore_22 -dir 2EcDr8fx9EeBlY5iWUIaT9hBuyYHj0TKLsSrzEvwtyY -suffix 01 -version 8.3.16 -lifeline 4320 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-platform 0 -cuda-device 0 -gpu 0
19:54:54:I3:WU883:Started FahCore on PID 10565
19:54:54:I3:WU884:Running FahCore: /var/lib/fah-client/cores/fahcore-a8-lin-64bit-avx2_256-0.0.12/FahCore_a8 -dir WrKAdHjaq4N_Xg5hZWuv24mwm30LfkMJzWgn05EvfYQ -suffix 01 -version 8.3.16 -lifeline 4320 -np 7
19:54:54:I3:WU884:Started FahCore on PID 10566
19:54:54:I1:WU884:*********************** Log Started 2024-05-26T19:54:54Z ***********************
19:54:54:I1:WU884:************************** Gromacs Folding@home Core ***************************
19:54:54:I1:WU884: Core: Gromacs
19:54:54:I1:WU884: Type: 0xa8
19:54:54:I1:WU884: Version: 0.0.12
19:54:54:I1:WU884: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:54:54:I1:WU884: Copyright: 2020 foldingathome.org
19:54:54:I1:WU884: Homepage: https://foldingathome.org/
19:54:54:I1:WU884: Date: Jan 16 2021
19:54:54:I1:WU884: Time: 19:24:44
19:54:54:I1:WU884: Compiler: GNU 8.3.0
19:54:54:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:54:54:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie
19:54:54:I1:WU884: Platform: linux2 4.15.0-128-generic
19:54:54:I1:WU884: Bits: 64
19:54:54:I1:WU884: Mode: Release
19:54:54:I1:WU884: SIMD: avx2_256
19:54:54:I1:WU884: OpenMP: ON
19:54:54:I1:WU884: CUDA: OFF
19:54:54:I1:WU884: Args: -dir WrKAdHjaq4N_Xg5hZWuv24mwm30LfkMJzWgn05EvfYQ -suffix 01
19:54:54:I1:WU884: -version 8.3.16 -lifeline 4320 -np 7
19:54:54:I1:WU884:************************************ libFAH ************************************
19:54:54:I1:WU884: Date: Jan 16 2021
19:54:54:I1:WU884: Time: 19:21:38
19:54:54:I1:WU884: Compiler: GNU 8.3.0
19:54:54:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:54:54:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie
19:54:54:I1:WU884: Platform: linux2 4.15.0-128-generic
19:54:54:I1:WU884: Bits: 64
19:54:54:I1:WU884: Mode: Release
19:54:54:I1:WU884:************************************ CBang *************************************
19:54:54:I1:WU884: Date: Jan 16 2021
19:54:54:I1:WU884: Time: 19:21:24
19:54:54:I1:WU884: Compiler: GNU 8.3.0
19:54:54:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:54:54:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
19:54:54:I1:WU884: Platform: linux2 4.15.0-128-generic
19:54:54:I1:WU884: Bits: 64
19:54:54:I1:WU884: Mode: Release
19:54:54:I1:WU884:************************************ System ************************************
19:54:54:I1:WU884: CPU: AMD Ryzen 7 7840HS with Radeon 780M Graphics
19:54:54:I1:WU884: CPU ID: AuthenticAMD Family 25 Model 116 Stepping 1
19:54:54:I1:WU884: CPUs: 8
19:54:54:I1:WU884: Memory: 27.18GiB
19:54:54:I1:WU884:Free Memory: 21.39GiB
19:54:54:I1:WU884: Threads: POSIX_THREADS
19:54:54:I1:WU884: OS Version: 6.5
19:54:54:I1:WU884:Has Battery: true
19:54:54:I1:WU884: On Battery: false
19:54:54:I1:WU884: UTC Offset: 2
19:54:54:I1:WU884: PID: 10566
19:54:54:I1:WU884: CWD: /var/lib/fah-client/work
19:54:54:I1:WU884:********************************************************************************
19:54:54:I1:WU884:Project: 18419 (Run 37, Clone 0, Gen 262)
19:54:54:I1:WU884:Unit: 0x00000000000000000000000000000000
19:54:54:I1:WU884:Digital signatures verified
19:54:54:I1:WU884:Calling: mdrun -c frame262.gro -s frame262.tpr -x frame262.xtc -cpi state.cpt -cpt 5 -nt 7 -ntmpi 1
19:54:54:I1:WU884:Steps: first=-1674967296 total=-1664967296
19:54:54:I1:WU883:*********************** Log Started 2024-05-26T19:54:54Z ***********************
19:54:54:I1:WU883:*************************** Core22 Folding@home Core ***************************
19:54:54:I1:WU883: Core: Core22
19:54:54:I1:WU883: Type: 0x22
19:54:54:I1:WU883: Version: 0.0.20
19:54:54:I1:WU883: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:54:54:I1:WU883: Copyright: 2020 foldingathome.org
19:54:54:I1:WU883: Homepage: https://foldingathome.org/
19:54:54:I1:WU883: Date: Jan 20 2022
19:54:54:I1:WU883: Time: 00:57:52
19:54:54:I1:WU883: Revision: 3f211b8a4346514edbff34e3cb1c0e0ec951373c
19:54:54:I1:WU883: Branch: HEAD
19:54:54:I1:WU883: Compiler: GNU 9.4.0
19:54:54:I1:WU883: Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
19:54:54:I1:WU883: -fdata-sections -O3 -funroll-loops -fno-pie
19:54:54:I1:WU883: -DOPENMM_VERSION="\"7.7.0\""
19:54:54:I1:WU883: Platform: linux 5.11.0-1025-azure
19:54:54:I1:WU883: Bits: 64
19:54:54:I1:WU883: Mode: Release
19:54:54:I1:WU883:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
19:54:54:I1:WU883: <peastman@stanford.edu>
19:54:54:I1:WU883: Args: -dir 2EcDr8fx9EeBlY5iWUIaT9hBuyYHj0TKLsSrzEvwtyY -suffix 01
19:54:54:I1:WU883: -version 8.3.16 -lifeline 4320 -gpu-vendor nvidia -opencl-platform
19:54:54:I1:WU883: 0 -opencl-device 0 -cuda-platform 0 -cuda-device 0 -gpu 0
19:54:54:I1:WU883:************************************ libFAH ************************************
19:54:54:I1:WU883: Date: Jan 20 2022
19:54:54:I1:WU883: Time: 00:57:22
19:54:54:I1:WU883: Revision: 9f4ad694e75c2350d4bb6b8b5b769ba27e483a2f
19:54:54:I1:WU883: Branch: HEAD
19:54:54:I1:WU883: Compiler: GNU 9.4.0
19:54:54:I1:WU883: Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
19:54:54:I1:WU883: -fdata-sections -O3 -funroll-loops -fno-pie
19:54:54:I1:WU883: Platform: linux 5.11.0-1025-azure
19:54:54:I1:WU883: Bits: 64
19:54:54:I1:WU883: Mode: Release
19:54:54:I1:WU883:************************************ CBang *************************************
19:54:54:I1:WU883: Date: Jan 20 2022
19:54:54:I1:WU883: Time: 00:57:00
19:54:54:I1:WU883: Revision: ab023d155b446906d55b0f6c9a1eedeea04f7a1a
19:54:54:I1:WU883: Branch: HEAD
19:54:54:I1:WU883: Compiler: GNU 9.4.0
19:54:54:I1:WU883: Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
19:54:54:I1:WU883: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
19:54:54:I1:WU883: Platform: linux 5.11.0-1025-azure
19:54:54:I1:WU883: Bits: 64
19:54:54:I1:WU883: Mode: Release
19:54:54:I1:WU883:************************************ System ************************************
19:54:54:I1:WU883: CPU: AMD Ryzen 7 7840HS with Radeon 780M Graphics
19:54:54:I1:WU883: CPU ID: AuthenticAMD Family 25 Model 116 Stepping 1
19:54:54:I1:WU883: CPUs: 8
19:54:54:I1:WU883: Memory: 27.18GiB
19:54:54:I1:WU883:Free Memory: 21.39GiB
19:54:54:I1:WU883: Threads: POSIX_THREADS
19:54:54:I1:WU883: OS Version: 6.5
19:54:54:I1:WU883:Has Battery: true
19:54:54:I1:WU883: On Battery: false
19:54:54:I1:WU883: UTC Offset: 2
19:54:54:I1:WU883: PID: 10565
19:54:54:I1:WU883: CWD: /var/lib/fah-client/work
19:54:54:I1:WU883:************************************ OpenMM ************************************
19:54:54:I1:WU883: Version: 7.7.0
19:54:54:I1:WU883:********************************************************************************
19:54:54:I1:WU883:Project: 12121 (Run 108, Clone 3, Gen 1)
19:54:54:I1:WU883:Digital signatures verified
19:54:54:I1:WU883:Folding@home GPU Core22 Folding@home Core
19:54:54:I1:WU883:Version 0.0.20
19:54:54:I1:WU883: Checkpoint write interval: 250000 steps (5%) [20 total]
19:54:54:I1:WU883: JSON viewer frame write interval: 50000 steps (1%) [100 total]
19:54:54:I1:WU883: XTC frame write interval: 25000 steps (0.5%) [200 total]
19:54:54:I1:WU883: Global context and integrator variables write interval: disabled
19:54:54:I1:WU883:There are 4 platforms available.
19:54:54:I1:WU883:Platform 0: Reference
19:54:54:I1:WU883:Platform 1: CPU
19:54:54:I1:WU883:Platform 2: OpenCL
19:54:54:I1:WU883: opencl-device 0 specified
19:54:54:I1:WU883:Platform 3: CUDA
19:54:54:I1:WU883: cuda-device 0 specified
19:54:55:I1:WU883:Attempting to create CUDA context:
19:54:55:I1:WU883: Configuring platform CUDA
19:54:55:I1:WU883:Failed to create CUDA context:
19:54:55:I1:WU883:The requested CUDA device could not be loaded
19:54:55:I1:WU883:Attempting to create OpenCL context:
19:54:55:I1:WU883: Configuring platform OpenCL
19:54:55:I1:WU883:Failed to create OpenCL context:
19:54:55:I1:WU883:Error initializing context: clCreateContext (-5)
19:54:55:I1:WU883:ERROR:125: Failed to create a GPU-enabled OpenMM Context.
19:54:55:I1:WU883:Saving result file ../logfile_01.txt
19:54:55:I1:WU883:Saving result file science.log
19:54:55:I1:WU883:Folding@home Core Shutdown: BAD_WORK_UNIT
19:54:55:W :WU883:Core returned BAD_WORK_UNIT (114)
19:54:55:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
19:54:55:I1:WU883:Uploading WU results
19:54:55:I1:WU885:Requesting WU assignment for user Willemvmaarsch team 0
19:54:55:I1:OUT6:> POST https://assign1.foldingathome.org/api/assign HTTP/1.1
19:54:55:I1:WU884:Caught signal SIGINT(2) on PID 10566
19:54:55:I1:WU884:Exiting, please wait. . .
19:54:56:I1:OUT6:< HTTP/1.1 200 HTTP_OK
19:54:56:I1:WU885:Received WU assignment xqQFl8_q7Yzr_tZNGtVd7tTYRs3UdFJ64H-EoqSYnOw
19:54:56:I1:WU885:Downloading WU
19:54:58:I1:OUT7:> POST https://fah1.innovatr.ca/api/assign HTTP/1.1
19:54:59:I1:WU884:Completed 120098 out of 10000000 steps (1%)
19:54:59:I1:WU884:Folding@home Core Shutdown: INTERRUPTED
19:54:59:I1:WU884:Core returned INTERRUPTED (102)
19:54:59:I3:WU884:Running FahCore: /var/lib/fah-client/cores/fahcore-a8-lin-64bit-avx2_256-0.0.12/FahCore_a8 -dir WrKAdHjaq4N_Xg5hZWuv24mwm30LfkMJzWgn05EvfYQ -suffix 01 -version 8.3.16 -lifeline 4320 -np 7
19:54:59:I3:WU884:Started FahCore on PID 10608
19:54:59:I1:WU884:*********************** Log Started 2024-05-26T19:54:59Z ***********************
19:54:59:I1:WU884:************************** Gromacs Folding@home Core ***************************
19:54:59:I1:WU884: Core: Gromacs
19:54:59:I1:WU884: Type: 0xa8
19:54:59:I1:WU884: Version: 0.0.12
19:54:59:I1:WU884: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:54:59:I1:WU884: Copyright: 2020 foldingathome.org
19:54:59:I1:WU884: Homepage: https://foldingathome.org/
19:54:59:I1:WU884: Date: Jan 16 2021
19:54:59:I1:WU884: Time: 19:24:44
19:54:59:I1:WU884: Compiler: GNU 8.3.0
19:54:59:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:54:59:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie
19:54:59:I1:WU884: Platform: linux2 4.15.0-128-generic
19:54:59:I1:WU884: Bits: 64
19:54:59:I1:WU884: Mode: Release
19:54:59:I1:WU884: SIMD: avx2_256
19:54:59:I1:WU884: OpenMP: ON
19:54:59:I1:WU884: CUDA: OFF
19:54:59:I1:WU884: Args: -dir WrKAdHjaq4N_Xg5hZWuv24mwm30LfkMJzWgn05EvfYQ -suffix 01
19:54:59:I1:WU884: -version 8.3.16 -lifeline 4320 -np 7
19:54:59:I1:WU884:************************************ libFAH ************************************
19:54:59:I1:WU884: Date: Jan 16 2021
19:54:59:I1:WU884: Time: 19:21:38
19:54:59:I1:WU884: Compiler: GNU 8.3.0
19:54:59:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:54:59:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie
19:54:59:I1:WU884: Platform: linux2 4.15.0-128-generic
19:54:59:I1:WU884: Bits: 64
19:54:59:I1:WU884: Mode: Release
19:54:59:I1:WU884:************************************ CBang *************************************
19:54:59:I1:WU884: Date: Jan 16 2021
19:54:59:I1:WU884: Time: 19:21:24
19:54:59:I1:WU884: Compiler: GNU 8.3.0
19:54:59:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:54:59:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
19:54:59:I1:WU884: Platform: linux2 4.15.0-128-generic
19:54:59:I1:WU884: Bits: 64
19:54:59:I1:WU884: Mode: Release
19:54:59:I1:WU884:************************************ System ************************************
19:54:59:I1:WU884: CPU: AMD Ryzen 7 7840HS with Radeon 780M Graphics
19:54:59:I1:WU884: CPU ID: AuthenticAMD Family 25 Model 116 Stepping 1
19:54:59:I1:WU884: CPUs: 8
19:54:59:I1:WU884: Memory: 27.18GiB
19:54:59:I1:WU884:Free Memory: 21.47GiB
19:54:59:I1:WU884: Threads: POSIX_THREADS
19:54:59:I1:WU884: OS Version: 6.5
19:54:59:I1:WU884:Has Battery: true
19:54:59:I1:WU884: On Battery: false
19:54:59:I1:WU884: UTC Offset: 2
19:54:59:I1:WU884: PID: 10608
19:54:59:I1:WU884: CWD: /var/lib/fah-client/work
19:54:59:I1:WU884:********************************************************************************
19:54:59:I1:WU884:Project: 18419 (Run 37, Clone 0, Gen 262)
19:54:59:I1:WU884:Unit: 0x00000000000000000000000000000000
19:54:59:I1:WU884:Digital signatures verified
19:54:59:I1:WU884:Calling: mdrun -c frame262.gro -s frame262.tpr -x frame262.xtc -cpi state.cpt -cpt 5 -nt 7 -ntmpi 1
19:54:59:I1:WU884:Steps: first=-1674967296 total=-1664967296
19:55:01:I1:WU885:DOWNLOAD 9% 2.97MiB of 33.38MiB
19:55:02:I1:WU885:DOWNLOAD 29% 9.58MiB of 33.38MiB
19:55:03:I1:WU885:DOWNLOAD 54% 18.14MiB of 33.38MiB
19:55:03:I1:WU884:Completed 120100 out of 10000000 steps (1%)
19:55:04:I1:WU885:DOWNLOAD 78% 25.96MiB of 33.38MiB
19:55:05:I1:WU885:DOWNLOAD 100% 33.38MiB of 33.38MiB
19:55:05:I1:OUT7:< HTTP/1.1 200 HTTP_OK
19:55:05:I1:WU885:Received WU
19:55:05:I1:Loaded cores/openmm-core-23/centos-7.9.2009-64bit/release/fahcore-23-centos-7.9.2009-64bit-release-8.0.3/FahCore_23
19:55:05:I3:WU885:Running FahCore: /var/lib/fah-client/cores/openmm-core-23/centos-7.9.2009-64bit/release/fahcore-23-centos-7.9.2009-64bit-release-8.0.3/FahCore_23 -dir xqQFl8_q7Yzr_tZNGtVd7tTYRs3UdFJ64H-EoqSYnOw -suffix 01 -version 8.3.16 -lifeline 4320 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-platform 0 -cuda-device 0 -gpu 0
19:55:05:I3:WU885:Started FahCore on PID 10618
19:55:05:I1:WU885:*********************** Log Started 2024-05-26T19:55:05Z ***********************
19:55:05:I1:WU885:*************************** Core23 Folding@home Core ***************************
19:55:05:I1:WU885: Core: Core23
19:55:05:I1:WU885: Type: 0x23
19:55:05:I1:WU885: Version: 8.0.3
19:55:05:I1:WU885: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:55:05:I1:WU885: Copyright: 2022 foldingathome.org
19:55:05:I1:WU885: Homepage: https://foldingathome.org/
19:55:05:I1:WU885: Date: Aug 3 2023
19:55:05:I1:WU885: Time: 08:28:22
19:55:05:I1:WU885: Revision: 199cb870317d05441d0a301287d9ef61254fa32b
19:55:05:I1:WU885: Branch: HEAD
19:55:05:I1:WU885: Compiler: GNU 7.5.0
19:55:05:I1:WU885: Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
19:55:05:I1:WU885: -fdata-sections -O3 -funroll-loops -fno-pie
19:55:05:I1:WU885: -DOPENMM_VERSION="\"8.0.0\""
19:55:05:I1:WU885: Platform: linux 5.15.0-1041-azure
19:55:05:I1:WU885: Bits: 64
19:55:05:I1:WU885: Mode: Release
19:55:05:I1:WU885:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
19:55:05:I1:WU885: <peastman@stanford.edu>
19:55:05:I1:WU885: Args: -dir xqQFl8_q7Yzr_tZNGtVd7tTYRs3UdFJ64H-EoqSYnOw -suffix 01
19:55:05:I1:WU885: -version 8.3.16 -lifeline 4320 -gpu-vendor nvidia -opencl-platform
19:55:05:I1:WU885: 0 -opencl-device 0 -cuda-platform 0 -cuda-device 0 -gpu 0
19:55:05:I1:WU885:************************************ libFAH ************************************
19:55:05:I1:WU885: Date: Aug 3 2023
19:55:05:I1:WU885: Time: 08:27:48
19:55:05:I1:WU885: Revision: 112c2234abe20611a05652defc3c7f854cbf927f
19:55:05:I1:WU885: Branch: HEAD
19:55:05:I1:WU885: Compiler: GNU 7.5.0
19:55:05:I1:WU885: Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
19:55:05:I1:WU885: -fdata-sections -O3 -funroll-loops -fno-pie
19:55:05:I1:WU885: Platform: linux 5.15.0-1041-azure
19:55:05:I1:WU885: Bits: 64
19:55:05:I1:WU885: Mode: Release
19:55:05:I1:WU885:************************************ CBang *************************************
19:55:05:I1:WU885: Version: 1.7.2
19:55:05:I1:WU885: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:55:05:I1:WU885: Org: Cauldron Development LLC
19:55:05:I1:WU885: Copyright: Cauldron Development LLC, 2003-2023
19:55:05:I1:WU885: Homepage: https://cauldrondevelopment.com/
19:55:05:I1:WU885: License: GPL 2+
19:55:05:I1:WU885: Date: Aug 3 2023
19:55:05:I1:WU885: Time: 08:27:30
19:55:05:I1:WU885: Revision: eae4b58965bdd4d54ea9eb77972674352b37a547
19:55:05:I1:WU885: Branch: HEAD
19:55:05:I1:WU885: Compiler: GNU 7.5.0
19:55:05:I1:WU885: Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
19:55:05:I1:WU885: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
19:55:05:I1:WU885: Platform: linux 5.15.0-1041-azure
19:55:05:I1:WU885: Bits: 64
19:55:05:I1:WU885: Mode: Release
19:55:05:I1:WU885:************************************ System ************************************
19:55:05:I1:WU885: CPU: AMD Ryzen 7 7840HS with Radeon 780M Graphics
19:55:05:I1:WU885: CPU ID: AuthenticAMD Family 25 Model 116 Stepping 1
19:55:05:I1:WU885: CPUs: 8
19:55:05:I1:WU885: Memory: 27.18GiB
19:55:05:I1:WU885:Free Memory: 21.36GiB
19:55:05:I1:WU885: Threads: POSIX_THREADS
19:55:05:I1:WU885: OS Version: 6.5
19:55:05:I1:WU885:Has Battery: true
19:55:05:I1:WU885: On Battery: false
19:55:05:I1:WU885: UTC Offset: 2
19:55:05:I1:WU885: PID: 10618
19:55:05:I1:WU885: CWD: /var/lib/fah-client/work
19:55:05:I1:WU885: Exec: /var/lib/fah-client/cores/openmm-core-23/centos-7.9.2009-64bit/release/fahcore-23-centos-7.9.2009-64bit-release-8.0.3/FahCore_23
19:55:05:I1:WU885:************************************ OpenMM ************************************
19:55:05:I1:WU885: Version: 8.0.0
19:55:05:I1:WU885:********************************************************************************
19:55:05:I1:WU885:Project: 12299 (Run 18, Clone 20, Gen 72)
19:55:05:I1:WU885:Reading tar file core.xml
19:55:05:I1:WU885:Reading tar file integrator.xml
19:55:05:I1:WU885:Reading tar file state.xml.bz2
19:55:05:I1:WU885:Reading tar file system.xml.bz2
19:55:05:I1:WU885:Digital signatures verified
19:55:05:I1:WU885:Folding@home GPU Core23 Folding@home Core
19:55:05:I1:WU885:Version 8.0.3
19:55:05:I1:WU885: Checkpoint write interval: 25000 steps (2%) [50 total]
19:55:05:I1:WU885: JSON viewer frame write interval: 12500 steps (1%) [100 total]
19:55:05:I1:WU885: XTC frame write interval: 25000 steps (2%) [50 total]
19:55:05:I1:WU885: Global context and integrator variables write interval: disabled
19:55:06:I1:WU885:There are 3 platforms available.
19:55:06:I1:WU885:Platform 0: Reference
19:55:06:I1:WU885:Platform 1: CPU
19:55:06:I1:WU885:Platform 2: CUDA
19:55:06:I1:WU885: cuda-device 0 specified
19:55:06:I1:WU885:opencl-device was set but OpenCL platform could not be found.
19:55:18:I1:WU885:Attempting to create CUDA context:
19:55:18:I1:WU885: Configuring platform CUDA
19:55:18:I1:WU885:Failed to create CUDA context:
19:55:18:I1:WU885:Error initializing CUDA: CUDA_ERROR_UNKNOWN (999) at /home/conda/feedstock_root/build_artifacts/openmm_1682500577703/work/platforms/cuda/src/CudaContext.cpp:140
19:55:18:I1:WU885:ERROR:125: Failed to create a GPU-enabled OpenMM Context.
19:55:18:I1:WU885:Saving result file ../logfile_01.txt
19:55:18:I1:WU885:Saving result file science.log
19:55:18:I1:WU885:Folding@home Core Shutdown: BAD_WORK_UNIT
19:55:18:W :WU885:Core returned BAD_WORK_UNIT (114)
19:55:18:I1:Default:Added new work unit: cpus:0 gpus:gpu:01:00:00
19:55:18:I1:WU886:Requesting WU assignment for user Willemvmaarsch team 0
19:55:18:I1:WU885:Uploading WU results
19:55:18:I1:OUT9:> POST https://fah1.innovatr.ca/api/results HTTP/1.1
19:55:18:I1:OUT8:> POST https://assign2.foldingathome.org/api/assign HTTP/1.1
19:55:18:I1:WU884:Caught signal SIGINT(2) on PID 10608
19:55:18:I1:WU884:Exiting, please wait. . .
19:55:18:I1:WU884:Folding@home Core Shutdown: INTERRUPTED
19:55:19:I1:OUT9:< HTTP/1.1 200 HTTP_OK
19:55:19:I1:WU885:Credited
19:55:19:I1:WU884:Core returned INTERRUPTED (102)
19:55:19:I3:WU884:Running FahCore: /var/lib/fah-client/cores/fahcore-a8-lin-64bit-avx2_256-0.0.12/FahCore_a8 -dir WrKAdHjaq4N_Xg5hZWuv24mwm30LfkMJzWgn05EvfYQ -suffix 01 -version 8.3.16 -lifeline 4320 -np 8
19:55:19:I3:WU884:Started FahCore on PID 10634
19:55:19:I1:WU884:*********************** Log Started 2024-05-26T19:55:19Z ***********************
19:55:19:I1:WU884:************************** Gromacs Folding@home Core ***************************
19:55:19:I1:WU884: Core: Gromacs
19:55:19:I1:WU884: Type: 0xa8
19:55:19:I1:WU884: Version: 0.0.12
19:55:19:I1:WU884: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:55:19:I1:WU884: Copyright: 2020 foldingathome.org
19:55:19:I1:WU884: Homepage: https://foldingathome.org/
19:55:19:I1:WU884: Date: Jan 16 2021
19:55:19:I1:WU884: Time: 19:24:44
19:55:19:I1:WU884: Compiler: GNU 8.3.0
19:55:19:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:55:19:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie
19:55:19:I1:WU884: Platform: linux2 4.15.0-128-generic
19:55:19:I1:WU884: Bits: 64
19:55:19:I1:WU884: Mode: Release
19:55:19:I1:WU884: SIMD: avx2_256
19:55:19:I1:WU884: OpenMP: ON
19:55:19:I1:WU884: CUDA: OFF
19:55:19:I1:WU884: Args: -dir WrKAdHjaq4N_Xg5hZWuv24mwm30LfkMJzWgn05EvfYQ -suffix 01
19:55:19:I1:WU884: -version 8.3.16 -lifeline 4320 -np 8
19:55:19:I1:WU884:************************************ libFAH ************************************
19:55:19:I1:WU884: Date: Jan 16 2021
19:55:19:I1:WU884: Time: 19:21:38
19:55:19:I1:WU884: Compiler: GNU 8.3.0
19:55:19:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:55:19:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie
19:55:19:I1:WU884: Platform: linux2 4.15.0-128-generic
19:55:19:I1:WU884: Bits: 64
19:55:19:I1:WU884: Mode: Release
19:55:19:I1:WU884:************************************ CBang *************************************
19:55:19:I1:WU884: Date: Jan 16 2021
19:55:19:I1:WU884: Time: 19:21:24
19:55:19:I1:WU884: Compiler: GNU 8.3.0
19:55:19:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:55:19:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
19:55:19:I1:WU884: Platform: linux2 4.15.0-128-generic
19:55:19:I1:WU884: Bits: 64
19:55:19:I1:WU884: Mode: Release
19:55:19:I1:WU884:************************************ System ************************************
19:55:19:I1:WU884: CPU: AMD Ryzen 7 7840HS with Radeon 780M Graphics
19:55:19:I1:WU884: CPU ID: AuthenticAMD Family 25 Model 116 Stepping 1
19:55:19:I1:WU884: CPUs: 8
19:55:19:I1:WU884: Memory: 27.18GiB
19:55:19:I1:WU884:Free Memory: 21.44GiB
19:55:19:I1:WU884: Threads: POSIX_THREADS
19:55:19:I1:WU884: OS Version: 6.5
19:55:19:I1:WU884:Has Battery: true
19:55:19:I1:WU884: On Battery: false
19:55:19:I1:WU884: UTC Offset: 2
19:55:19:I1:WU884: PID: 10634
19:55:19:I1:WU884: CWD: /var/lib/fah-client/work
19:55:19:I1:WU884:********************************************************************************
19:55:19:I1:WU884:Project: 18419 (Run 37, Clone 0, Gen 262)
19:55:19:I1:WU884:Unit: 0x00000000000000000000000000000000
19:55:19:I1:WU884:Digital signatures verified
19:55:19:I1:WU884:Calling: mdrun -c frame262.gro -s frame262.tpr -x frame262.xtc -cpi state.cpt -cpt 5 -nt 8 -ntmpi 1
19:55:19:I1:WU884:Steps: first=-1674967296 total=-1664967296
19:55:19:I1:OUT8:< HTTP/1.1 200 HTTP_OK
19:55:19:I1:WU886:Received WU assignment dvr514W2aiW6oUxbe7mvxyDpz1bwOstGHbwTYoiBiHk
19:55:19:I1:WU886:Downloading WU
19:55:19:I1:OUT10:> POST https://fah1.innovatr.ca/api/assign HTTP/1.1
19:55:19:I1:WU884:Caught signal SIGINT(2) on PID 10634
19:55:19:I1:WU884:Exiting, please wait. . .
19:55:22:I1:WU884:Completed 122513 out of 10000000 steps (1%)
19:55:22:I1:WU886:DOWNLOAD 11% 3.64MiB of 33.39MiB
19:55:22:I1:WU884:Folding@home Core Shutdown: INTERRUPTED
19:55:22:I1:WU884:Core returned INTERRUPTED (102)
19:55:22:I3:WU884:Running FahCore: /var/lib/fah-client/cores/fahcore-a8-lin-64bit-avx2_256-0.0.12/FahCore_a8 -dir WrKAdHjaq4N_Xg5hZWuv24mwm30LfkMJzWgn05EvfYQ -suffix 01 -version 8.3.16 -lifeline 4320 -np 7
19:55:22:I3:WU884:Started FahCore on PID 10649
19:55:22:I1:WU884:*********************** Log Started 2024-05-26T19:55:22Z ***********************
19:55:22:I1:WU884:************************** Gromacs Folding@home Core ***************************
19:55:22:I1:WU884: Core: Gromacs
19:55:22:I1:WU884: Type: 0xa8
19:55:22:I1:WU884: Version: 0.0.12
19:55:22:I1:WU884: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:55:22:I1:WU884: Copyright: 2020 foldingathome.org
19:55:22:I1:WU884: Homepage: https://foldingathome.org/
19:55:22:I1:WU884: Date: Jan 16 2021
19:55:22:I1:WU884: Time: 19:24:44
19:55:22:I1:WU884: Compiler: GNU 8.3.0
19:55:22:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:55:22:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie
19:55:22:I1:WU884: Platform: linux2 4.15.0-128-generic
19:55:22:I1:WU884: Bits: 64
19:55:22:I1:WU884: Mode: Release
19:55:22:I1:WU884: SIMD: avx2_256
19:55:22:I1:WU884: OpenMP: ON
19:55:22:I1:WU884: CUDA: OFF
19:55:22:I1:WU884: Args: -dir WrKAdHjaq4N_Xg5hZWuv24mwm30LfkMJzWgn05EvfYQ -suffix 01
19:55:22:I1:WU884: -version 8.3.16 -lifeline 4320 -np 7
19:55:22:I1:WU884:************************************ libFAH ************************************
19:55:22:I1:WU884: Date: Jan 16 2021
19:55:22:I1:WU884: Time: 19:21:38
19:55:22:I1:WU884: Compiler: GNU 8.3.0
19:55:22:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:55:22:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie
19:55:22:I1:WU884: Platform: linux2 4.15.0-128-generic
19:55:22:I1:WU884: Bits: 64
19:55:22:I1:WU884: Mode: Release
19:55:22:I1:WU884:************************************ CBang *************************************
19:55:22:I1:WU884: Date: Jan 16 2021
19:55:22:I1:WU884: Time: 19:21:24
19:55:22:I1:WU884: Compiler: GNU 8.3.0
19:55:22:I1:WU884: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
19:55:22:I1:WU884: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
19:55:22:I1:WU884: Platform: linux2 4.15.0-128-generic
19:55:22:I1:WU884: Bits: 64
19:55:22:I1:WU884: Mode: Release
19:55:22:I1:WU884:************************************ System ************************************
19:55:22:I1:WU884: CPU: AMD Ryzen 7 7840HS with Radeon 780M Graphics
19:55:22:I1:WU884: CPU ID: AuthenticAMD Family 25 Model 116 Stepping 1
19:55:22:I1:WU884: CPUs: 8
19:55:22:I1:WU884: Memory: 27.18GiB
19:55:22:I1:WU884:Free Memory: 21.45GiB
19:55:22:I1:WU884: Threads: POSIX_THREADS
19:55:22:I1:WU884: OS Version: 6.5
19:55:22:I1:WU884:Has Battery: true
19:55:22:I1:WU884: On Battery: false
19:55:22:I1:WU884: UTC Offset: 2
19:55:22:I1:WU884: PID: 10649
19:55:22:I1:WU884: CWD: /var/lib/fah-client/work
19:55:22:I1:WU884:********************************************************************************
19:55:22:I1:WU884:Project: 18419 (Run 37, Clone 0, Gen 262)
19:55:22:I1:WU884:Unit: 0x00000000000000000000000000000000
19:55:22:I1:WU884:Digital signatures verified
19:55:22:I1:WU884:Calling: mdrun -c frame262.gro -s frame262.tpr -x frame262.xtc -cpi state.cpt -cpt 5 -nt 7 -ntmpi 1
19:55:22:I1:WU884:Steps: first=-1674967296 total=-1664967296
19:55:23:I1:WU886:DOWNLOAD 39% 13.16MiB of 33.39MiB
19:55:24:I1:WU886:DOWNLOAD 59% 19.56MiB of 33.39MiB
19:55:25:I1:WU886:DOWNLOAD 79% 26.35MiB of 33.39MiB
19:55:26:I1:WU886:DOWNLOAD 100% 33.38MiB of 33.39MiB
19:55:26:I1:OUT10:< HTTP/1.1 200 HTTP_OK
19:55:26:I1:WU886:Received WU
19:55:26:I3:WU886:Running FahCore: /var/lib/fah-client/cores/openmm-core-23/centos-7.9.2009-64bit/release/fahcore-23-centos-7.9.2009-64bit-release-8.0.3/FahCore_23 -dir dvr514W2aiW6oUxbe7mvxyDpz1bwOstGHbwTYoiBiHk -suffix 01 -version 8.3.16 -lifeline 4320 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-platform 0 -cuda-device 0 -gpu 0
19:55:26:I3:WU886:Started FahCore on PID 10660
19:55:26:I1:WU884:Completed 122515 out of 10000000 steps (1%)
19:55:27:I1:WU886:*********************** Log Started 2024-05-26T19:55:26Z ***********************

The loop continues in the log, sometimes with BAD_WORK_UNIT warnings and sometimes with failed-to-connect errors:

19:56:48:I1:WU884:********************************************************************************
19:56:48:I1:WU884:Project: 18419 (Run 37, Clone 0, Gen 262)
19:56:48:I1:WU884:Unit: 0x00000000000000000000000000000000
19:56:48:I1:WU884:Digital signatures verified
19:56:48:I1:WU884:Calling: mdrun -c frame262.gro -s frame262.tpr -x frame262.xtc -cpi state.cpt -cpt 5 -nt 7 -ntmpi 1
19:56:48:I1:WU884:Steps: first=-1674967296 total=-1664967296
19:56:53:I1:WU884:Completed 128408 out of 10000000 steps (1%)
19:56:59:E :OUT20:Failed response: CONNECT
19:56:59:I1:WU883:Uploading WU results
19:56:59:I1:OUT36:> POST https://fah1.innovatr.ca/api/results HTTP/1.1
19:56:59:I1:OUT36:< HTTP/1.1 200 HTTP_OK
19:56:59:I1:WU883:Credited
19:57:38:E :OUT31:Failed response: CONNECT
19:57:38:I1:WU892:Uploading WU results
19:57:46:E :OUT35:Failed response: CONNECT
19:57:46:I1:WU894:Retry #1 in 2 secs
19:57:48:I1:WU894:Downloading WU
19:58:39:E :OUT37:Failed response: CONNECT
19:58:39:I1:WU892:Uploading WU results
19:58:39:I1:OUT39:> POST https://fah1.innovatr.ca/api/results HTTP/1.1
19:58:39:I1:OUT39:< HTTP/1.1 200 HTTP_OK
19:58:39:I1:WU892:Credited
19:58:49:E :OUT38:Failed response: CONNECT
19:58:49:I1:WU894:Retry #2 in 4 secs
19:58:53:I1:WU894:Downloading WU
19:59:54:E :OUT40:Failed response: CONNECT
19:59:54:I1:WU894:Retry #3 in 8 secs
20:00:02:I1:WU894:Downloading WU
20:00:02:I1:OUT41:> POST https://ds01.scs.illinois.edu/api/assign HTTP/1.1
20:00:05:I1:WU894:DOWNLOAD 57% 3.99MiB of 7.04MiB
20:00:05:I1:OUT41:< HTTP/1.1 200 HTTP_OK
20:00:05:I1:WU894:Received WU
20:00:05:I3:WU894:Running FahCore: /var/lib/fah-client/cores/openmm-core-22/fahcore-22-linux-64bit-release-0.0.20/FahCore_22 -dir sxHyotVL6Z94EQQ_XyiCD9Vb9UshKs-uA9okmsim76k -suffix 01 -version 8.3.16 -lifeline 4320 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-platform 0 -cuda-device 0 -gpu 0
20:00:05:I3:WU894:Started FahCore on PID 11097
20:00:06:I1:WU894:*********************** Log Started 2024-05-26T20:00:05Z ***********************
20:00:06:I1:WU894:*************************** Core22 Folding@home Core ***************************

wvm4 commented 5 months ago

Just found out that, for some reason, turning SMT off has an effect on this. I can't change the SMT settings of my laptop in the BIOS, so I usually run `echo off | sudo tee /sys/devices/system/cpu/smt/control` to turn it off manually after a reboot. The GPU wasn't folding after a reboot and a restart of the service, but it worked once I turned SMT off and restarted the service. I didn't check the logs before restarting the service though, so I'm not sure if it's even the same bug.
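
For reference, the manual step above can be sketched in Python. This is only an illustration, not part of fah-client: the function names `read_smt_state` and `disable_smt` are hypothetical, and writing to the real sysfs file needs root, just like the `sudo tee` command.

```python
from pathlib import Path

# Linux sysfs file that reports and controls SMT (same path as the command above).
SMT_CONTROL = Path("/sys/devices/system/cpu/smt/control")


def read_smt_state(control=SMT_CONTROL):
    """Return the current SMT control state, e.g. 'on' or 'off'."""
    return Path(control).read_text().strip()


def disable_smt(control=SMT_CONTROL):
    """Write 'off' if SMT is currently 'on'.

    Returns True if a write was made, False if nothing needed changing
    (already off, forced off, or not supported on this CPU).
    """
    control = Path(control)
    if read_smt_state(control) != "on":
        return False
    control.write_text("off")
    return True
```

Calling `read_smt_state()` right after waking the laptop would show whether the setting survived the sleep cycle, which is the open question in this report.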

jcoffland commented 5 months ago

If the laptop goes to sleep, it could enter either a standby or even a hibernation state. In that case, it's likely the CPU SMT setting is reset, and since the laptop is reloading its saved state, it will not run your boot scripts again.
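
If the SMT setting really is reset on wake (an assumption, not something confirmed in this thread), one option on systemd-based distributions is a sleep hook. systemd runs executables placed in `/usr/lib/systemd/system-sleep/` with `pre` before suspend or hibernation and `post` after resume. The sketch below is a hypothetical hook script, and the function name `on_sleep_event` is made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical systemd system-sleep hook that turns SMT off again after resume."""
import sys
from pathlib import Path

SMT_CONTROL = Path("/sys/devices/system/cpu/smt/control")


def on_sleep_event(phase, control=SMT_CONTROL):
    """Handle one hook invocation and return the action taken.

    systemd passes 'pre' before sleeping and 'post' after waking;
    only the 'post' phase needs to re-apply the setting.
    """
    if phase != "post":
        return "ignored"
    control = Path(control)
    if control.read_text().strip() != "on":
        return "unchanged"
    control.write_text("off")
    return "disabled"


if __name__ == "__main__":
    # First argument is the phase; the second (sleep type) is not needed here.
    phase = sys.argv[1] if len(sys.argv) > 1 else ""
    print(on_sleep_event(phase))
```

Since the GPU only started folding after SMT was turned off and the service was restarted, a hook like this may not be enough on its own.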

I'm not sure why SMT would prevent the core from running correctly.

The connection errors are probably unrelated. I see Folding@home Core Shutdown: INTERRUPTED many times in your log. This usually means you've paused the client. It also looks like the CUDA drivers are no longer working after the laptop reawakens. This is a known issue. It's a problem with the GPU drivers, not the fah-client.
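
To see how often each work unit was interrupted or rejected, the `Core returned ...` lines in a log like the one above can be tallied. The following is a rough sketch that assumes the log keeps the `HH:MM:SS:LEVEL:WUnnn:message` layout shown in this issue; `count_core_returns` is a hypothetical helper, not a fah-client tool.

```python
import re
from collections import Counter

# Matches e.g. "19:54:55:W :WU883:Core returned BAD_WORK_UNIT (114)".
CORE_RETURN_RE = re.compile(r":(WU\d+):Core returned (\w+) \((\d+)\)")


def count_core_returns(log_text):
    """Count (work unit, status) pairs from 'Core returned' log lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CORE_RETURN_RE.search(line)
        if match:
            wu, status, _code = match.groups()
            counts[(wu, status)] += 1
    return counts
```

In the excerpt above, this would separate the repeated INTERRUPTED returns of the CPU unit (WU884) from the BAD_WORK_UNIT returns of the GPU units (WU883, WU885).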

jcoffland commented 5 months ago

See https://github.com/FoldingAtHome/fah-issues/issues/1720

I'm closing this because the problem is out of our control.

muziqaz commented 5 months ago

Computer hibernation and sleep have been a lottery on modern PCs, be it on Windows or Linux. There seems to be a massive disconnect between the OS sleep mechanism and the various manufacturers' drivers. It is common for awoken systems to even BSOD because of the power state switch or a driver reset. Nvidia on Linux is shaky at best. As Joe mentioned, the log indicates that fahclient can initialise neither CUDA nor OpenCL, which is a clear indication that the awakening was not clean at the GPU driver level.
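
The CUDA and OpenCL failures mentioned here can be pulled out of the log automatically. This sketch assumes, as in the log above, that each `Failed to create ... context:` line is followed by a line from the same work unit giving the reason. The helper name `context_failures` is hypothetical.

```python
import re

# Log lines look like "19:54:55:I1:WU883:Failed to create CUDA context:".
LINE_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:[IWE][\d ]:(WU\d+):(.*)$")
FAIL_RE = re.compile(r"^Failed to create (CUDA|OpenCL) context:$")


def context_failures(log_text):
    """Return a list of (work unit, platform, reason) tuples."""
    failures = []
    pending = {}  # work unit tag -> platform still waiting for its reason line
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        wu, message = match.group(1), match.group(2).strip()
        if wu in pending:
            # The line right after a failure header carries the reason.
            failures.append((wu, pending.pop(wu), message))
            continue
        failed = FAIL_RE.match(message)
        if failed:
            pending[wu] = failed.group(1)
    return failures
```

If both platforms fail for every GPU work unit after a resume, that points at the driver state rather than at any particular work unit.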