FoldingAtHome / fah-client-bastet

Folding@home client, code named Bastet
GNU General Public License v3.0

Guru meditation error causes infinite restarts #134

Closed jon-ault closed 7 months ago

jon-ault commented 1 year ago

As the saying goes, "be careful what you wish for, you might get it".

After getting the new beta 8.1.16 with the fix for #127, I've found my computer a couple of times stuck in an infinite cycle of restarting a CPU work unit that aborts with a guru meditation error. It restarts, immediately fails, restarts again, immediately fails again, and so on. The most recent episode started sometime Thursday evening, and I didn't catch it until early Sunday morning, after it had generated over 125 MB of log files.

There should probably be an upper limit on the number of times it restarts a work unit after which it pulls the rip cord 🪂.
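To make the suggestion concrete, here is a minimal sketch of a capped retry loop. It is not the client's actual code: `RetryPolicy`, `run_with_retry_cap`, the cap of 5, and the doubling delay are all hypothetical choices for illustration. The one point it is meant to show is that the failure counter lives outside the relaunch, so it keeps counting instead of starting again at "Retry #1" each time.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryPolicy:
    """Hypothetical retry settings; the values are illustrative assumptions."""
    max_retries: int = 5      # give up after this many failed retries
    base_delay: float = 2.0   # seconds before the first retry
    max_delay: float = 300.0  # upper bound on the delay between retries

    def delay(self, attempt: int) -> float:
        # Double the delay on each consecutive failure, up to max_delay.
        return min(self.base_delay * 2 ** (attempt - 1), self.max_delay)


def run_with_retry_cap(
    run_core: Callable[[], int],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `run_core` until it returns 0 or the retry cap is exceeded.

    Returns True on success, False once the cap is hit. At that point a
    caller could fail the work unit instead of relaunching it again.
    """
    failures = 0  # persists across relaunches, never reset on restart
    while True:
        if run_core() == 0:
            return True
        failures += 1
        if failures > policy.max_retries:
            return False  # pull the rip cord
        sleep(policy.delay(failures))
```

With the defaults above, a core that always crashes would be launched six times over roughly a minute (waits of 2, 4, 8, 16 and 32 seconds) and then abandoned, rather than looping for days.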

A snippet of one of the log files showing the error & restart.

```
*********************** Log Started 2023-03-17T00:00:00Z ***********************
00:00:00:I1::WU1098:*********************** Log Started 2023-03-16T23:59:59Z ***********************
00:00:00:I1::WU1098:************************** Gromacs Folding@home Core ***************************
00:00:00:I1::WU1098: Core: Gromacs
00:00:00:I1::WU1098: Type: 0xa8
00:00:00:I1::WU1098: Version: 0.0.12
00:00:00:I1::WU1098: Author: Joseph Coffland
00:00:00:I1::WU1098: Copyright: 2020 foldingathome.org
00:00:00:I1::WU1098: Homepage: https://foldingathome.org/
00:00:00:I1::WU1098: Date: Jan 16 2021
00:00:00:I1::WU1098: Time: 12:29:40
00:00:00:I1::WU1098: Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:00:00:I1::WU1098: Branch: master
00:00:00:I1::WU1098: Compiler: Visual C++ 2019 16.7
00:00:00:I1::WU1098: Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
00:00:00:I1::WU1098: Platform: win32 10
00:00:00:I1::WU1098: Bits: 64
00:00:00:I1::WU1098: Mode: Release
00:00:00:I1::WU1098: SIMD: avx2_256
00:00:00:I1::WU1098: OpenMP: ON
00:00:00:I1::WU1098: CUDA: OFF
00:00:00:I1::WU1098: Args: -dir sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904 -suffix 01
00:00:00:I1::WU1098: -version 8.1.16 -lifeline 17448 -np 6
00:00:00:I1::WU1098:************************************ libFAH ************************************
00:00:00:I1::WU1098: Date: Jan 16 2021
00:00:00:I1::WU1098: Time: 11:24:13
00:00:00:I1::WU1098: Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:00:00:I1::WU1098: Branch: master
00:00:00:I1::WU1098: Compiler: Visual C++ 2019 16.7
00:00:00:I1::WU1098: Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
00:00:00:I1::WU1098: Platform: win32 10
00:00:00:I1::WU1098: Bits: 64
00:00:00:I1::WU1098: Mode: Release
00:00:00:I1::WU1098:************************************ CBang *************************************
00:00:00:I1::WU1098: Date: Jan 16 2021
00:00:00:I1::WU1098: Time: 11:23:53
00:00:00:I1::WU1098: Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:00:00:I1::WU1098: Branch: master
00:00:00:I1::WU1098: Compiler: Visual C++ 2019 16.7
00:00:00:I1::WU1098: Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
00:00:00:I1::WU1098: Platform: win32 10
00:00:00:I1::WU1098: Bits: 64
00:00:00:I1::WU1098: Mode: Release
00:00:00:I1::WU1098:************************************ System ************************************
00:00:00:I1::WU1098: CPU: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
00:00:00:I1::WU1098: CPU ID: GenuineIntel Family 6 Model 158 Stepping 12
00:00:00:I1::WU1098: CPUs: 8
00:00:00:I1::WU1098: Memory: 15.91GiB
00:00:00:I1::WU1098:Free Memory: 10.92GiB
00:00:00:I1::WU1098: Threads: WINDOWS_THREADS
00:00:00:I1::WU1098: OS Version: 6.2
00:00:00:I1::WU1098:Has Battery: true
00:00:00:I1::WU1098: On Battery: false
00:00:00:I1::WU1098: UTC Offset: -5
00:00:00:I1::WU1098: PID: 5188
00:00:00:I1::WU1098: CWD: C:\ProgramData\FAHClient\work
00:00:00:I1::WU1098:********************************************************************************
00:00:00:I1::WU1098:Project: 18480 (Run 162, Clone 0, Gen 5)
00:00:00:I1::WU1098:Unit: 0x00000000000000000000000000000000
00:00:00:I1::WU1098:Digital signatures verified
00:00:00:I1::WU1098:Calling: mdrun -c frame5.gro -s frame5.tpr -x frame5.xtc -cpi state.cpt -cpt 5 -nt 6 -ntmpi 1
00:00:00:I1::WU1098:ERROR:Guru Meditation #abdb0dab117a0421.6809e67b93b2de47 (2866074.2866401) 'sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904/01/dhdl.xvg'
00:00:00:W ::WU1098:Core returned UNKNOWN_ENUM (3221226505)
00:00:00:W ::WU1098:Core exited with Windows unhandled exception code 0xc0000409. See https://bit.ly/2CXgWkZ for more information.
00:00:00:I1::WU1098:Retry #1 in 2 secs
00:00:02:I3:Removing old file 'work/sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904/logfile_01-20230316-235616.txt'
00:00:02:I3::WU1098:Running FahCore: C:\ProgramData\FAHClient\cores/fahcore-a8-win-64bit-avx2_256-0.0.12/FahCore_a8.exe -dir sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904 -suffix 01 -version 8.1.16 -lifeline 17448 -np 6
00:00:02:I3::WU1098:Started FahCore on PID 14296
00:00:03:I1::WU1098:*********************** Log Started 2023-03-17T00:00:02Z ***********************
00:00:03:I1::WU1098:************************** Gromacs Folding@home Core ***************************
00:00:03:I1::WU1098: Core: Gromacs
00:00:03:I1::WU1098: Type: 0xa8
00:00:03:I1::WU1098: Version: 0.0.12
00:00:03:I1::WU1098: Author: Joseph Coffland
00:00:03:I1::WU1098: Copyright: 2020 foldingathome.org
00:00:03:I1::WU1098: Homepage: https://foldingathome.org/
00:00:03:I1::WU1098: Date: Jan 16 2021
00:00:03:I1::WU1098: Time: 12:29:40
00:00:03:I1::WU1098: Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:00:03:I1::WU1098: Branch: master
00:00:03:I1::WU1098: Compiler: Visual C++ 2019 16.7
00:00:03:I1::WU1098: Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
00:00:03:I1::WU1098: Platform: win32 10
00:00:03:I1::WU1098: Bits: 64
00:00:03:I1::WU1098: Mode: Release
00:00:03:I1::WU1098: SIMD: avx2_256
00:00:03:I1::WU1098: OpenMP: ON
00:00:03:I1::WU1098: CUDA: OFF
00:00:03:I1::WU1098: Args: -dir sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904 -suffix 01
00:00:03:I1::WU1098: -version 8.1.16 -lifeline 17448 -np 6
00:00:03:I1::WU1098:************************************ libFAH ************************************
00:00:03:I1::WU1098: Date: Jan 16 2021
00:00:03:I1::WU1098: Time: 11:24:13
00:00:03:I1::WU1098: Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:00:03:I1::WU1098: Branch: master
00:00:03:I1::WU1098: Compiler: Visual C++ 2019 16.7
00:00:03:I1::WU1098: Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
00:00:03:I1::WU1098: Platform: win32 10
00:00:03:I1::WU1098: Bits: 64
00:00:03:I1::WU1098: Mode: Release
00:00:03:I1::WU1098:************************************ CBang *************************************
00:00:03:I1::WU1098: Date: Jan 16 2021
00:00:03:I1::WU1098: Time: 11:23:53
00:00:03:I1::WU1098: Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:00:03:I1::WU1098: Branch: master
00:00:03:I1::WU1098: Compiler: Visual C++ 2019 16.7
00:00:03:I1::WU1098: Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
00:00:03:I1::WU1098: Platform: win32 10
00:00:03:I1::WU1098: Bits: 64
00:00:03:I1::WU1098: Mode: Release
00:00:03:I1::WU1098:************************************ System ************************************
00:00:03:I1::WU1098: CPU: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
00:00:03:I1::WU1098: CPU ID: GenuineIntel Family 6 Model 158 Stepping 12
00:00:03:I1::WU1098: CPUs: 8
00:00:03:I1::WU1098: Memory: 15.91GiB
00:00:03:I1::WU1098:Free Memory: 10.92GiB
00:00:03:I1::WU1098: Threads: WINDOWS_THREADS
00:00:03:I1::WU1098: OS Version: 6.2
00:00:03:I1::WU1098:Has Battery: true
00:00:03:I1::WU1098: On Battery: false
00:00:03:I1::WU1098: UTC Offset: -5
00:00:03:I1::WU1098: PID: 14296
00:00:03:I1::WU1098: CWD: C:\ProgramData\FAHClient\work
00:00:03:I1::WU1098:********************************************************************************
00:00:03:I1::WU1098:Project: 18480 (Run 162, Clone 0, Gen 5)
00:00:03:I1::WU1098:Unit: 0x00000000000000000000000000000000
00:00:03:I1::WU1098:Digital signatures verified
00:00:03:I1::WU1098:Calling: mdrun -c frame5.gro -s frame5.tpr -x frame5.xtc -cpi state.cpt -cpt 5 -nt 6 -ntmpi 1
00:00:03:I1::WU1098:ERROR:Guru Meditation #abdb0dab117a0421.6809e67b93b2de47 (2866074.2866401) 'sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904/01/dhdl.xvg'
00:00:06:W ::WU1098:Core returned UNKNOWN_ENUM (3221226505)
00:00:06:W ::WU1098:Core exited with Windows unhandled exception code 0xc0000409. See https://bit.ly/2CXgWkZ for more information.
00:00:06:I1::WU1098:Retry #1 in 2 secs
00:00:08:I3:Removing old file 'work/sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904/logfile_01-20230316-235620.txt'
00:00:08:I3::WU1098:Running FahCore: C:\ProgramData\FAHClient\cores/fahcore-a8-win-64bit-avx2_256-0.0.12/FahCore_a8.exe -dir sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904 -suffix 01 -version 8.1.16 -lifeline 17448 -np 6
00:00:08:I3::WU1098:Started FahCore on PID 17324
00:00:09:I1::WU1098:*********************** Log Started 2023-03-17T00:00:08Z ***********************
00:00:09:I1::WU1098:************************** Gromacs Folding@home Core ***************************
00:00:09:I1::WU1098: Core: Gromacs
00:00:09:I1::WU1098: Type: 0xa8
00:00:09:I1::WU1098: Version: 0.0.12
00:00:09:I1::WU1098: Author: Joseph Coffland
00:00:09:I1::WU1098: Copyright: 2020 foldingathome.org
00:00:09:I1::WU1098: Homepage: https://foldingathome.org/
00:00:09:I1::WU1098: Date: Jan 16 2021
00:00:09:I1::WU1098: Time: 12:29:40
00:00:09:I1::WU1098: Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:00:09:I1::WU1098: Branch: master
00:00:09:I1::WU1098: Compiler: Visual C++ 2019 16.7
00:00:09:I1::WU1098: Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
00:00:09:I1::WU1098: Platform: win32 10
00:00:09:I1::WU1098: Bits: 64
00:00:09:I1::WU1098: Mode: Release
00:00:09:I1::WU1098: SIMD: avx2_256
00:00:09:I1::WU1098: OpenMP: ON
00:00:09:I1::WU1098: CUDA: OFF
00:00:09:I1::WU1098: Args: -dir sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904 -suffix 01
00:00:09:I1::WU1098: -version 8.1.16 -lifeline 17448 -np 6
00:00:09:I1::WU1098:************************************ libFAH ************************************
00:00:09:I1::WU1098: Date: Jan 16 2021
00:00:09:I1::WU1098: Time: 11:24:13
00:00:09:I1::WU1098: Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:00:09:I1::WU1098: Branch: master
00:00:09:I1::WU1098: Compiler: Visual C++ 2019 16.7
00:00:09:I1::WU1098: Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
00:00:09:I1::WU1098: Platform: win32 10
00:00:09:I1::WU1098: Bits: 64
00:00:09:I1::WU1098: Mode: Release
00:00:09:I1::WU1098:************************************ CBang *************************************
00:00:09:I1::WU1098: Date: Jan 16 2021
00:00:09:I1::WU1098: Time: 11:23:53
00:00:09:I1::WU1098: Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:00:09:I1::WU1098: Branch: master
00:00:09:I1::WU1098: Compiler: Visual C++ 2019 16.7
00:00:09:I1::WU1098: Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
00:00:09:I1::WU1098: Platform: win32 10
00:00:09:I1::WU1098: Bits: 64
00:00:09:I1::WU1098: Mode: Release
00:00:09:I1::WU1098:************************************ System ************************************
00:00:09:I1::WU1098: CPU: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
00:00:09:I1::WU1098: CPU ID: GenuineIntel Family 6 Model 158 Stepping 12
00:00:09:I1::WU1098: CPUs: 8
00:00:09:I1::WU1098: Memory: 15.91GiB
00:00:09:I1::WU1098:Free Memory: 10.91GiB
00:00:09:I1::WU1098: Threads: WINDOWS_THREADS
00:00:09:I1::WU1098: OS Version: 6.2
00:00:09:I1::WU1098:Has Battery: true
00:00:09:I1::WU1098: On Battery: false
00:00:09:I1::WU1098: UTC Offset: -5
00:00:09:I1::WU1098: PID: 17324
00:00:09:I1::WU1098: CWD: C:\ProgramData\FAHClient\work
00:00:09:I1::WU1098:********************************************************************************
00:00:09:I1::WU1098:Project: 18480 (Run 162, Clone 0, Gen 5)
00:00:09:I1::WU1098:Unit: 0x00000000000000000000000000000000
00:00:09:I1::WU1098:Digital signatures verified
00:00:09:I1::WU1098:Calling: mdrun -c frame5.gro -s frame5.tpr -x frame5.xtc -cpi state.cpt -cpt 5 -nt 6 -ntmpi 1
00:00:09:I1::WU1098:ERROR:Guru Meditation #abdb0dab117a0421.6809e67b93b2de47 (2866074.2866401) 'sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904/01/dhdl.xvg'
00:00:17:W ::WU1098:Core returned UNKNOWN_ENUM (3221226505)
00:00:17:W ::WU1098:Core exited with Windows unhandled exception code 0xc0000409. See https://bit.ly/2CXgWkZ for more information.
00:00:17:I1::WU1098:Retry #1 in 2 secs
00:00:19:I3:Removing old file 'work/sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904/logfile_01-20230316-235624.txt'
00:00:19:I3::WU1098:Running FahCore: C:\ProgramData\FAHClient\cores/fahcore-a8-win-64bit-avx2_256-0.0.12/FahCore_a8.exe -dir sy4uCSxHHptZFb8IxNPp1gIAvfDHOfUpqxj5l-yN904 -suffix 01 -version 8.1.16 -lifeline 17448 -np 6
00:00:19:I3::WU1098:Started FahCore on PID 8952
```

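
For anyone reading the log: the decimal value in `Core returned UNKNOWN_ENUM (3221226505)` and the hex value in `exception code 0xc0000409` are the same number, just printed two ways. A quick check is below. The helper name `describe_exit_code` is made up for this note. As far as I know, Windows documents `0xC0000409` as `STATUS_STACK_BUFFER_OVERRUN`, but treat that label as my reading, not something the log itself states.

```python
def describe_exit_code(code: int) -> dict:
    """Show a Windows process exit code as unsigned hex and signed 32-bit.

    Hypothetical helper for illustration only; not part of the client.
    """
    unsigned = code & 0xFFFFFFFF  # normalise to an unsigned 32-bit value
    # Values with the top bit set appear negative when read as signed 32-bit.
    signed = unsigned - (1 << 32) if unsigned & 0x80000000 else unsigned
    return {"hex": f"0x{unsigned:08x}", "signed": signed}


print(describe_exit_code(3221226505))
# {'hex': '0xc0000409', 'signed': -1073740791}
```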
jcoffland commented 7 months ago

I believe this is fixed.