FoldingAtHome / fah-client-bastet

Folding@home client, code named Bastet
GNU General Public License v3.0
73 stars 12 forks source link

Client not logging percent completion messages - macOS #234

Closed joe-h-fah closed 5 months ago

joe-h-fah commented 6 months ago

Running the alpha v8.3.9 on two different Mac systems with OS versions 13.6.3 and 10.13.6. Client log records the client and WU startup messages, but after the starting at n% message dos not show any progress messages. Just records when the WU completes, the upload, and download of a new WU and that WU's startup.

Perhaps related to this, the rollover to a new log coincided with a new WU starting, not midnight UTC.

Log from one system, similar log on the other system:

*********************** Log Started 2024-05-08T00:05:44Z ***********************
00:05:44:I1:*********************** Folding@home Client ***********************
00:05:44:I1:    Version: 8.3.9
00:05:44:I1:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:05:44:I1:        Org: foldingathome.org
00:05:44:I1:  Copyright: 2023-2024, foldingathome.org
00:05:44:I1:   Homepage: https://foldingathome.org/
00:05:44:I1:    License: GPL-3.0-or-later
00:05:44:I1:        URL: https://alpha.foldingathome.org/
00:05:44:I1:       Date: May 5 2024
00:05:44:I1:       Time: 20:29:14
00:05:44:I1:   Revision: cbfffd1f5171bafa84f3ba5aa7b35d7de740dcf8
00:05:44:I1:     Branch: master
00:05:44:I1:   Compiler: GNU Apple LLVM 14.0.0 (clang-1400.0.29.202)
00:05:44:I1:    Options: -std=c++17 -fsigned-char -stdlib=libc++ -O3 -funroll-loops -fno-pie
00:05:44:I1:             -arch x86_64 -arch arm64 -mmacosx-version-min=10.13
00:05:44:I1:   Platform: darwin 21.6.0
00:05:44:I1:       Bits: 64
00:05:44:I1:       Mode: Release
00:05:44:I1:     Config: /Library/Application Support/FAHClient/config.xml
00:05:44:I1:****************************** CBang ******************************
00:05:44:I1:    Version: 1.7.2
00:05:44:I1:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:05:44:I1:        Org: Cauldron Development LLC
00:05:44:I1:  Copyright: Cauldron Development LLC, 2003-2024
00:05:44:I1:   Homepage: https://cauldrondevelopment.com/
00:05:44:I1:    License: LGPL-2.1-or-later
00:05:44:I1:       Date: May 6 2024
00:05:44:I1:       Time: 09:21:46
00:05:44:I1:   Revision: 1d4d6e75ca886277090a9cbb9331ab5bebfe9e1f
00:05:44:I1:     Branch: master
00:05:44:I1:   Compiler: GNU Apple LLVM 14.0.0 (clang-1400.0.29.202)
00:05:44:I1:    Options: -std=c++17 -fsigned-char -stdlib=libc++ -O3 -funroll-loops -fno-pie
00:05:44:I1:             -arch x86_64 -arch arm64 -mmacosx-version-min=10.13 -fPIC
00:05:44:I1:   Platform: darwin 21.6.0
00:05:44:I1:       Bits: 64
00:05:44:I1:       Mode: Release
00:05:44:I1:***************************** System ******************************
00:05:44:I1:        CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
00:05:44:I1:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
00:05:44:I1:       CPUs: 8
00:05:44:I1:     Memory: 48.00GiB
00:05:44:I1:Free Memory: 38.11GiB
00:05:44:I1: OS Version: 10.13.6
00:05:44:I1:Has Battery: false
00:05:44:I1: On Battery: false
00:05:44:I1:   Hostname: Joes-Hack
00:05:44:I1: UTC Offset: -4
00:05:44:I1:        PID: 8864
00:05:44:I1:        CWD: /Library/Application Support/FAHClient
00:05:44:I1:       Exec: /usr/local/bin/fah-client
00:05:44:I1:*******************************************************************
00:05:44:I2:<config>
00:05:44:I2:  <!-- Logging -->
00:05:44:I2:  <log-rotate-max v='64'/>
00:05:44:I2:
00:05:44:I2:  <!-- Resource Settings -->
00:05:44:I2:  <cpus v='4'/>
00:05:44:I2:
00:05:44:I2:  <!-- Server -->
00:05:44:I2:  <allow v='127.0.0.1 192.168.1.2-192.168.1.255'/>
00:05:44:I2:
00:05:44:I2:  <!-- User Information -->
00:05:44:I2:  <passkey v='*****'/>
00:05:44:I2:  <team v='38910'/>
00:05:44:I2:  <user v='Joe_H'/>
00:05:44:I2:</config>
00:05:45:I1:Opening Database
00:05:45:I1:F@H ID = oNfK3FWOfWiBQnCbGrHoO9vIFs9pFLsxbmxKJgz-fPo
00:05:45:I3:Loading default group
00:05:45:I3:Loading default resource group
00:05:45:I1:Listening for HTTP on 127.0.0.1:7396
00:05:45:I3:WU3539:Loading work unit 3539 with ID UOYRH8qRcbkcj7Yx8CZ6eCOBEEH9DvH1BI167ZTKeV4
00:05:45:I3:Loaded 1 wus.
00:05:45:I3:gpus = {
00:05:45:I3:  "gpu:00:02:00": {"vendor": 32902, "device": 22802, "type": "intel", "supported": false, "description": "KBL GT2 [HD Graphics 630]"},
00:05:45:I3:  "gpu:01:00:00": {"vendor": 4318, "device": 7171, "type": "nvidia", "supported": false, "description": "GP106 [GeForce GTX 1060 6GB] 4372"}
00:05:45:I3:}
00:05:45:I1:Loaded cores/fahcore-a8-osx-64bit-avx2_256-0.0.12/FahCore_a8
00:05:45:I3:WU3539:Running FahCore: "/Library/Application Support/FAHClient/cores/fahcore-a8-osx-64bit-avx2_256-0.0.12/FahCore_a8" -dir UOYRH8qRcbkcj7Yx8CZ6eCOBEEH9DvH1BI167ZTKeV4 -suffix 01 -version 8.3.9 -lifeline 8864 -np 4
00:05:45:I3:WU3539:Started FahCore on PID 8867
00:05:45:I1:WU3539:*********************** Log Started 2024-05-08T00:05:45Z ***********************
00:05:45:I1:WU3539:************************** Gromacs Folding@home Core ***************************
00:05:45:I1:WU3539:       Core: Gromacs
00:05:45:I1:WU3539:       Type: 0xa8
00:05:45:I1:WU3539:    Version: 0.0.12
00:05:45:I1:WU3539:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:05:45:I1:WU3539:  Copyright: 2020 foldingathome.org
00:05:45:I1:WU3539:   Homepage: https://foldingathome.org/
00:05:45:I1:WU3539:       Date: Jan 16 2021
00:05:45:I1:WU3539:       Time: 21:26:21
00:05:45:I1:WU3539:   Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:05:45:I1:WU3539:     Branch: master
00:05:45:I1:WU3539:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.28)
00:05:45:I1:WU3539:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
00:05:45:I1:WU3539:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
00:05:45:I1:WU3539:   Platform: darwin 20.2.0
00:05:45:I1:WU3539:       Bits: 64
00:05:45:I1:WU3539:       Mode: Release
00:05:45:I1:WU3539:       SIMD: avx2_256
00:05:45:I1:WU3539:     OpenMP: ON
00:05:45:I1:WU3539:       CUDA: OFF
00:05:45:I1:WU3539:       Args: -dir UOYRH8qRcbkcj7Yx8CZ6eCOBEEH9DvH1BI167ZTKeV4 -suffix 01
00:05:45:I1:WU3539:             -version 8.3.9 -lifeline 8864 -np 4
00:05:45:I1:WU3539:************************************ libFAH ************************************
00:05:45:I1:WU3539:       Date: Jan 16 2021
00:05:45:I1:WU3539:       Time: 21:21:45
00:05:45:I1:WU3539:   Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:05:45:I1:WU3539:     Branch: master
00:05:45:I1:WU3539:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.28)
00:05:45:I1:WU3539:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
00:05:45:I1:WU3539:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
00:05:45:I1:WU3539:   Platform: darwin 20.2.0
00:05:45:I1:WU3539:       Bits: 64
00:05:45:I1:WU3539:       Mode: Release
00:05:45:I1:WU3539:************************************ CBang *************************************
00:05:45:I1:WU3539:       Date: Jan 16 2021
00:05:45:I1:WU3539:       Time: 21:21:02
00:05:45:I1:WU3539:   Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:05:45:I1:WU3539:     Branch: master
00:05:45:I1:WU3539:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.28)
00:05:45:I1:WU3539:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
00:05:45:I1:WU3539:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
00:05:45:I1:WU3539:   Platform: darwin 20.2.0
00:05:45:I1:WU3539:       Bits: 64
00:05:45:I1:WU3539:       Mode: Release
00:05:45:I1:WU3539:************************************ System ************************************
00:05:45:I1:WU3539:        CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
00:05:45:I1:WU3539:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
00:05:45:I1:WU3539:       CPUs: 8
00:05:45:I1:WU3539:     Memory: 48.00GiB
00:05:45:I1:WU3539:Free Memory: 38.09GiB
00:05:45:I1:WU3539:    Threads: POSIX_THREADS
00:05:45:I1:WU3539: OS Version: 10.13
00:05:45:I1:WU3539:Has Battery: false
00:05:45:I1:WU3539: On Battery: false
00:05:45:I1:WU3539: UTC Offset: -4
00:05:45:I1:WU3539:        PID: 8867
00:05:45:I1:WU3539:        CWD: /Library/Application Support/FAHClient/work
00:05:45:I1:WU3539:********************************************************************************
00:05:45:I1:WU3539:Project: 18457 (Run 1, Clone 12, Gen 990)
00:05:45:I1:WU3539:Unit: 0x00000000000000000000000000000000
00:05:45:I1:WU3539:Digital signatures verified
00:05:45:I1:WU3539:Calling: mdrun -c frame990.gro -s frame990.tpr -x frame990.xtc -cpi state.cpt -cpt 5 -nt 4 -ntmpi 1
00:05:45:I1:WU3539:Steps: first=495000000 total=500000000
00:05:45:I1:WU3539:Completed 3475022 out of 5000000 steps (69%)
00:51:49:I1:WU3539:Core returned FINISHED_UNIT (100)
00:51:49:I1:Default:Added new work unit: cpus:4 gpus:
00:51:49:I1:WU3539:Uploading WU results
00:51:49:I1:WU3540:Requesting WU assignment for user Joe_H team 38910
00:51:49:I1:OUT4:> POST https://vav23.fah.temple.edu/api/results HTTP/1.1
00:51:52:I1:OUT4:< HTTP/1.1 200 HTTP_OK
00:51:52:I1:WU3539:Credited
00:51:54:I1:OUT5:> POST https://assign1.foldingathome.org/api/assign HTTP/1.1
00:51:55:I1:OUT5:< HTTP/1.1 200 HTTP_OK
00:51:55:I1:WU3540:Received WU assignment hmJwyziVRVv3STZBAtqlTjbEN3ZeicYiSR8vVx7qlzE
00:51:55:I1:WU3540:Downloading WU
00:51:55:I1:OUT6:> POST https://vav23.fah.temple.edu/api/assign HTTP/1.1
00:51:56:I1:OUT6:< HTTP/1.1 200 HTTP_OK
00:51:56:I1:WU3540:Received WU
00:51:56:I3:WU3540:Running FahCore: "/Library/Application Support/FAHClient/cores/fahcore-a8-osx-64bit-avx2_256-0.0.12/FahCore_a8" -dir hmJwyziVRVv3STZBAtqlTjbEN3ZeicYiSR8vVx7qlzE -suffix 01 -version 8.3.9 -lifeline 8864 -np 4
00:51:56:I3:WU3540:Started FahCore on PID 9240
00:51:56:I1:WU3540:*********************** Log Started 2024-05-08T00:51:56Z ***********************
00:51:56:I1:WU3540:************************** Gromacs Folding@home Core ***************************
00:51:56:I1:WU3540:       Core: Gromacs
00:51:56:I1:WU3540:       Type: 0xa8
00:51:56:I1:WU3540:    Version: 0.0.12
00:51:56:I1:WU3540:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:51:56:I1:WU3540:  Copyright: 2020 foldingathome.org
00:51:56:I1:WU3540:   Homepage: https://foldingathome.org/
00:51:56:I1:WU3540:       Date: Jan 16 2021
00:51:56:I1:WU3540:       Time: 21:26:21
00:51:56:I1:WU3540:   Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:51:56:I1:WU3540:     Branch: master
00:51:56:I1:WU3540:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.28)
00:51:56:I1:WU3540:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
00:51:56:I1:WU3540:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
00:51:56:I1:WU3540:   Platform: darwin 20.2.0
00:51:56:I1:WU3540:       Bits: 64
00:51:56:I1:WU3540:       Mode: Release
00:51:56:I1:WU3540:       SIMD: avx2_256
00:51:56:I1:WU3540:     OpenMP: ON
00:51:56:I1:WU3540:       CUDA: OFF
00:51:56:I1:WU3540:       Args: -dir hmJwyziVRVv3STZBAtqlTjbEN3ZeicYiSR8vVx7qlzE -suffix 01
00:51:56:I1:WU3540:             -version 8.3.9 -lifeline 8864 -np 4
00:51:56:I1:WU3540:************************************ libFAH ************************************
00:51:56:I1:WU3540:       Date: Jan 16 2021
00:51:56:I1:WU3540:       Time: 21:21:45
00:51:56:I1:WU3540:   Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:51:56:I1:WU3540:     Branch: master
00:51:56:I1:WU3540:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.28)
00:51:56:I1:WU3540:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
00:51:56:I1:WU3540:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
00:51:56:I1:WU3540:   Platform: darwin 20.2.0
00:51:56:I1:WU3540:       Bits: 64
00:51:56:I1:WU3540:       Mode: Release
00:51:56:I1:WU3540:************************************ CBang *************************************
00:51:56:I1:WU3540:       Date: Jan 16 2021
00:51:56:I1:WU3540:       Time: 21:21:02
00:51:56:I1:WU3540:   Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
00:51:56:I1:WU3540:     Branch: master
00:51:56:I1:WU3540:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.28)
00:51:56:I1:WU3540:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
00:51:56:I1:WU3540:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
00:51:56:I1:WU3540:   Platform: darwin 20.2.0
00:51:56:I1:WU3540:       Bits: 64
00:51:56:I1:WU3540:       Mode: Release
00:51:56:I1:WU3540:************************************ System ************************************
00:51:56:I1:WU3540:        CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
00:51:56:I1:WU3540:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
00:51:56:I1:WU3540:       CPUs: 8
00:51:56:I1:WU3540:     Memory: 48.00GiB
00:51:56:I1:WU3540:Free Memory: 38.79GiB
00:51:56:I1:WU3540:    Threads: POSIX_THREADS
00:51:56:I1:WU3540: OS Version: 10.13
00:51:56:I1:WU3540:Has Battery: false
00:51:56:I1:WU3540: On Battery: false
00:51:56:I1:WU3540: UTC Offset: -4
00:51:56:I1:WU3540:        PID: 9240
00:51:56:I1:WU3540:        CWD: /Library/Application Support/FAHClient/work
00:51:56:I1:WU3540:********************************************************************************
00:51:56:I1:WU3540:Project: 18457 (Run 1, Clone 12, Gen 991)
00:51:56:I1:WU3540:Unit: 0x00000000000000000000000000000000
00:51:56:I1:WU3540:Reading tar file core.xml
00:51:56:I1:WU3540:Reading tar file frame991.tpr
00:51:56:I1:WU3540:Digital signatures verified
00:51:56:I1:WU3540:Calling: mdrun -c frame991.gro -s frame991.tpr -x frame991.xtc -cpt 5 -nt 4 -ntmpi 1
00:51:56:I1:WU3540:Steps: first=495500000 total=500500000
00:51:56:I1:WU3540:Completed 1 out of 5000000 steps (0%)
03:22:23:I1:WU3540:Core returned FINISHED_UNIT (100)
03:22:23:I1:Default:Added new work unit: cpus:4 gpus:
kbernhagen commented 6 months ago

Seems to still be broken with v8.3.10 on macOS.

jcoffland commented 6 months ago

When a WU starts running the client spawns off a thread to follow the core log. What I think is happening here is that the first read of the log is successful but then subsequent reads are failing. But the strange thing is, none of the related code appears to have changed. So what changed on macOS that caused this to start happening from v8.3.9 on?

kbernhagen commented 6 months ago

If I had to guess, it would be that the reader object gets released.

kbernhagen commented 6 months ago

it might have been broken on 8.3.8 as well. I think my dev build of 8.3.7+ was working.

kbernhagen commented 6 months ago

macOS 14.4.1 Apple silicon v8.3.5 works, v8.3.8 does not The intermediate released builds .6,.7 do not launch

I believe a bleeding edge "8.3.7+" from mid-Feb worked, but I can't find it now.

v8.3.10 with <log-domain-levels v='TailFileToLog:7'/> does not show anything else

jcoffland commented 6 months ago

One possibility is that the file stream used to read the core log is getting closed. Or something has changed in how macOS handles reading past the end of a file. TailFileToLog repeatedly reads past the end, waiting for more data.

jcoffland commented 6 months ago

This is really strange. I added some more error checking and a debug line to TailFileToLog. It sees no errors, i.e. the file stream appears to be fine. But, it also does not see any new data. It just churns along happily with out reading any more data after the first read. This code hadn't changed for years before this bug started to occur. It doesn't make much sense. I wonder if it was caused by a compiler or library upgrade.

kbernhagen commented 6 months ago

I believe the build machine has been using the same compiler for many months. I don’t remember the last system update. I think it’s still behind the latest macOS 12.

kbernhagen commented 6 months ago

Is this bug not occurring on any other platform?

jcoffland commented 6 months ago

Definitely not on Linux. And @Hou5e's logs show it working correctly on Windows. Very strange. Maybe there's a bug somewhere else that's corrupting memory and this is the side-effect.

kbernhagen commented 6 months ago

The progress jumps to an integer percent without it being logged. Does Unit get actual progress from reading the core output? If so, that would indicate the Unit is processing the core output; it just doesn't get echoed to the log.

jcoffland commented 6 months ago

The client reads the progress info from wuinfo_01.dat which the core writes to periodically. It's ugly but that's how it's worked since before I started with F@H.

What I'd really like to do is have the core read from stdin and write to stdout and the client open those streams as pipes when it spawns the core. Then we could do away with reading back the log file, reading wuinfo_01.dat or wudata_01.dat, etc. But this would require new cores that supported both the new interface and the old one.

jcoffland commented 6 months ago

I finally figured this out. Previously we were using a custom implementation of std::fstream in C!. I recently simplified things and went back to using the built-in std::fstream for most cases. Unfortunately, there is some inconsistent behavior between libstdc++ implementations. Specifically, you must call std::istream::seekg() on some implementations in order to fully reset the stream after you've hit EOF. So even though the TailFileToLog code hadn't changed it now needs the std::istream::seekg() call to work correctly.

The fix is in the upcoming v8.3.11.

kbernhagen commented 6 months ago

Obscure bug. Makes me think of the corollary "The nice thing about standards is there are so many implementations to choose from."

I can confirm this is working on macOS 14.4.1 intel and Apple silicon.

joe-h-fah commented 6 months ago

Just updated my machines to v8.3.11, now it just shows "No matching log lines" in the log window of the web control. This on an Intel macOS 10.13.6 and a 13.6.7 Apple silicon machine.

kbernhagen commented 6 months ago

Is the client crash-looping when you try to fold? Beware rapid bogus failed WU uploads killing your bonus, among other things. You can see crash reports in the Console.app, of course.

joe-h-fah commented 6 months ago

The client is folding just fine and WUs have uploaded from both systems. This version appears to have fixed the entries getting into the log file itself, percent progress messages now show up there. But maybe the fix is now blocking the web client from reading the log. Can start a new report in the web control issues if that is needed.

kbernhagen commented 6 months ago

Is that the release or debug build?

joe-h-fah commented 6 months ago

The release build.