FoldingAtHome / fah-client-bastet

Folding@home client, code named Bastet
GNU General Public License v3.0
74 stars 13 forks source link

Log not rotated on Windows, with no new log #233

Closed Hou5e closed 6 months ago

Hou5e commented 6 months ago

At 0 UTC (5pm PST), the log rotate is initiated, but FAH v8.3.9 on Windows doesn't move to the new log file. @kbernhagen reported that this also happened for FAH v8.3.8 on Windows.

The log shown below is still at: C:\ProgramData\FAHClient\log.txt And, was not moved and renamed with the date to the 'logs' folder.

End of the log looks like this 3 hours later while still folding:

23:56:15:I1:WU1626:Checkpoint completed at step 1050000
23:57:18:I1:WU1626:Completed 1062500 out of 1250000 steps (85%)
23:58:19:I1:WU1626:Completed 1075000 out of 1250000 steps (86%)
23:58:21:I1:WU1626:Checkpoint completed at step 1075000
23:59:23:I1:WU1626:Completed 1087500 out of 1250000 steps (87%)
*********************** Log Rotated 2024-05-08T00:00:24Z ***********************

The web UI looks like this 3 hours later while folding: image

image

Hou5e commented 6 months ago

It had the same results for with FAH v8.3.10, on Windows for this issue. The log didn't rotate and the last log entry was 5 hours ago just before 0 UTC while it's still folding.

23:55:38:I1:WU1633:Completed 75000 out of 1250000 steps (6%)
23:55:44:I1:WU1633:Checkpoint completed at step 75000
23:56:38:I1:WU1633:Completed 87500 out of 1250000 steps (7%)
23:57:32:I1:WU1633:Completed 100000 out of 1250000 steps (8%)
23:57:38:I1:WU1633:Checkpoint completed at step 100000
23:58:33:I1:WU1633:Completed 112500 out of 1250000 steps (9%)
23:59:27:I1:WU1633:Completed 125000 out of 1250000 steps (10%)
23:59:33:I1:WU1633:Checkpoint completed at step 125000
*********************** Log Rotated 2024-05-09T00:00:27Z ***********************

image

jcoffland commented 6 months ago

Is this from the actual log file or from Web Control's view of it?

Hou5e commented 6 months ago

Both, that is what was trying to show. That you can't see anything past 0 UTC in the GUI, and the entire log file for both days looks like this: FAH v8.3.10: log-20240509-190042.txt

FAH v8.3.9: log-20240508-191453.txt

I'll try running the Debug version today to see if there are any clues with that version.

jcoffland commented 6 months ago

Those log files look normal. log.txt is moved to logs/log-YYYYMMDD-HHMMSS.txt. The last line is the "Log Rotated" line. Then it should continue with a new log.txt. Are you saying there's no new log.txt?

Hou5e commented 6 months ago

The filenames only look normal because I closed FAH, and restarted it, where the log.txt is rotated properly on FAH startup (this restart happened each day to update to the new version of FAH). If I had left FAH running then the file would still be log.txt in the main data folder (not logs/log-YYYYMMDD-HHMMSS.txt), and there would be no new log data after 0 UTC (in the UI or the log file).

Hou5e commented 6 months ago

In the initial startup of the Windows FAH log header, is the $( __ $) supposed to be there in the Options: for the Client and CBang sections?

Example from Client section: Options: $( /TP $) /std:c++14 /nologo /EHa /wd4297 /wd4103 /W1 /MTd


Running with a 10 minute log rotate with the Debug version, I didn't see anything new or interesting. The log just stops at the log rotate like the other examples. I stopped and restarted FAH to get it to rotate and make the filename with the date: log-20240510-085635.txt

jcoffland commented 6 months ago

Got. That's what I needed to know. So to summarize, on Windows:

Hou5e commented 6 months ago

Yep, That's a good summary of the issue.

There are some minor odd side-effects seen with the short log rotation. I restarted FAH to get the next log (this one is not interesting): log-20240510-093235.txt

I restarted FAH again and there were ~3 log rotations before 'Finishing' the WU. This was what was at the end. When the WU finished the log was wiped out with the finishing info (not rotated): log-20240510-120305.txt And, the UI looked like this on the PC that 'Finished' the WU, but FAH was idle and should not say 'Disconnected' for Win10-PC: image

Note: one of the other 'Disconnected' PCs was running: X99-12, and looked like this locally on that PC: image

Hou5e commented 6 months ago

For FAH v8.3.11 Build183 (Debug) on Windows, there was a useful error in the GUI log messages, and it didn't get saved to the log.txt file. I missed capturing it from the GUI at 0 UTC when I first saw it, but got it with a 10 minute log rotate:

01:39:22:I1:WU1649:Completed 1112500 out of 1250000 steps (89%)
01:41:05:E :Exception: Failed to rename 'log.txt' to 'logs\log-20240512-014105.txt': The process cannot access the file because it is being used by another process.
01:41:05:E :
01:41:05:E : At: src\cbang\os\SystemUtilities.cpp:729:rename()
01:41:05:E :Caught at: src\cbang\event\Event.cpp:137:call()
01:41:06:E :WU1649:Stream error: 3 

log-20240512-014106_GUI_Text.txt

For comparison, the same section written to the log file looks like this:

01:39:22:I1:WU1649:Completed 1112500 out of 1250000 steps (89%)
*********************** Log Rotated 2024-05-12T01:41:05Z ***********************

If FAH isn't folding (like it was Finishing a WU) then it can rotate the log: GUI messages from the screen: log-20240512-022400_GUI_Text.txt

Log files of the same events: Part 1: log-20240512-022239.txt Part 2: log-20240512-023044.txt

jcoffland commented 6 months ago

I finally figured this out. The problem was due to a change in the file handling. If you use the standard C++ std::fstream on Windows there is no easy way to prevent the file handle from being inherited by subprocesses. So when the client spawns the core subprocess it gets a copy of the log file handle. Later, if the core is still running, if you close the log file and try to rotate it, it fails with an access error because the core process still has an open file handle to the log.

The solution was to implement a custom std::streambuf which sets the correct flags when opening the file which prevent the file handle from being inherited by subprocesses. This is actually how it used to work in cbang but I tried to simplify the code which lead to several difficult bugs.

Hou5e commented 6 months ago

For FAH v8.3.11 Alpha (Debug) on Windows, you get 2 error messages on FAH startup where it crashes quickly with: image image

Last line of the log: 20:49:52:E :Thread 1 deallocated while still active log-20240513-205115.txt

Hou5e commented 6 months ago

Also, both Windows installers {Release & Debug} have the debug browse information file included https://github.com/FoldingAtHome/fah-client-bastet/blob/3917839146eac0e30079900cef812224ce35a2be/install/win/fah-client.nsi#L209 that makes the installer ~3x larger. This should probably be put back to the way it was once these issues get resolved. Or, include the .pdb file for the Debug version, and not include it for the Release version.

The uninstaller section of the script should make sure it deletes that file: Add: Delete "$INSTDIR\FAHClient.pdb" below this line: https://github.com/FoldingAtHome/fah-client-bastet/blob/3917839146eac0e30079900cef812224ce35a2be/install/win/fah-client.nsi#L346

Hou5e commented 6 months ago

For FAH v8.3.13 Alpha (Debug) on Windows, it looks like the log rotate is working well and happened at 0UTC. I turned the log rotate on for 10 minute intervals after that, where it has rotated about 50 times while I was testing it.

Hou5e commented 6 months ago

[It works too well now... 😄 ] I went back to the default once a day log rotate with the quick WUs turned on with FAH v8.3.13 Alpha (Debug) on Windows there was enough log activity to get (2) log rotations at 0UTC (not sure it it was coincidental or if 2 threads are both trying to do the log rotation??). I truncated the 22MB log to the first few and last few entries: log-20240516-235959.txt Second Log at 0UTC: log-20240517-000000.txt

jcoffland commented 6 months ago

This is unrelated. The rotation triggered one second early due to a rounding error. This caused it to trigger again at the correct time. I found and fixed this issue for the next release.