SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator
Other
237 stars 89 forks source link

Log format during shutdown (Issue 480) #634

Closed JamesWekel closed 6 months ago

JamesWekel commented 6 months ago

Here is a 2nd solution to fix two log format issues during shutdown Issue 480:

  1. duplicate log lines on non-Windows systems.

Issue related to logger processing and timing of shutdown logger_unredirect(). To fix the timing issue, a system shutbegin flag notifies logger_thread() that shutdown is beginning and to execute logger_unredirect() before setting system shutdown flag. Timing is now controlled by logger_thread().

  1. interleaved log lines with two time stamps or missing a timestamp.

Update logger_timestamped_logfile_write to have a lock if timestamps are required.

Tested using MVS TK5 startup/shutdown on Armbain Jammy (Ubuntu) on Raspberry Pi4 and Intel Nuc, and on Windows (Intel Nuc). Missing log lines _(e.g. 'haothread' ended) from previous 'bad' solution are included in the log. Not sure how to explicitly test problem #2. My test runs do not show any interleaved log lines but that doesn't imply the problem is resolved.

Note: this solution does not make any attempt to resolve missing shutdown messages on the console.

I would appreciate further testing on other systems and any comments on the solution.

Thanks, Jim

JamesWekel commented 6 months ago

Fish,

Not a problem. I welcome the review and comments. Change completed.

Jim

Fish-Git commented 6 months ago

Merged!

Thanks again, James!

wrljet commented 6 months ago

We do still need to fully test it out on other systems, non-Linux, BSDs, macOS, etc. I'll do some of them randomly.

Fish-Git commented 6 months ago

We do still need to fully test it out on other systems, non-Linux, BSDs, macOS, etc. I'll do some of them randomly.

Damn! I keep forgetting about that. I'll give it a try on my Kubuntu 21.10 system too.

Fish-Git commented 6 months ago

FWIW, it seems to work just fine on both my Kubuntu 21.10 and my old CentOS 6.10 system too, both virtual machines running under VMware WorkStation (not that that makes any difference of course).

Should we close GitHub Issue #480 now?

wrljet commented 6 months ago

I'd like to try it first on BSD and macOS before closing.

wrljet commented 6 months ago

Works on macOS (Sonoma 14.4 Beta).

I'll try BSD next.

Fish-Git commented 5 months ago

Well? Is BSD okay?

I'd like to close issue #480!

wrljet commented 5 months ago

Fish,

So just tried things out with the latest commit, on FreeBSD 14, and get endless errors in 'make check':

I then rebuilt on Debian, and getting the same there.

The very recent change for that processor stopped message is confusing/breaking the tests?

For example:

Variable $cmpxchg8           set to "1"
Variable $cmpxchg16          set to "1"
Variable $hatomics           set to "C11"
>>>>> line   153: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "1403 printer Load UCS Buffer":  1 OK compares.  One failure.
SKIPPING: Testcase 3211 printer
REASON:   No Hercules Rexx support.
>>>>> line  1255: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
>>>>> line  1258: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
Test "CBUC (Concurrent Block Update Consistency)":  0 OK compares.  2 failures.
>>>>> line  1317: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
Test "CCW-ILS (CCW Incorrect Length Suppression)":  1 OK compares.  One failure.
>>>>> line  1359: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
>>>>> line  1360: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
Test "cmpxchg16 as used by CDSG, STPQ and LPQ instructions":  0 OK compares.  2 failures.
>>>>> line  1401: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCL-et-al (Test CLCL, MVCIN and TRT instructions)":  0 OK compares.  One failure.
>>>>> line  1424: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCL Unaligned Buffers Test":  0 OK compares.  One failure.
>>>>> line  1448: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCLE-01-unaligned-buffers Test":  0 OK compares.  One failure.
>>>>> line  1472: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCLE-02-unaligned-buffers Test":  0 OK compares.  One failure.
>>>>> line  1496: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCLE-03-basic (Test CLCLE instructions)":  0 OK compares.  One failure.
>>>>> line  1520: Received unexpected wait state:  000A0000 00000000; processor stopped
Test "CLCLE-04-performance (Test CLCLE instructions)":  0 OK compares.  One failure.
>>>>> line  1546: Received unexpected wait state:  0002000000000000 0000000000000000; processor stopped
Test "CMPSC (Compression Call)":  0 OK compares.  One failure.
>>>>> line  1572: Received unexpected wait state:  0002000180000000 0000000000000000; processor stopped
Test "CU12-01-xpage (Test CU12 instruction)":  0 OK compares.  One failure.
>>>>> line  1599: Received unexpected wait state:  0002000180000000 0000000000000000; processor stopped

...

Test "multix1":  1 OK compares.  One failure.
>>>>> line 25399: Received unexpected wait state:  0002000180000000 0000000000000000; processor stopped
Test "zeos":  5 OK compares.  One failure.
Did 240 tests.  233 failed; 7 OK.
gmake: *** [Makefile:3602: check] Error 233

Bill

wrljet commented 5 months ago

As far as this issue with the log output during shutdown is concerned, though, the changes to seem to work correctly on FreeBSD.

So I'd say we're good to go.

Bill

JamesWekel commented 5 months ago

Bill: Thank you for all the testing, especially pointing out the problems with the first fix version.

Fish: thank you for the extra tests and reviewing the pull request.

Much appreciate all your support.

Jim

Fish-Git commented 5 months ago

The very recent change for that processor stopped message is confusing/breaking the tests?

Yep.  It sure as hell is.

(F**K!)

I jumped right on it and fixed it as soon as I got up this morning and saw your message, but also noticed another unrelated problem too that I introduced recently: the E7Prefix "Various CKD dasd I/O tests" test keeps failing because its shadow file isn't being deleted when the test finishes.  :(

If you manually delete all shadow files in the tests directory before running your 'runtest', it works fine. But if you immediately try it again, it fails because the old shadow file is still there when the test starts even though its .tst script clearly said to discard it when the test was over!  (sf-a80 nomerge. The nomerge should be deleting the shadow file but for reasons unknown it's not.)

I already have the fix for the "processor stopped" SNAFU ready to go and it does indeed correct the cause for all of the other 230+ failures. I'll go ahead and commit it ASAP.

BUT...   be aware that the other bug -- the 'sf' command's failure to delete the file when nomerge is specified -- will still remain, causing that one test to continue to fail whenever the old (previous) shadow file is still around when the test runs. So until I can find and fix it, just be sure no shadow files exist in the tests directory before running your tests and all tests should pass.

Fish-Git commented 5 months ago

As far as this issue with the log output during shutdown is concerned, though, the changes to seem to work correctly on FreeBSD.

So I'd say we're good to go.

Good. I'll go ahead and close issue #480 then. Thanks.

Fish-Git commented 5 months ago

BUT...   be aware that the other bug -- the 'sf' command's failure to delete the file when nomerge is specified -- will still remain, causing that one test to continue to fail whenever the old (previous) shadow file is still around when the test runs. So until I can find and fix it, just be sure no shadow files exist in the tests directory before running your tests and all tests should pass.

Fixed!

wrljet commented 5 months ago

The very recent change for that processor stopped message is confusing/breaking the tests?

Yep. It sure as hell is.

(FK!)**

I jumped right on it and fixed it as soon as I got up this morning and saw your message, but also noticed another unrelated problem too that I introduced recently: the E7Prefix "Various CKD dasd I/O tests" test keeps failing because its shadow file isn't being deleted when the test finishes. :(

If you manually delete all shadow files in the tests directory before running your 'runtest', it works fine. But if you immediately try it again, it fails because the old shadow file is still there when the test starts even though its .tst script clearly said to discard it when the test was over! (sf-a80 nomerge. The nomerge should be deleting the shadow file but for reasons unknown it's not.)

I already have the fix for the "processor stopped" SNAFU ready to go and it does indeed correct the cause for all of the other 230+ failures. I'll go ahead and commit it ASAP.

BUT... be aware that the other bug -- the 'sf' command's failure to delete the file when nomerge is specified -- will still remain, causing that one test to continue to fail whenever the old (previous) shadow file is still around when the test runs. So until I can find and fix it, just be sure no shadow files exist in the tests directory before running your tests and all tests should pass.

Both issues tested and verifiied are fixed as of commit: https://github.com/SDL-Hercules-390/hyperion/commit/140a0cf89cff61e91b7bd9b1de5850b6c1ea9cd1

Tests work, and are "serially reusable".

wrljet commented 5 months ago

Thank you, James, for your work and contribution!