SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

10 seconds intermittent delay with multiple paths to a DASD device #624

Closed SE20225 closed 6 months ago

SE20225 commented 6 months ago

I have an interest in 370-style multiprocessors running MVS (now TK5, with an IOGEN to reduce the number of UCBs, add up to four paths to DASD (the MVS/370 maximum, two from each processor), and bring in the CRH function). CRH is of special interest, although it is not involved in the 'problem' discussed here. (CRH provides a way to handle I/O that is physically connected only to the processor that was 'lost' due to a hardware problem.)

On my initial attempts it was suggested that I achieve multiple paths by specifying something like:

029F   3390  dasd/tk5res.390
1:029F 3390  localhost::029F

but this caused some intermittent malfunctions. Instead, I now specify:

0F9F   3390  dasd/tk5res.390
0:029F 3390  localhost::0F9F
1:029F 3390  localhost::0F9F

where the 0F9F device is not used by the MVS system. The malfunctions (false EQCs and others) are gone, but instead one can observe a few 10-second delays.
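For reference, the working setup can be written out as a commented configuration fragment (the comments and the interpretation of each line are mine, not official documentation):

```
# Local base device: 0F9F exists only so the Shared Device Server has a
# device to serve; MVS itself never uses it.
0F9F   3390  dasd/tk5res.390

# The two MVS-visible paths (channel set 0 and channel set 1), each
# connecting back to the local 0F9F via the Shared Device Server.
0:029F 3390  localhost::0F9F
1:029F 3390  localhost::0F9F
```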

The first unexpected delay occurs already while the configuration is being created: 10 seconds per additional path established. I do not know how or what to trace at this point, and there is no I/O from the virtual machine at this time.

In the documentation I found the command msglevel +dasd +channel, but those options generate an error message.

Occasionally there is also a 10-second delay when the system is up and running. All the cases I have looked into occur when MVS happens to switch from CPU 0 to CPU 1 (or the other way around) for the next I/O to the traced device (the sysres volume). PURGE processing (of cached tracks in the DASD sharing scheme) is always active at the time. But when scanning the trace, there are also occasions where the system switches to the other CPU with no delay.

I also noted that a value of 10 seconds appears in the config file, so I changed TIMERINT to 5000 and cckd GCINT to 5, but the 10-second delays remained unchanged.
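For reference, the tuning statements tried look roughly like this in a Hercules configuration file (a sketch of my own attempts; neither setting affected the delays):

```
# Timer update interval, in microseconds.
TIMERINT 5000

# cckd garbage-collection interval, in seconds.
CCKD GCINT=5
```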

Could this be a tuning problem, with the cache settings etc.?

The delay during machine establishment runs from 16:23:26 to 16:23:36, and the traced delay while the TSO user is logged on runs from 16:28:24 to 16:28:33. The delay is clearly noticeable to the TSO user, and sometimes console commands are delayed. At 16:25:48 there is an example of a quick switch to the other CPU.

Supporting documentation is provided in the DELAY.zip attachment below.

The delay during configuration processing should be easy to recreate since no host code is (yet) involved. It might have a different cause though!

Anders Edlund andersedlund@telia.com

Fish-Git commented 6 months ago

Instead I now specify like:

0F9F   3390  dasd/tk5res.390
0:029F 3390  localhost::0F9F
1:029F 3390  localhost::0F9F

where the F9F device is not used from the MVS system. The malfunctions (false EQCs and other) are gone, but instead one can observe a few 10 sec delays.

The first unexpected delay occurs already while the configuration is being created: 10 seconds per additional path established. I do not know how or what to trace at this point, and there is no I/O from the virtual machine at this time.

The delay during configuration processing should be easy to recreate since no host code is (yet) involved.

I will try to reproduce the startup delay that occurs when Hercules is first started, before any IPL. That should be fairly easy, since all of my test dasds can be dummy/empty volumes given that I won't be IPLing anything.

Fish-Git commented 6 months ago

GOOD NEWS!

I was able to reproduce the 10-second delay problem and I have a fix for it!

I will be committing it within the next day or so. Possibly later tonight.

Fish-Git commented 6 months ago

Fixed by commit 79a14889f3602d28e376aa2897b224d523c58ca9.

Closing.

SE20225 commented 6 months ago

I found the changes you made in the git repository, applied the same changes to my source manually, and rebuilt Hercules.

Both reported problems are now gone!

I guess the lowered times are not really TIMEOUTs, but rather the time before another attempt. As a TSO user, one no longer notices any unusual delays! Thanks!

Fish-Git commented 6 months ago

Both reported problems are now gone!

That's good to hear!

I guess the lowered times are not really TIMEOUTs, but rather the time before another attempt.

Correct.

As a TSO user, one no longer notices any unusual delays! Thanks!

You are very welcome, Anders.  :)