Closed SE20225 closed 9 months ago
Before going any further, I need you to closely review the above editing I have performed on your original problem report to make sure it is correct.
Your original report was badly formatted, so I had to edit it so that it would display properly. In doing so, I may have accidentally deleted some important information.
So please review the above problem description and make any needed changes to ensure the information presented is accurate. Thanks.
(You can edit any comment you make to GitHub by simply clicking the "..." on the right side of the comment's title, and selecting "Edit" from the list of available actions.)
I added some text to clarify the pieces from the MVS dump rather than trying to edit these parts themselves.
This has never reoccurred after I changed the config to:
0F40 3350 dasd/work00.140
0:0140 3350 localhost::0F40
1:0140 3350 localhost::0F40
where 0F40 is a device which is not used by the MVS system involved.
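The relationship between the three statements above can be sketched as a small parser. This is only an illustration: the `DeviceStatement` class and `parse_device_statement` function are hypothetical names, and the sketch assumes that a prefix before the colon in the device address selects a channel subsystem (0 when absent) and that a target of the form `host:port:devnum` names a shared device, with an empty port meaning the default. File names containing a colon (such as Windows drive letters) are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceStatement:
    lcss: int                     # prefix before ':' in the address (assumed 0 if absent)
    devnum: int                   # device number, parsed as hex
    devtype: str                  # e.g. "3350"
    target: str                   # file name or host:port:devnum
    remote_devnum: Optional[int]  # set only when target looks like host:port:devnum


def parse_device_statement(line: str) -> DeviceStatement:
    """Parse one 'address type target' line (hypothetical helper)."""
    addr, devtype, target = line.split()
    lcss_text, sep, dev_text = addr.rpartition(":")
    lcss = int(lcss_text) if sep else 0
    parts = target.split(":")
    remote = int(parts[2], 16) if len(parts) == 3 else None
    return DeviceStatement(lcss, int(dev_text, 16), devtype, target, remote)


config = """\
0F40 3350 dasd/work00.140
0:0140 3350 localhost::0F40
1:0140 3350 localhost::0F40
"""

devices = [parse_device_statement(line) for line in config.splitlines()]
local = {d.devnum for d in devices if d.remote_devnum is None}

# Every remote path should point at a device that is backed by a local file.
for d in devices:
    if d.remote_devnum is not None:
        ok = d.remote_devnum in local
        print(f"{d.lcss}:{d.devnum:04X} -> {d.remote_devnum:04X} (local backing: {ok})")
```

Under these assumptions both 0140 paths resolve to the single file-backed device 0F40, so neither path opens the image file directly.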
Hercules version: 4.6.0.10941-SDL-g65c97fd6 Running on: T480S (Windows-6.2.9200 Intel(R) x64) LP=8, Cores=4, CPUs=1
My testing with optional paths has now progressed to 3350 with device 0140, volser WORK00, being the first 3350 in the config. It is actually a completely empty volume, with just a VTOC in the rest of cyl 0.
A lot of reading works OK on any of the paths, but the two available paths definitely work slightly differently.
The MVS involved is a TK4- regenned to include FEATURES=ACR and CRH (i.e. support for multiprocessing and Channel Recovery Hardware, Connect and Disconnect Channels).
Snippet from Hercules configuration file:
Running IEHLIST LISTVTOC DUMP with only the path via CP 0 online gives CC=0; everything works perfectly:
With both paths online, this is the typical result:
IEHLIST uses BSAM to read the VTOC (as a sequential file) and the CCW chain looks like this:
After having read all DSCBs in a track, 47 decimal, it searches to record 2F, the last one, and checks on READ COUNT with a sense of 0004, File Protect.
The extent is the remainder of cyl 0, but the file mask (not actually traced, but taken from the DEB) is 58, so no multitrack operation is allowed and the sense seems very much in order. The code then switches to the next track and continues to read all-zero DSCBs. This is what happens when things go right.
I have taken a dump when the IEH108I message is to be issued and it shows:
The IOB starts at A4B38. The error post code of 41 at A4B3C is caused by the incorrect length returned in the CSW, which is at A4B40. The CCW chain starts at A4B60. After positioning with SEARCH ID to the last record on the track, the result should have been a no-record-found condition, but instead it appears that garbage is read rather than a reasonable count field at A4B98 and key-and-data field at A6974. Well, no record found, or rather a switch to the next track, as long as we remain inside the VTOC extent, which is the rest of cyl 0.
At a superficial glance, this looks very much like the count field (garbage?) data.
The enclosed file MVS_LOG_2paths contains the Hercules CCW trace from a different test with the same end result; it only processed more tracks before running into the problem.
This time it was searching to 000000192F, so plenty of tracks were processed OK before it read a count field containing 789CCDCED5110273 and the following data field starting at A6974:
The 'garbage' data read looks similar. Could it be 'C' code for x86, or compressed tracks?
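As a rough cross-check of the "compressed tracks?" guess, the sketch below decodes the eight bytes quoted above as a CKD count field (CCHHR, key length, data length) and also tests whether the first two bytes form a valid zlib stream header. The function names are made up for illustration. The 3350 limits (555 cylinders, 30 heads) and the DSCB lengths (44-byte key, 96-byte data) are assumptions on my part, not values taken from the trace.

```python
import struct


def decode_count(count: bytes) -> dict:
    """Split an 8-byte count field into CC, HH, R, KL, DL (big-endian)."""
    cc, hh, r, kl, dl = struct.unpack(">HHBBH", count)
    return {"cc": cc, "hh": hh, "r": r, "kl": kl, "dl": dl}


def plausible_dscb_count(f: dict, max_cyl: int = 555, max_head: int = 30) -> bool:
    """Assumed sanity check: address within a 3350, DSCB-sized key and data."""
    return f["cc"] < max_cyl and f["hh"] < max_head and f["kl"] == 44 and f["dl"] == 96


def looks_like_zlib_header(data: bytes) -> bool:
    """RFC 1950 header test: method 8, window <= 32K, 16-bit value divisible by 31."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return (cmf & 0x0F) == 8 and (cmf >> 4) <= 7 and ((cmf << 8) | flg) % 31 == 0


count = bytes.fromhex("789CCDCED5110273")
fields = decode_count(count)
print(fields)                         # cc=30876, hh=52686, r=213, kl=17, dl=627
print(plausible_dscb_count(fields))   # False
print(looks_like_zlib_header(count))  # True
```

Read as a count field, these bytes give an impossible cylinder and head, while the leading 78 9C passes the zlib header test. That is consistent with data from a compressed track image ending up where a count field was expected, but it is only a hint, not proof.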
Trying IEHLIST with only path through CP 1 gives an ABEND913-20 issued during OPEN (TYPE=J) for the VTOC as a sequential dataset. Solid failure.
The channel program searches for R1 in the first VTOC track, the FMT4 DSCB; the location is taken from the UCB. R0 seems reasonable, but it is followed by something which should not be in the VTOC, and the search ends with no record found after two revolutions.
It seems as if Hercules always starts at HA, so why take another round when the 255/255/65535 end-of-track marker is found? A slight performance improvement is possible!
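To illustrate the point about the end-of-track marker, here is a minimal sketch of a single-pass record search over an in-memory track image. It assumes a layout of a 5-byte home address followed by records, each with an 8-byte count field (CCHHR, KL, DL), then key and data, with eight X'FF' bytes marking end of track. That layout is my understanding of the CKD image format, and both functions are hypothetical; this is not Hercules code.

```python
import struct
from typing import Optional

HA_SIZE = 5                     # assumed home address length
COUNT_SIZE = 8                  # CCHHR + KL + DL
END_OF_TRACK = b"\xff" * 8      # the 255/255/65535-style marker


def find_record(track: bytes, cc: int, hh: int, r: int) -> Optional[int]:
    """Return the offset of the matching count field, or None.

    The scan starts right after the home address, so hitting the
    end-of-track marker already proves the record is absent.
    """
    pos = HA_SIZE
    while pos + COUNT_SIZE <= len(track):
        count = track[pos:pos + COUNT_SIZE]
        if count == END_OF_TRACK:
            return None         # no second revolution needed
        rcc, rhh, rr, kl, dl = struct.unpack(">HHBBH", count)
        if (rcc, rhh, rr) == (cc, hh, r):
            return pos
        pos += COUNT_SIZE + kl + dl
    return None


def build_track(cc: int, hh: int, records) -> bytes:
    """Build a test track from (r, key, data) tuples."""
    track = bytearray(b"\x00" + struct.pack(">HH", cc, hh))
    for r, key, data in records:
        track += struct.pack(">HHBBH", cc, hh, r, len(key), len(data)) + key + data
    return bytes(track + END_OF_TRACK)


# R0 with 8 data bytes, then one DSCB-sized record (44-byte key, 96-byte data).
track = build_track(0, 0, [(0, b"", bytes(8)), (1, b"\x04" * 44, bytes(96))])
print(find_record(track, 0, 0, 1))  # 21
print(find_record(track, 0, 0, 2))  # None
```

The early exit is only valid if the search really does begin at the home address, as surmised above. A search that begins at some other orientation on the track would still have to wrap around once.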
Hercules CCW trace in file "MVS_LOG_CPU1Only.txt"
My guess is that somehow the two interfaces to the disk interfere or are not properly serialized, causing the pointer to the next record to become corrupt. I have therefore tried a more symmetrical setup like:
where 0101 is a device which is not touched by MVS.
Everything I have tried so far works fine, so if the above story describes a defect, I have a fine bypass to allow my MP experimentation to continue. Maybe I should try the remaining DASD types as well.
It seems rather trivial to recreate my problem. Maybe there is more room in the track after the last DSCB for the other types?
I also happened to see issue #575, where there are also strange count fields.
Please advise
Anders Edlund, andersedlund@telia.com