Closed davekreiss closed 1 year ago
I am only slightly familiar with the cckd
tracing command only insomuch as it provides additional details regarding a given I/O above and beyond what is otherwise provided by a simple CCW trace. It seems to trace the internal functioning of Hercules CCKD logic, tracing each major CCKD function that is called during the processing of a given CCW I/O. I am otherwise not very familiar with it at all. The cckd
command (and specifically the tracing side of it) was written by the same Hercules developer who wrote Hercules's support for CCKD (Compressed CKD) itself. Prior to that we only had plain CKD. He designed and wrote ALL of our compressed CKD (CCKD) logic years ago, and unfortunately he's dead, having passed away from rectal cancer. :(
After spending a few minutes looking at the code, it appears that the cckd
command's trace=
option controls the size of the consolidated, system wide internal CCKD trace table that accumulates formatted CCKD trace entries for any/all CCKD I/O for any/all devices. That is to say, when a non-zero trace=
value is specified for the cckd
command, CCKD tracing entries are saved in the table for any/all CCKD devices, which can then of course be dumped/displayed at any time via the 'k'
command.
The cckd debug=
option however, controls whether a given internal CCKD trace entry is either, or also, immediately printed at the moment the trace entry is created, depending on whether CCW tracing is enabled for that specific device.
So the two things -- cckd internal trace table, and cckd debug option -- were apparently designed for two completely different things. The cckd trace=nnnnnn
option was designed to keep a record of all internal CCKD activity for any/all devices independently from normal CCW tracing. (Apparently so the Hercules developer could see how CCKD processing for one device interacted/occurred while CCKD activity was simultaneously taking place for other devices).
The cckd debug=1
option however, was designed to be used in conjunction with normal CCW tracing for a specific device. That is to say, the same exact CCKD tracing message is formed (formatted/created) for each of the two options, but for debug=1
the CCKD trace message is printed immediately at the moment it is formed, but only if CCW tracing is enabled for that particular device (i.e. only if the t+dev
command was given for that device).
Does that make sense?
So my advice to you is, since, by your own words, you know which dataset is being corrupted (and thus which specific device's I/O is involved with the corruption):
Interestingly, the corruption is to a non-load library data set - AHELP, a simple 80 byte fixed block PDS containing HELP info. The corruption occurs in the directory portion of the AHELP data set, and is very consistent through all runs. It is reproducible and always occurs between the WTORs.
I would do a normal t+dev
CCW trace for the device in question (where "dev" of course is the device number of the device holding the AHELP dataset's directory) along with cckd debug=1,trace=0
. (Then use cckd debug=0
and t-dev
to disable tracing once you've captured the problem.)
This should provide a very detailed trace for just that one specific device where the corruption is obviously occurring, and should (HOPEFULLY!) tell us what's going on (i.e. should hopefully tell us whether the problem is with Hercules or with your guest).
Does that make sense? Does that help?
Once you create the trace, post your Hercules log here, and we'll take a look at it.
Thanks.
Ok, I set up the CCW trace, and believe I found something interesting. A little background of this JOB: The first JOB allocates a series of data sets across 9 mod 3 3390 volumes. The data set in question always ends up in the same cylinder on device 3004 since the volume always starts out empty. This data set always gets corrupted.
In the trace I see this, which seems wrong!
18:00:51 HHC00435I 0:3004 CKD file C:/R/BLDDLB.192: cyl 98 head 0 record 255 kl 255 dl 65535 of 0
Record 255, Key length 255, Data length 65535, all look invalid!
Next entry in trace:
18:00:51 HHC00431I 0:3004 CKD file C:/R/BLDDLB.192: seeking to cyl 31 head 0
Cylinder 31 head 0 is exactly the corrupted first track in the AHELP data set.
Then we see this a little farther:
18:00:51 HHC00434I 0:3004 CKD file C:/R/BLDDLB.192: read count orientation data
18:00:51 HHC00435I 0:3004 CKD file C:/R/BLDDLB.192: cyl 31 head 0 record 3 kl 8 dl 256 of 0
18:00:51 HHC00440I 0:3004 CKD file C:/R/BLDDLB.192: updating cyl 31 head 0 record 3 kl 8 dl 256
(i.e. updating the AHELP data set...)
Examining the define extent CCW, shows cyl 1F (31) is out of that range of cyl 62 head 0 to cyl 6B head E:
18:00:51 HHC01334I 0:3004 CHAN: ORB: IntP:00F535A8 Key:5 LPM:80 Flags:0C280 ....FP....H. L....... CCW:7F29D120
18:00:51 HHC00396I 0:3004 start i/o file[0] bufcur 1471 cache[91]
18:00:51 HHC01315I 0:3004 CHAN: ccw 63400010 7F29D138=>10CC0000 00000000 00620000 006B000E .............,..
18:00:51 HHC01315I 0:3004 CHAN: ccw 63400010 7F29D138=>10CC0000 00000000 00620000 006B000E .............,..
18:00:51 HHC01312I 0:3004 CHAN: stat 0C00, count 0000
18:00:51 HHC01315I 0:3004 CHAN: ccw 08000000 7F29D348
18:00:51 HHC01315I 0:3004 CHAN: ccw 47400010 7B43E128=>01810031 00620000 00620000 07220108 .a..............
Enclosing the complete from IPL to shutdown Hercules log, and a smaller subset where these trace records are:
I can attach a GTF trace printed out if necessary, as well as a track print if needed, or any other info needed.
A simple volume mapping program (my own utility) of that volume may help in understanding the layout:
The AHELP
data set is at cchh 001F 0000 to 0020 000E, and is the corrupted data set.
The AOSB3
data set is the data set the define extent maps, and is the load library that should be updated at cchh 0062 0000 to 006B 000E.
BLDDLB (3390-1) --Start-- ---End--- ----Extent---- Data Set Name CCCC HHHH CCCC HHHH Size # 0000 0000 0000 0000 1 T Volume Label 0000 0001 0000 000E 14 T 1 VTOC 0001 0000 0014 000E 20 C 1 DSK.MVS.NEW.ACMDLIB 0015 0000 001E 000E 10 C 1 DSK.MVS.NEW.AGENLIB 001F 0000 0020 000E 2 C 1 DSK.MVS.NEW.AHELP 0021 0000 0024 000E 4 C 1 DSK.MVS.NEW.AIMAGE 0025 0000 0038 000E 20 C 1 DSK.MVS.NEW.ALPALIB 0039 0000 0047 000E 15 C 1 DSK.MVS.NEW.AMACLIB 0048 0000 0051 000E 10 C 1 DSK.MVS.NEW.AMODGEN 0052 0000 0059 000E 8 C 1 DSK.MVS.NEW.AOSA0 005A 0000 005B 000E 2 C 1 DSK.MVS.NEW.AOSA1 005C 0000 005F 000E 4 C 1 DSK.MVS.NEW.AOSBN 0060 0000 0061 000E 2 C 1 DSK.MVS.NEW.AOSB0 0062 0000 006B 000E 10 C 1 DSK.MVS.NEW.AOSB3 006C 0000 006D 000E 2 C 1 DSK.MVS.NEW.AOSCA 006E 0000 0077 000E 10 C 1 DSK.MVS.NEW.AOSCD 0078 0000 0079 000E 2 C 1 DSK.MVS.NEW.AOSCE
So it definitely looks like Hercules took a wrong turn somewhere. Where and why, I'll let you figure out! :)
Thanks.
In the trace I see this, which seems wrong!
18:00:51 HHC00435I 0:3004 CKD file C:/R/BLDDLB.192: cyl 98 head 0 record 255 kl 255 dl 65535 of 0Record 255, Key length 255, Data length 65535, all look invalid!
I would have to agree! :(
Enclosing the complete from IPL to shutdown Hercules log, and a smaller subset where these trace records are:
Thanks. I'll take a look at it the very first moment I get.
I'm kind of in the middle of working on something else right now, and I'd like to finish it up before tackling this problem. I've just made a major breakthrough, and have, I believe, finally solved it. All I need to do now is write up my findings for peer review, which is going to take me a short(?) while. (There's a lot to document!)
Once I've posted my findings though, I'll jump on your (this) issue right away, I promise! Cool?
p.s. I don't suppose there's a fast and easy way for some idiot like me (who knows nothing about MVS / z/OS) to be able to recreate/reproduce this problem, is there? If there is, I'd very much appreciate you passing it along! It would sure help to expedite resolving this issue! Thanks!
I don't think there is an easy way to reproduce this problem since it takes multiple jobs which runs a total of over 40 minutes.
I believe that the channel program is related to STOW SVC which is recording the directory entry for a load module given the data recorded in the wrong extent. There are a lot of libraries involved in the process and it of course isn't simple.
I would be willing to run any tests you wish with test versions of Hercules if that is needed. The data sets used for these job streams are the publicly available TK3/TK4- MVS 3.8 source, target and distribution libraries.
I could construct a complete TK3 system with all it takes to run it with simple commands but that would take a few days at a minimum.
I don't think there is an easy way to reproduce this problem since it takes multiple jobs which runs a total of over 40 minutes.
"40 minutes"?! Sheesh! That sucks. :(
How fast if your Hercules host (i.e. host hardware)? Maybe it would run faster on my system? My system has 3.0GHz Intel Xeon processors. Is your host's processor(s) faster or slower than that? (If it's a newer system I'm guessing it's probably faster. My system is quite a few years old.)
I could construct a complete TK3 system with all it takes to run it with simple commands but that would take a few days at a minimum.
I'm not in any rush. Are you?
FYI: Other distracting issue resolved. Have begun looking into your issue now. But it's late in the day now, so I'll starting digging into it in earnest first thing tomorrow.
David,
In lieu of constructing for me a complete TK3 system, would it be possible for you to upload your device 3004 "C:/R/BLDDLB.192" CCKD file (along with any shadow files you might have for it, if you're using shadow files that is) to my softdevlabs.com ftp site, at the point immediately before the problem occurs?
Since we've already captured the I/O sequence (series of I/O commands and associated channel programs) that triggers the problem, I'm wondering if maybe I might be able to quickly(?) reproduce the problem here on my system by writing a simple(?) stand alone program that simply issues the exact same I/O sequence(s) (chain of CCWs) that we can obviously see from your trace triggers the problem.
Of course, having a complete TK3 system "with all it takes to run it with simple commands" would be the ideal, but I'm just trying to think of possible alternatives. Technically, the only thing I would technically need would be your device 3004 "C:/R/BLDDLB.192" CCKD file (along with any associated shadow files you might have for it) in the state it was in immediately before the problem occurred. Then if I issue the same sequence of channel programs against it, I should technically then be able to reproduce (trigger) the problem. Hopefully.
Anyway, it's just a thought. I would prefer the complete TK3 system of course (so that I can reproduce the problem at will anytime I want), but as I said, I'm just trying to think of alternative approaches.
Thanks.
So, the TK3 corruption doesn't show itself in the same way as 2.1 corruption. I haven't quite narrowed down the set links that the corruption occurs in, and it is on a different volume and different data set as well. Still working on that but it is a process of narrowing down the specific links and takes a little time.
So, I have included the BLDDLB volume at 3004:
So, the job is a series of link edits that produce the channel program in question. At the point of my intercept just before invoking IEWL (linkage editor) a WTOR is issued. At that point I cancelled the job, shutdown the OS and then zipped up the volume. There are no shadows for this volume as all the data there is totally reproducible.
History of how that volume is created: As part of the JCL stream is a step which removes all data sets from that volume and reallocates and populates them via SMP.
I've included a printout of the GTF trace for which I've already uploaded the CCW trace. I found the first occurrence of the channel program in question searching for 8D400108 7B441654. The GTF CCW trace contains dump of all the data along with the channel program:
I have reproduced the same strange "record 255 kl 255 dl 65535" on TK3, though it occurs on a different volume and data set.
I will setup the TK3 system so a single job is ready in the queue with a WTOR to the console just before the link edit occurs. At that point, a t+193
can be done. Then any response to the WTOR will run that specific link edit.
Once that link edit has completed, another WTOR will come up. At that point, enter t-193
to stop the trace, and you have a reproduction.
I will include an example so you can see how it works along with a .zip file of all the volumes etc. to run it. It will take a little time to get that together, but I am including the Hercules log with the reproduction I just recreated.
No problem on how quick you can get this together since I have had this problem probably since at least Hercules 4.41, and maybe as far back as v4.00. I always assumed it was an OS problem.
I have created a TK3 system with instructions, but the resulting two zip files are over 4G (split in two zip files). The limit here is 25M, so we will need to find a way to drop these files to you.
I have created a TK3 system with instructions, but the resulting two zip files are over 4G (split in two zip files). The limit here is 25M, so we will need to find a way to drop these files to you.
You can either FTP upload it to my SoftDevLabs FTP Server's "incoming" folder:
and then let me know when you've done so, so that I can then download it.
Note: the "incoming" folder is write-only. As a result, you will not be able to "see" what you've uploaded, because doing a 'dir' (or 'ls') to list the directory's contents is necessarily a read operation, and as I just said, the folder (directory) is marked write-only. It does not have read permission.
OR... you can of course upload it to some other place of your own choosing if you wish, and just give me the URL so that I can then download it from there.
Your choice.
Never FTPed to a write only server and couldn't figure out how. I dropped on dropbox. Hope these links work.
TK3-1.ZIP (file://DAVES5510/Users/Dave/Dropbox/TK3-1.ZIP) TK3-2.ZIP (file://DAVES5510/Users/Dave/Dropbox/TK3-2.ZIP)
Dave
Well that was futile my Dropbox only takes 2G and I never use it and didn’t realize that. I need a lesson on FTP in order to send the files. Send me a windows procedure and let’s see if I can follow them. Dave
(unrelated to FTP)
FYI: Please see this comment. Thanks.
Never FTPed to a write only server and couldn't figure out how.
I need a lesson on FTP in order to send the files.
FTPing to a write-only folder (directory) is no different than FTPing anywhere else. The procedure is exactly the same. The only difference is, as I explained, that you won't be able to list the directory, that's all.
Send me a windows procedure and let’s see if I can follow them.
Command line, or GUI? Because I personally don't do FTPing via the command line. I prefer to use a GUI application on Windows. They're much more user friendly and easier to use IMO. The one I use is called FileZilla. But then I use FileZilla because I do a lot of FTPing too.
If you don't (and I suspect that's likely true), then doing it via the command line is almost as easy. (There's nothing to install; all Windows systems come with ftp.)
Below is a sample command line FTP session with some blank lines inserted for readability.
Notice how you login as user "anonymous". Also notice how you can list any directory's contents via ls -al
except for the "incoming" folder's (since it's read-only). But uploading files to that directory (via the put
command) still works just fine.
PLEASE ALSO NOTICE how you must use the binary
command before issuing your put
command!
C:\Users\Fish> ftp www.softdevlabs.com Connected to softdevlabs.com. 220 (vsFTPd 2.2.2) User (softdevlabs.com:(none)): anonymous 230 Login successful. ftp> ls 200 PORT command successful. Consider using PASV. 150 Here comes the directory listing. AeroAdmin Fish Ubuntu 18.04.3 SCSI dosvs incoming links manuals misc networking politics schuknecht shedlock 226 Directory send OK. ftp: 121 bytes received in 0.00Seconds 121000.00Kbytes/sec. ftp> ls -al 200 PORT command successful. Consider using PASV. 150 Here comes the directory listing. drwxrwxr-x 14 500 500 4096 Mar 30 23:04 . drwxrwxr-x 14 500 500 4096 Mar 30 23:04 .. drwxrwxr-x 2 500 500 4096 Jun 02 2018 AeroAdmin drwxrwxr-x 3 500 500 4096 Mar 30 23:04 Fish Ubuntu 18.04.3 drwxrwxr-x 4 500 500 4096 Aug 13 2014 SCSI drwxrwxr-x 3 500 500 4096 Jan 24 2016 dosvs drwx-wx-wx 2 500 500 4096 Jun 15 01:40 incoming drwxrwxr-x 2 500 500 4096 Aug 13 2014 links drwxrwxr-x 5 500 500 4096 Jan 20 2020 manuals drwxrwxr-x 2 500 500 4096 Jun 06 2016 misc drwxrwxr-x 4 500 500 4096 Sep 13 2014 networking drwxrwxr-x 3 500 500 4096 Jun 29 2022 politics drwxrwxr-x 2 500 500 4096 Mar 19 2015 schuknecht drwxrwxr-x 3 500 500 4096 May 26 2018 shedlock 226 Directory send OK. ftp: 912 bytes received in 0.00Seconds 912000.00Kbytes/sec. ftp> cd manuals 250 Directory successfully changed. ftp> ls -al 200 PORT command successful. Consider using PASV. 150 Here comes the directory listing. drwxrwxr-x 5 500 500 4096 Jan 20 2020 . drwxrwxr-x 14 500 500 4096 Mar 30 23:04 .. drwxrwxr-x 2 500 500 4096 Sep 17 2016 DOS drwxrwxr-x 2 500 500 4096 Sep 17 2016 DOS-VS drwxrwxr-x 2 500 500 4096 Sep 17 2016 DOS-VSE -rw-rw-r-- 1 500 500 16340053 Sep 17 2016 GC24-5140-00 DOS-VSE Macro Reference.pdf -rw-rw-r-- 1 500 500 11112344 Sep 17 2016 GC28-6515-06 System-360 FORTRAN IV Language.pdf -rw-rw-r-- 1 500 500 26797143 Sep 17 2016 GC28-6515-11 System-360 and System-370 FORTRAN IV Language.pdf -rw-rw-r-- 1 500 500 79 Jan 24 2016 _IBM Manuals at BitSavers.Org.url 226 Directory send OK. ftp: 723 bytes received in 0.00Seconds 723000.00Kbytes/sec. ftp> cd .. 250 Directory successfully changed. ftp> cd incoming 250 Directory successfully changed. ftp> ls -al 200 PORT command successful. Consider using PASV. 150 Here comes the directory listing. 226 Transfer done (but failed to open directory). ftp> binary <----------------- VERY IMPORTANT COMMAND! --------------<<<< 200 Switching to Binary mode. ftp> put c:\your-path-to-your-file\whatever.xxx 200 PORT command successful. Consider using PASV. 150 Ok to send data. 226 Transfer complete. ftp: 2432 bytes sent in 0.10Seconds 24.32Kbytes/sec. ftp> ls -al 200 PORT command successful. Consider using PASV. 150 Here comes the directory listing. 226 Transfer done (but failed to open directory). ftp> bye 221 Goodbye. C:\Users\Fish>
Hope that helps!
Well tried ftp and not successful. So I turned off both webroot firewall and also Microsoft's firewall with no success as well.
PS C:\Users\Dave\downloads> ftp www.softdevlabs.com Connected to softdevlabs.com. 220 (vsFTPd 2.2.2) 200 Always in UTF8 mode. User (softdevlabs.com:(none)): anonymous 230 Login successful. ftp> cd incoming 250 Directory successfully changed. ftp> binary 200 Switching to Binary mode. ftp> put TK3-1.ZIP 200 PORT command successful. Consider using PASV. 425 Failed to establish connection. ftp> ftp> put TK3-1.ZIP 200 PORT command successful. Consider using PASV. 553 Could not create file. ftp> put TK3-2.ZIP 200 PORT command successful. Consider using PASV. 425 Failed to establish connection. ftp> ls -al 200 PORT command successful. Consider using PASV. 425 Failed to establish connection. ftp> ftp> put TK3-1.ZIP 200 PORT command successful. Consider using PASV. 553 Could not create file. ftp> put TK3-1-try2.ZIP 200 PORT command successful. Consider using PASV. 425 Failed to establish connection. ftp> bye 221 Goodbye.
But I was able, as it turns out, to get one file to dropbox and one to idrive. Here are links. I hope you can access them.
https://1drv.ms/u/s!Ajp_FMPdPVCk0TSs4LE9nu6YxRTw https://www.dropbox.com/s/ab0gjwobl5ju4u6/TK3-2.ZIP
Dave
Are you sure your .zip files exist in your local "C:\Users\Dave\downloads" directory?
Or do they maybe exist somewhere else?
Try specifying its full path, e.g.
put "C:\my\tk3\zip\dir\TK3-1.ZIP"
I hate Window 10/11! :(
But I was able, as it turns out, to get one file to dropbox and one to idrive. Here are links. I hope you can access them.
https://1drv.ms/u/s!Ajp_FMPdPVCk0TSs4LE9nu6YxRTw https://www.dropbox.com/s/ab0gjwobl5ju4u6/TK3-2.ZIP
I was able to download the "TK3-2.ZIP" from OneDrive, but when I click on the link for DropBox, I get a "This item was deleted" web page.
I re-fetched the link and it is different than first:
In the TK3 directory you will find this TK3 IPL and CCW trace instructions.txt. I made a copy of the dasd directory and copied the disks there and used that to restore between IPLs so I had a clean set of disks for each test. Easier than running zip to unzip from two zip files.
Another piece of info - I found a 3.07 version of Hercules and the same error appears to be happening.
I re-fetched the link and it is different than first:
Got it! Thanks!
Things are starting to get a little crazy right now (talking to a lot of people via email, responding to other GitHub issues, especially #572, which I still have a minor issue/bug with that I need to get fixed, etc), but I assure you, Dave, this issue of yours is still at the top of my list, and I promise I will get to it just as soon as I possibly can! Your patience is appreciated.
DAMMIT! I'm sorry Dave, but as is too frequently the case, something else has once again popped up to temporarily take my focus away from your issue. :(
GitHub Issue #572 has been re-opened due to a reported problem preventing z/VM from being able to IPL, so working on and resolving it has temporarily jumped ahead of your issue in my list of priorities. But I promise you I have not forgotten about you and you're still right at the top, next in line.
David (@davekreiss),
I began (finally!) working on your issue today. I apologize for having taken so long.
I am unable to reproduce the problem as described, using the TK3 system you sent me.
I suspect the reason may be due to the fact that the TK3 dasd volumes you sent me, are already corrupted!
I need you to send me uncorrupted (i.e. pristine/clean) copies of the TK3 BLDDLB
and BLDTGT
volumes, so that I can try to catch the corruption in action.
First, I marked all of the dasds you sent me as read-only, and then created a Hercules configuration file that specified use of shadow files for each of them:
Doing it this way ensures that I can run each of my tests over and over again as needed, starting from the exact same starting point (i.e. state) each time.
After each test, if I want to run it again or try a different technique (test) to try and catch the bug, I simply delete the old set of shadow files from the previous test run before I start Hercules for the new test run, thereby leaving behind just the original unaltered as-delivered dasd volumes and voila! -- the new/next test begins with all of my dasd volumes in their original unaltered as-delivered state.
Afterwards, I can then examine the shadow files to see if there were any updates to the track where the corruption supposedly occurs (cylinder 31 head 0, yes?).
Before powering on Hercules and IPLing (which runs your built-in automated test automatically and supposedly recreates the problem (i.e. the corruption), yes?), I first run some Hercules tools (utilities) called cckddiag
and cckdmap
to view the current state of the affected dasds.
(Note: it's still unclear to me exactly which dasd volume it is that is supposed to "become corrupted" as a result of your test, but I'm guessing it is the BLDTGT
volume, yes? Please confirm this for me! Thanks.)
Here are the cckddiag
reports for both Cylinder 31 Head 0 and Cylinder 31 Head 1 for both BLDDLB
and BLDTGT
as they look BEFORE running your test (i.e. before IPLing the system):
CCKDDIAG -t -a 31 0 Q:/TK3/Dasd/BLDDLB.cckd32 17:30:10.005 CCKDDIAG BLDDLB.cckd32 started; process-id = 6992 (0x00001B50) 17:30:10.036 HHC02499I Hercules utility CCKDDIAG - CCKD/CFBA diagnostic program - version 4.6.0.10941-SDL-g65c97fd6 17:30:10.036 HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others 17:30:10.036 HHC01417I ** The SDL 4.x Hyperion version of Hercules ** 17:30:10.036 HHC01415I Build date: Jun 8 2023 at 15:48:12 17:30:10.036 17:30:10.036 HHC02609I CC 31 HH 0 = reltrk 465; L1 index = 1, L2 index = 209 17:30:10.036 HHC02610I L1 index 1 = L2TAB offset 6612 (0x00000000000019D4) 17:30:10.036 17:30:10.036 HHC02611I L2 index 209 = L2TAB entry: 8 bytes: 17:30:10.036 +0000 FE1DB000 E504E504 ....V.V. 17:30:10.036 17:30:10.036 HHC02613I TRKHDR offset 11542014 (0x0000000000B01DFE); length 1253 (0x000004E5) bytes 17:30:10.036 HHC02612I TRKHDR track 465: 17:30:10.036 +0000 01001F00 00 ..... 17:30:10.036 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[00/0] kl[0] dl[8] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[01/1] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[02/2] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[03/3] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[04/4] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[05/5] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[06/6] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[07/7] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[08/8] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[09/9] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0A/10] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0B/11] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0C/12] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0D/13] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0E/14] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0F/15] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[10/16] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[11/17] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[12/18] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[13/19] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[14/20] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[15/21] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[16/22] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[17/23] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[18/24] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[19/25] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1A/26] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1B/27] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1C/28] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1D/29] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1E/30] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1F/31] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[20/32] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[21/33] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[22/34] kl[8] dl[256] 17:30:10.036 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[23/35] kl[8] dl[256] 17:30:10.052 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[24/36] kl[8] dl[256] 17:30:10.052 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[25/37] kl[8] dl[256] 17:30:10.052 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[26/38] kl[8] dl[256] 17:30:10.052 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[27/39] kl[8] dl[256] 17:30:10.052 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[28/40] kl[8] dl[256] 17:30:10.052 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[29/41] kl[8] dl[256] 17:30:10.052 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[2A/42] kl[8] dl[256] 17:30:10.052 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[2B/43] kl[8] dl[256] 17:30:10.052 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[2C/44] kl[8] dl[256] 17:30:10.052 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[2D/45] kl[8] dl[256] 17:30:10.052 HHC02601I End of track 17:30:10.052 CCKDDIAG BLDDLB.cckd32 ended; rc=0 CCKDDIAG -t -a 31 1 Q:/TK3/Dasd/BLDDLB.cckd32 17:30:18.933 CCKDDIAG BLDDLB.cckd32 started; process-id = 6880 (0x00001AE0) 17:30:18.964 HHC02499I Hercules utility CCKDDIAG - CCKD/CFBA diagnostic program - version 4.6.0.10941-SDL-g65c97fd6 17:30:18.964 HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others 17:30:18.964 HHC01417I ** The SDL 4.x Hyperion version of Hercules ** 17:30:18.964 HHC01415I Build date: Jun 8 2023 at 15:48:12 17:30:18.964 17:30:18.964 HHC02609I CC 31 HH 1 = reltrk 466; L1 index = 1, L2 index = 210 17:30:18.964 HHC02610I L1 index 1 = L2TAB offset 6612 (0x00000000000019D4) 17:30:18.964 17:30:18.964 HHC02611I L2 index 210 = L2TAB entry: 8 bytes: 17:30:18.964 +0000 05D32900 3F223F22 .L...... 17:30:18.964 17:30:18.964 HHC02613I TRKHDR offset 2740997 (0x000000000029D305); length 8767 (0x0000223F) bytes 17:30:18.964 HHC02612I TRKHDR track 466: 17:30:18.964 +0000 01001F00 01 ..... 17:30:18.964 17:30:18.964 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[00/0] kl[0] dl[8] 17:30:18.964 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[01/1] kl[0] dl[0] 17:30:18.964 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[02/2] kl[0] dl[1360] 17:30:18.964 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[03/3] kl[0] dl[0] 17:30:18.964 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[04/4] kl[0] dl[5840] 17:30:18.964 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[05/5] kl[0] dl[0] 17:30:18.964 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[06/6] kl[0] dl[27920] 17:30:18.964 HHC02601I End of track 17:30:18.964 CCKDDIAG BLDDLB.cckd32 ended; rc=0 CCKDDIAG -t -a 31 0 Q:/TK3/Dasd/BLDTGT.cckd32 17:30:37.074 CCKDDIAG BLDTGT.cckd32 started; process-id = 4320 (0x000010E0) 17:30:37.105 HHC02499I Hercules utility CCKDDIAG - CCKD/CFBA diagnostic program - version 4.6.0.10941-SDL-g65c97fd6 17:30:37.105 HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others 17:30:37.105 HHC01417I ** The SDL 4.x Hyperion version of Hercules ** 17:30:37.105 HHC01415I Build date: Jun 8 2023 at 15:48:12 17:30:37.105 17:30:37.105 HHC02609I CC 31 HH 0 = reltrk 465; L1 index = 1, L2 index = 209 17:30:37.105 HHC02610I L1 index 1 = L2TAB offset 7341 (0x0000000000001CAD) 17:30:37.105 17:30:37.105 HHC02611I L2 index 209 = L2TAB entry: 8 bytes: 17:30:37.105 +0000 BF6BC400 3D083D08 .,D..... 17:30:37.105 17:30:37.121 HHC02613I TRKHDR offset 12872639 (0x0000000000C46BBF); length 2109 (0x0000083D) bytes 17:30:37.121 HHC02612I TRKHDR track 465: 17:30:37.121 +0000 01001F00 00 ..... 17:30:37.121 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[00/0] kl[0] dl[8] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[01/1] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[02/2] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[03/3] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[04/4] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[05/5] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[06/6] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[07/7] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[08/8] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[09/9] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0A/10] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0B/11] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0C/12] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0D/13] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0E/14] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[0F/15] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[10/16] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[11/17] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[12/18] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[13/19] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[14/20] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[15/21] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[16/22] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[17/23] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[18/24] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[19/25] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1A/26] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1B/27] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1C/28] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1D/29] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1E/30] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[1F/31] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[20/32] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[21/33] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[22/34] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[23/35] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[24/36] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[25/37] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[26/38] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[27/39] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[28/40] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[29/41] kl[8] dl[256] 17:30:37.121 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[2A/42] kl[8] dl[256] 17:30:37.137 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[2B/43] kl[8] dl[256] 17:30:37.137 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[2C/44] kl[8] dl[256] 17:30:37.137 HHC02605I Track 465 COUNT cyl[001F/31] head[0000/0] rec[2D/45] kl[8] dl[256] 17:30:37.137 HHC02601I End of track 17:30:37.137 CCKDDIAG BLDTGT.cckd32 ended; rc=0 CCKDDIAG -t -a 31 1 Q:/TK3/Dasd/BLDTGT.cckd32 17:30:47.922 CCKDDIAG BLDTGT.cckd32 started; process-id = 6032 (0x00001790) 17:30:47.937 HHC02499I Hercules utility CCKDDIAG - CCKD/CFBA diagnostic program - version 4.6.0.10941-SDL-g65c97fd6 17:30:47.937 HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others 17:30:47.937 HHC01417I ** The SDL 4.x Hyperion version of Hercules ** 17:30:47.937 HHC01415I Build date: Jun 8 2023 at 15:48:12 17:30:47.937 17:30:47.953 HHC02609I CC 31 HH 1 = reltrk 466; L1 index = 1, L2 index = 210 17:30:47.953 HHC02610I L1 index 1 = L2TAB offset 7341 (0x0000000000001CAD) 17:30:47.953 17:30:47.953 HHC02611I L2 index 210 = L2TAB entry: 8 bytes: 17:30:47.953 +0000 A0E3BD00 00180018 .T]..... 17:30:47.953 17:30:47.953 HHC02613I TRKHDR offset 12444576 (0x0000000000BDE3A0); length 6144 (0x00001800) bytes 17:30:47.953 HHC02612I TRKHDR track 466: 17:30:47.953 +0000 01001F00 01 ..... 17:30:47.953 17:30:47.953 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[00/0] kl[0] dl[8] 17:30:47.953 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[01/1] kl[0] dl[0] 17:30:47.953 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[02/2] kl[0] dl[10560] 17:30:47.953 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[03/3] kl[0] dl[0] 17:30:47.953 HHC02605I Track 466 COUNT cyl[001F/31] head[0001/1] rec[04/4] kl[0] dl[27920] 17:30:47.953 HHC02601I End of track 17:30:47.953 CCKDDIAG BLDTGT.cckd32 ended; rc=0
As you can see, Cylinder 31 Head 0 for both volumes appear to already be corrupted! (whereas Cylinder 31 Head 1 on each appears to still be intact).
Each data block on track 0 are all 256 bytes in size (I'm presuming that's the "corrupted state", yes?) whereas each block on track 1 is always a multiple of 80 bytes (which I presume is how all tracks for the dataset are supposed to look, yes?).
Sooooo.... As I said in my opening comments, it appears the volumes you sent me are already corrupted to begin with.
But whatever. I started Hercules and IPLed anyway and let your test do its thing. When the *00 Pause
message appeared, I responded as instructed (0
) and a split second later the Link ran and ended (*01 Pause
).
AT THIS POINT I then immediately shutdown Hercules and examined the shadow files for each of the two volumes to see if there were any records (any writes/updates) for Cylinder 31 Head 0 and Head 1.
There were NONE.
Here is are the cckddiag
and detailed cckdmap
output for each of the two shadow files:
CCKDDIAG -t -a 31 0 Q:/TK3/Dasd/BLDDLB_Shadow_1.cckd32 17:35:53.301 CCKDDIAG BLDDLB_Shadow_1.cckd32 started; process-id = 13836 (0x0000360C) 17:35:53.317 HHC02499I Hercules utility CCKDDIAG - CCKD/CFBA diagnostic program - version 4.6.0.10941-SDL-g65c97fd6 17:35:53.317 HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others 17:35:53.317 HHC01417I ** The SDL 4.x Hyperion version of Hercules ** 17:35:53.333 HHC01415I Build date: Jun 8 2023 at 15:48:12 17:35:53.333 17:35:53.333 HHC02609I CC 31 HH 0 = reltrk 465; L1 index = 1, L2 index = 209 17:35:53.333 HHC02610I L1 index 1 = L2TAB offset 4294967295 (0x00000000FFFFFFFF) 17:35:53.333 HHC02618S L2tab for track 465 not found 17:35:53.333 HHC00007I Previous message from function 'main' at cckddiag.c(883) 17:35:53.333 CCKDDIAG BLDDLB_Shadow_1.cckd32 ended; rc=7 CCKDDIAG -t -a 31 1 Q:/TK3/Dasd/BLDDLB_Shadow_1.cckd32 17:36:06.098 CCKDDIAG BLDDLB_Shadow_1.cckd32 started; process-id = 8576 (0x00002180) 17:36:06.129 HHC02499I Hercules utility CCKDDIAG - CCKD/CFBA diagnostic program - version 4.6.0.10941-SDL-g65c97fd6 17:36:06.129 HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others 17:36:06.129 HHC01417I ** The SDL 4.x Hyperion version of Hercules ** 17:36:06.129 HHC01415I Build date: Jun 8 2023 at 15:48:12 17:36:06.129 17:36:06.129 HHC02609I CC 31 HH 1 = reltrk 466; L1 index = 1, L2 index = 210 17:36:06.129 HHC02610I L1 index 1 = L2TAB offset 4294967295 (0x00000000FFFFFFFF) 17:36:06.129 HHC02618S L2tab for track 466 not found 17:36:06.129 HHC00007I Previous message from function 'main' at cckddiag.c(883) 17:36:06.145 CCKDDIAG BLDDLB_Shadow_1.cckd32 ended; rc=7 CCKDDIAG -t -a 31 0 Q:/TK3/Dasd/BLDTGT_Shadow_1.cckd32 17:36:21.171 CCKDDIAG BLDTGT_Shadow_1.cckd32 started; process-id = 11592 (0x00002D48) 17:36:21.186 HHC02499I Hercules utility CCKDDIAG - CCKD/CFBA diagnostic program - version 4.6.0.10941-SDL-g65c97fd6 17:36:21.186 HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others 17:36:21.186 HHC01417I ** The SDL 4.x Hyperion version of Hercules ** 17:36:21.186 HHC01415I Build date: Jun 8 2023 at 15:48:12 17:36:21.186 17:36:21.186 HHC02609I CC 31 HH 0 = reltrk 465; L1 index = 1, L2 index = 209 17:36:21.186 HHC02610I L1 index 1 = L2TAB offset 4294967295 (0x00000000FFFFFFFF) 17:36:21.186 HHC02618S L2tab for track 465 not found 17:36:21.186 HHC00007I Previous message from function 'main' at cckddiag.c(883) 17:36:21.202 CCKDDIAG BLDTGT_Shadow_1.cckd32 ended; rc=7 CCKDDIAG -t -a 31 1 Q:/TK3/Dasd/BLDTGT_Shadow_1.cckd32 17:36:32.084 CCKDDIAG BLDTGT_Shadow_1.cckd32 started; process-id = 4116 (0x00001014) 17:36:32.116 HHC02499I Hercules utility CCKDDIAG - CCKD/CFBA diagnostic program - version 4.6.0.10941-SDL-g65c97fd6 17:36:32.116 HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others 17:36:32.116 HHC01417I ** The SDL 4.x Hyperion version of Hercules ** 17:36:32.116 HHC01415I Build date: Jun 8 2023 at 15:48:12 17:36:32.116 17:36:32.116 HHC02609I CC 31 HH 1 = reltrk 466; L1 index = 1, L2 index = 210 17:36:32.116 HHC02610I L1 index 1 = L2TAB offset 4294967295 (0x00000000FFFFFFFF) 17:36:32.116 HHC02618S L2tab for track 466 not found 17:36:32.116 HHC00007I Previous message from function 'main' at cckddiag.c(883) 17:36:32.116 CCKDDIAG BLDTGT_Shadow_1.cckd32 ended; rc=7 CCKDMAP -v Q:/TK3/Dasd/BLDDLB_Shadow_1.cckd32 17:45:41.998 CCKDMAP -v BLDDLB_Shadow_1.cckd32 started; process-id = 10712 (0x000029D8) 17:45:42.029 HHC02499I Hercules utility CCKDMAP - Compressed dasd file map - version 4.6.0.10941-SDL-g65c97fd6 17:45:42.029 HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others 17:45:42.029 HHC01417I ** The SDL 4.x Hyperion version of Hercules ** 17:45:42.029 HHC01415I Build date: Jun 8 2023 at 15:48:12 17:45:42.029 HHC03020I 17:45:42.029 HHC03021I CCKDMAP of: "Q:/TK3/Dasd/BLDDLB_Shadow_1.cckd32" 17:45:42.029 HHC03007I File size: (4,803 bytes) 17:45:42.029 HHC03022I 17:45:42.029 HHC03022I dh_devid: CKD_S370 (32-bit CCKD shadow file) 17:45:42.029 HHC03022I dh_heads: 15 17:45:42.029 HHC03022I dh_trksize: 56832 17:45:42.029 HHC03022I dh_devtyp: 0x90 (3390) 17:45:42.029 HHC03022I dh_fileseq: 0x00 17:45:42.029 HHC03022I dh_highcyl: 0 17:45:42.029 HHC03022I dh_serial: 17:45:42.029 HHC03023I 17:45:42.029 HHC03023I cdh_vrm: 0.3.1 17:45:42.029 HHC03023I cdh_opts: 0x41 17:45:42.029 HHC03023I num_L1tab: 66 17:45:42.029 HHC03023I num_L2tab: 256 17:45:42.029 HHC03023I cdh_cyls: 1113 (16,695 tracks) 17:45:42.029 HHC03023I cdh_size: 0x00000012C3 (4,803 bytes) 17:45:42.029 HHC03023I cdh_used: 0x00000012C3 (4,803 bytes) 17:45:42.029 HHC03023I free_off: 0x0000000000 (old format) 17:45:42.029 HHC03023I free_total: 0x0000000000 (0 bytes) 17:45:42.029 HHC03023I free_largest: 0x0000000000 (0 bytes) 17:45:42.029 HHC03023I free_num: 0 17:45:42.029 HHC03023I free_imbed: 0 17:45:42.029 HHC03023I cdh_nullfmt: 1 (ha r0) 17:45:42.029 HHC03023I cmp_algo: 1 (zlib) 17:45:42.029 HHC03023I cmp_parm: -1 (default) 17:45:42.029 HHC03020I 17:45:42.029 HHC03043I Total active tracks = 2 tracks 17:45:42.045 HHC03044I Avg. L2-to-track seek = 0.001 MB 17:45:42.045 HHC03020I 17:45:42.045 HHC03040I File offset Size (hex) Size track(s) 17:45:42.045 HHC03020I 17:45:42.045 HHC03042I devhdr 0x0000000000 0x0000000200 512 17:45:42.045 HHC03042I cdevhdr 0x0000000200 0x0000000200 512 17:45:42.045 HHC03042I l1 0x0000000400 0x0000000108 264 17:45:42.045 HHC03041I *********************************************************** 17:45:42.045 HHC03042I L2LOWER 0x0000000508 0x0000000000 0 17:45:42.045 HHC03042I trk 0x0000000508 0x000000016D 365 2 17:45:42.045 HHC03042I l2 0x0000000675 0x0000000800 2048 0 - 255 17:45:42.045 HHC03041I *********************************************************** 17:45:42.045 HHC03042I L2UPPER 0x0000000D08 0x0000000000 0 17:45:42.045 HHC03042I trk 0x0000000E75 0x000000044E 1102 1 17:45:42.045 HHC03042I eof 0x00000012C3 0x00000012C3 4803 17:45:42.045 HHC03020I 17:45:42.045 CCKDMAP -v BLDDLB_Shadow_1.cckd32 ended; rc=0 CCKDMAP -v Q:/TK3/Dasd/BLDTGT_Shadow_1.cckd32 17:45:58.287 CCKDMAP -v BLDTGT_Shadow_1.cckd32 started; process-id = 7576 (0x00001D98) 17:45:58.318 HHC02499I Hercules utility CCKDMAP - Compressed dasd file map - version 4.6.0.10941-SDL-g65c97fd6 17:45:58.318 HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others 17:45:58.318 HHC01417I ** The SDL 4.x Hyperion version of Hercules ** 17:45:58.318 HHC01415I Build date: Jun 8 2023 at 15:48:12 17:45:58.318 HHC03020I 17:45:58.318 HHC03021I CCKDMAP of: "Q:/TK3/Dasd/BLDTGT_Shadow_1.cckd32" 17:45:58.318 HHC03007I File size: (725,428 bytes) 17:45:58.318 HHC03022I 17:45:58.318 HHC03022I dh_devid: CKD_S370 (32-bit CCKD shadow file) 17:45:58.318 HHC03022I dh_heads: 15 17:45:58.318 HHC03022I dh_trksize: 56832 17:45:58.318 HHC03022I dh_devtyp: 0x90 (3390) 17:45:58.318 HHC03022I dh_fileseq: 0x00 17:45:58.318 HHC03022I dh_highcyl: 0 17:45:58.318 HHC03022I dh_serial: 17:45:58.318 HHC03023I 17:45:58.318 HHC03023I cdh_vrm: 0.3.1 17:45:58.318 HHC03023I cdh_opts: 0x41 17:45:58.318 HHC03023I num_L1tab: 66 17:45:58.318 HHC03023I num_L2tab: 256 17:45:58.318 HHC03023I cdh_cyls: 1113 (16,695 tracks) 17:45:58.318 HHC03023I cdh_size: 0x00000B11B4 (725,428 bytes) 17:45:58.318 HHC03023I cdh_used: 0x00000AA94A (698,698 bytes) 17:45:58.318 HHC03023I free_off: 0x0000027F95 (NEW format) 17:45:58.318 HHC03023I free_total: 0x000000686A (26,730 bytes) 17:45:58.318 HHC03023I free_largest: 0x00000059E3 (23,011 bytes) 17:45:58.318 HHC03023I free_num: 4 17:45:58.318 HHC03023I free_imbed: 0 17:45:58.318 HHC03023I cdh_nullfmt: 1 (ha r0) 17:45:58.318 HHC03023I cmp_algo: 1 (zlib) 17:45:58.318 HHC03023I cmp_parm: -1 (default) 17:45:58.318 HHC03020I 17:45:58.318 HHC03043I Total active tracks = 47 tracks 17:45:58.318 HHC03044I Avg. L2-to-track seek = 0.130 MB 17:45:58.318 HHC03020I 17:45:58.318 HHC03040I File offset Size (hex) Size track(s) 17:45:58.318 HHC03020I 17:45:58.318 HHC03042I devhdr 0x0000000000 0x0000000200 512 17:45:58.318 HHC03042I cdevhdr 0x0000000200 0x0000000200 512 17:45:58.318 HHC03042I l1 0x0000000400 0x0000000108 264 17:45:58.318 HHC03041I *********************************************************** 17:45:58.318 HHC03042I L2LOWER 0x0000000508 0x0000000000 0 17:45:58.318 HHC03042I trk 0x0000000508 0x00000026D2 9938 2755 17:45:58.333 HHC03042I l2 0x0000002BDA 0x0000000800 2048 2560 - 2815 17:45:58.333 HHC03042I trk 0x00000033DA 0x00000033A5 13221 2756 17:45:58.333 HHC03041I *********************************************************** 17:45:58.333 HHC03042I L2UPPER 0x0000003D08 0x0000000000 0 17:45:58.333 HHC03042I trk 0x000000677F 0x0000003192 12690 2757 17:45:58.333 HHC03042I trk 0x0000009911 0x0000002FAB 12203 2758 17:45:58.333 HHC03042I trk 0x000000C8BC 0x00000033BB 13243 2759 17:45:58.333 HHC03042I trk 0x000000FC77 0x0000003485 13445 2760 17:45:58.333 HHC03042I trk 0x00000130FC 0x0000003242 12866 2761 17:45:58.333 HHC03042I trk 0x000001633E 0x00000031A7 12711 2762 17:45:58.333 HHC03042I trk 0x00000194E5 0x0000003364 13156 2763 17:45:58.333 HHC03042I trk 0x000001C849 0x000000338C 13196 2764 17:45:58.333 HHC03042I trk 0x000001FBD5 0x0000002616 9750 2766 17:45:58.333 HHC03042I trk 0x00000221EB 0x00000032FF 13055 2765 17:45:58.333 HHC03042I trk 0x00000254EA 0x000000140A 5130 2767 17:45:58.333 HHC03042I trk 0x00000268F4 0x00000016A1 5793 2768 17:45:58.333 HHC03042I free 0x0000027F95 0x00000003B5 949 17:45:58.333 HHC03042I l2 0x000002834A 0x0000000800 2048 0 - 255 17:45:58.333 HHC03042I trk 0x0000028B4A 0x000000190B 6411 2535 17:45:58.333 HHC03042I l2 0x000002A455 0x0000000800 2048 2304 - 2559 17:45:58.333 HHC03042I free 0x000002AC55 0x000000071C 1820 17:45:58.333 HHC03042I trk 0x000002B371 0x00000039C4 14788 2004 17:45:58.333 HHC03042I l2 0x000002ED35 0x0000000800 2048 1792 - 2047 17:45:58.333 HHC03042I trk 0x000002F535 0x0000002DAE 11694 2005 17:45:58.333 HHC03042I trk 0x00000322E3 0x00000025FC 9724 2006 17:45:58.333 HHC03042I trk 0x00000348DF 0x0000002116 8470 2007 17:45:58.333 HHC03042I trk 0x00000369F5 0x0000001AA3 6819 2008 17:45:58.333 HHC03042I trk 0x0000038498 0x0000003148 12616 2003 17:45:58.333 HHC03042I trk 0x000003B5E0 0x0000000487 1159 1995 17:45:58.333 HHC03042I trk 0x000003BA67 0x0000001300 4864 2009 17:45:58.333 HHC03042I trk 0x000003CD67 0x0000003218 12824 1107 17:45:58.333 HHC03042I l2 0x000003FF7F 0x0000000800 2048 1024 - 1279 17:45:58.333 HHC03042I trk 0x000004077F 0x00000012BC 4796 900 17:45:58.333 HHC03042I l2 0x0000041A3B 0x0000000800 2048 768 - 1023 17:45:58.333 HHC03042I free 0x000004223B 0x00000059E3 23011 17:45:58.333 HHC03042I trk 0x0000047C1E 0x0000005C66 23654 1380 17:45:58.333 HHC03042I l2 0x000004D884 0x0000000800 2048 1280 - 1535 17:45:58.333 HHC03042I trk 0x000004E084 0x0000007DDE 32222 1381 17:45:58.333 HHC03042I trk 0x0000055E62 0x0000005E59 24153 1382 17:45:58.333 HHC03042I trk 0x000005BCBB 0x00000075B9 30137 1383 17:45:58.333 HHC03042I trk 0x0000063274 0x00000068CE 26830 1384 17:45:58.333 HHC03042I trk 0x0000069B42 0x0000004ACB 19147 1385 17:45:58.333 HHC03042I trk 0x000006E60D 0x0000004915 18709 1386 17:45:58.333 HHC03042I trk 0x0000072F22 0x00000054FC 21756 1387 17:45:58.333 HHC03042I trk 0x000007841E 0x000000471B 18203 1388 17:45:58.333 HHC03042I trk 0x000007CB39 0x00000059A0 22944 1389 17:45:58.333 HHC03042I trk 0x00000824D9 0x0000005A75 23157 1390 17:45:58.333 HHC03042I trk 0x0000087F4E 0x0000005023 20515 1391 17:45:58.333 HHC03042I trk 0x000008CF71 0x00000056BC 22204 1392 17:45:58.333 HHC03042I trk 0x000009262D 0x0000004E55 20053 1393 17:45:58.333 HHC03042I trk 0x0000097482 0x000000480F 18447 1394 17:45:58.333 HHC03042I trk 0x000009BC91 0x0000004DD6 19926 1395 17:45:58.333 HHC03042I trk 0x00000A0A67 0x0000003CE2 15586 1396 17:45:58.333 HHC03042I trk 0x00000A4749 0x0000003413 13331 1397 17:45:58.333 HHC03042I free 0x00000A7B5C 0x00000003B6 950 17:45:58.333 HHC03042I trk 0x00000A7F12 0x0000001599 5529 1200 17:45:58.333 HHC03042I trk 0x00000A94AB 0x000000102B 4139 901 17:45:58.333 HHC03042I trk 0x00000AA4D6 0x0000000364 868 1 17:45:58.333 HHC03042I trk 0x00000AA83A 0x000000697A 27002 1108 17:45:58.333 HHC03042I eof 0x00000B11B4 0x00000B11B4 725428 17:45:58.333 HHC03020I 17:45:58.333 CCKDMAP -v BLDTGT_Shadow_1.cckd32 ended; rc=0
As you can see, neither Cylinder 31 Head 0 nor Head 1 was ever written to on EITHER of the two volumes in question, which tells me the problem you described is simply NOT HAPPENING.
No "corruption" is taking place.
As I said in my opening comment at the beginning (top) of this GitHub comment, I suspect the reason it might not be happening MAY be because the corruption has already occurred at some point in the (distant?) past.
So I ask again whether it's possible for you to send me copies of the two volumes (BLDDLB
and BLDTGT
) that are not already corrupted? You mentioned earlier I believe that the very first part of your test/procedure that supposedly causes the corruption begins with first allocating all of the needed datasets on an "empty volume" (volumes?) such that the test/procedure then always starts "clean". Then, you claim, at some point, the AHELP
(?) dataset becomes corrupted:
Well, I'm not seeing that. :(
What I'm seeing is an already corrupted dataset that is never updated at all, causing it to remain corrupted (apparently forever?).
That's it. That's my testing and analysis of the situation so far.
Soooo..... Where do we go from here?
Do I need to wait for you to send me updated (un-pre-corrupted) dasd volumes to hopefully catch the corruption taking place?
Or do you think that MAYBE, at some point in the (distant?) past, you accidentally did something that caused the corruption and simply never noticed it until now? *`()`**
I await your reply.
Thanks.
*`()`** Either because you failed to use shadow files or failed to make backups of your Hercules dasds so you could restore them if anything went wrong like it obviously did at some point?
First about "corruption". There is no corruption of the PC DASD file. The corruption is Hercules updating the wrong track.
In the z/OS "corruption", the define extent says the I/O is limited to CCHH range 0062-0000 to 006B-000E ("CCW Trace Hercules log.log", line 541393):
18:00:51 HHC01315I 0:3004 CHAN: ccw 63400010 7F29D138=>10CC0000 00000000 00620000 006B000E
However, within the domain of that define extent, Hercules wrote to CCHH 001F 0000 ("CCW Trace Hercules log.log", line 541753):
18:00:51 HHC00440I 0:3004 CKD file C:/R/BLDDLB.192: updating cyl 31 head 0 record 1 kl 8 dl 256
So, the data written at CCHH 001F 0000 "corrupts" the data set at that disk address, namely the AHELP
data set, because it belongs within the data set in the define extent that is AOSB3
.
If it helps, what is written is a directory entry. A directory entry is a keyed file with kl=8 and dl=256. What is being written is a load module directory entry, but it is written to the AHELP
data set directory.
Here is a subset of the map of that volume posted above on Jun 20th:
BLDDLB (3390-1) --Start-- ---End--- ----Extent---- Data Set Name CCCC HHHH CCCC HHHH Size # 001F 0000 0020 000E 2 C 1 DSK.MVS.NEW.AHELP <-- the updating of cyl 31 (X'1F') outside the define extent domain 0062 0000 006B 000E 10 C 1 DSK.MVS.NEW.AOSB3 <-- the define extent maps to this data set
Back to TK3 - I know the wrong track is being updated but I can't locate where it is happening. There is no define extent in MVS 3.8 channel programs, and I spent a lot of time over the weekend trying to find where the wrong track is updated in the trace but got nowhere.
Concerning the TK3 DASD files there are no DASD PC file corruption. As I said I start with clean volumes, and they are allocated and updated via a batch stream of SMP which copies/updates/assembles/links to those volumes. No shadows as I can destroy/recreate/whatever these volumes at will with no data loss.
Unless I can identify where the wrong track is updated, I think the TK3 doesn't help isolate the problem.
In the z/OS "corruption", the define extent says the I/O is limited to CCHH range 0062-0000 to 006B-000E ("CCW Trace Hercules log.log", line 541393):
18:00:51 HHC01315I 0:3004 CHAN: ccw 63400010 7F29D138=>10CC0000 00000000 00620000 006B000E
However, within the domain of that define extent, Hercules wrote to CCHH 001F 0000 ("CCW Trace Hercules log.log", line 541753):
18:00:51 HHC00440I 0:3004 CKD file C:/R/BLDDLB.192: updating cyl 31 head 0 record 1 kl 8 dl 256
Yes, I originally saw that. But when I tried it on your TK3 system that you sent me, I decided to try and catch the bug from a different angle rather than wade through thousands of lines of a CCW trace. And as I've proven, the bug is simply NOT happening! ....... (Or at least not as I understood what the the bug was, which I can now see was very wrong.)
Back to TK3 - I know the wrong track is being updated but I can't locate where it is happening. There is no define extent in MVS 3.8 channel programs, and I spent a lot of time over the weekend trying to find where the wrong track is updated in the trace but got nowhere.
And THAT is what I originally failed to catch (failed to realize/understand).
Based on earlier comments:
Those seem corrupted data sets are consistent, ...
... is very consistent through all runs. It is reproducible and always occurs between the WTORs.
p.s. I don't suppose there's a fast and easy way for some idiot like me (who knows nothing about MVS / z/OS) to be able to recreate/reproduce this problem, is there? If there is, I'd very much appreciate you passing it along! It would sure help to expedite resolving this issue! Thanks!
I could construct a complete TK3 system with all it takes to run it with simple commands but that would take a few days at a minimum.
I was thus under the impression that the problem was reproducible, as described, on TK3 too. Or, perhaps more correctly, that the problem was always at the same place (same CCHH) on both (i.e. at the same Cyl 31 Head 0 on both z/OS and TK3 too).
I now understand that that belief was horribly wrong. I now understand that the corruption always occurs at the same place (cyl 31 head 0) only on z/OS 2,1! (Yes? Is that correct?)
But, while the "same corruption" also always occurs on TK3 too, it doesn't happen at the same place (at the same CCHH) on TK3! (Yes? True? Am I understanding things correctly now?)
Soooo............ Where does that leave us?
Well, apparently it leaves me with a TK3 system that supposedly can reproduce the corruption, but precisely where that corruption occurs (i.e. at what cylinder and head) we currently don't know yet. :(
QUESTION: Does this corruption always occur in the AHELP
directory? If you sent me a volume map like the one you provided before on June 20 (but it was for your z/OS 2.1 system, yes?) but for this TK3 system you sent me, wouldn't that tell us the CCHH of the AHELP
dataset and thus where the problem should be occurring? Yes? No?
As you can tell, I'm trying like hell to somehow get this TK3 system to still be useful to me for catching this bug. I'm honestly quite disappointed that we're unable to pinpoint precisely where (the precise CCHH) the corruption is occurring on TK3. That would make catching this bug so much easier!
QUESTION #2:
I happen to have z/OS 2.1 on my system. It's a stock/vanilla ADCD system that IBM gave me. (No! Really!) Is it possible to reproduce the problem on my z/OS 2.1 system via, say, the simple submission of a job stream you could send me? Is that possible?
Or in lieu of that, how difficult/complicated would it be for you to write up a set of detailed instructions for how I could reproduce the problem here on my system? Possible? Or impossible (i.e. too difficult)?
FINALLY, I would be remiss to mention to you that, since you originally reported this problem, several important fixes have been committed to our development repository for the E7 (Prefix), 63 (Define Extent), 47 (Located Record), 4B (Locate Record Extended) and 27 (Perform Subsystem Function) CCW logic in Hercules. (As it turns out, that's what was delaying me from getting started on YOUR problem.)
Are you able to build Hercules for yourself from source? We have tools available (a nice script called "Hercules Helper" written by Bill Lewis, one of our developers) that makes doing so incredibly easy. You might consider building the latest develop
branch of Hercules to see if it fixes your problem. (It's worth a shot anyway!)
If it doesn't, then we can maybe try the z/OS 2.1 technique. Otherwise it looks like we're stuck with the TK3 technique, and I'm going to have to wait for you to determine where specifically (which cyl/hd) the corruption is occurring on TK3.
Does that sound reasonable?
I want you to know I really appreciate your patience with me on this, Dave. I really want to nail this issue. I really do! Data corruption is not a good thing (duh!), but I can't do it without your help, so I really appreciate your continued patience with me, and hope that we can work together to figure this damn thing out.
Thanks!
Another quick question:
You mentioned that the corruption also occurs on TK3. Yes? My question to you then is, how may I visually see evidence of this corruption for myself on this TK3 system you sent me?
When I attach a second terminal to TK3, I notice that a TK3 logon screen appears. Only problem is, I don't know what userid and password to use to login with!
Then, once logged in, where do I go? What menu entries do I need to select to reach the screen where I can then actually see the corrupted directory entry? How can I prove to myself with my own eyes that the corruption has indeed occurred? That is to say, how can I manually verify the corruption for myself?
Note: I'm already familiar with z/OS's various TSO screens (I'm no expert, mind you, but I can get around okay!), but I'm unfamiliar with TK3's various TSO screens. I suspect they're quite a bit different. Yes?
After providing me with a userid and password that I can use to logon with, can you then walk me through the steps needed to reach the TSO screen where evidence of the corruption can easily (and obviously?) be seen?
Thanks!
The TK3 system is slightly broken but just enter HERC01 as user id and there is no need for password. A report showing the errors are a part of the job stream that encountered the error. I enclose an example further below.
Get into edit on HERC01.TEST.CNTL
(RFE option 2) and pick FMTVTOC
, my own VTOC formatting program, to get the CCHH extents of data sets on the BLDTGT
volume. It will format the VTOC for the volume selected.
Sample EDIT (option 2) input below:
OTHER DATA SET:
DATA SET NAME ===> 'herc01.test.cntl'
So, for example MVSSRC.BLD.LPALIB
has one and only extent. That is also in the enclosed report:
DS1EXT1...... 81 00 003C 0000 004F 000E.
Of course, you could do a LISTVTOC via IEHLIST sample in HERC01.TEST.CNT
' member IEHLIST
, which lists MVSSRC.BLD.NUCLEUS
, but change to LPALIB
if needed.
Notice cylinder/track numbers are decimal:
EXTENTS NO LOW(C-H) HIGH(C-H)
0 105 0 126 14
Output goes to the file allocated to device 00E.
Hope that helps. See "Some Sample Reports.txt" enclosed:
P.S. I am a rookie when it comes to git and tried to follow the instructions for build and failed off the bat. What/where/how do I use wget to get what I need to build - "wget https://github.com/wrljet/hercules-helper-windows/archive/master.zip". I see Linux examples but here again a rookie when it comes to Linux. (No Linux systems in this house either - no idea what to do with it.)
Reading instructions tells me I don't have much of anything on my Windows 11 system to do any of this - REXX or VS Studio or git. So, it isn't quite so simple to this rookie. Almost as complex as z/OS. Do you just have a windows version of your latest build I could just upload?
The TK3 system is slightly broken but just enter HERC01 as user id and there is no need for password. A report showing the errors are a part of the job stream that encountered the error. I enclose an example further below.
Thanks. I'll take a look.
P.S. I am a rookie when it comes to git and tried to follow the instructions for build and failed off the bat. . . . Reading instructions tells me I don't have much of anything on my Windows 11 system to do any of this - REXX or VS Studio or git. So, it isn't quite so simple to this rookie.
I was under the impression that Bill's "Hercules Helper" script was supposed to do all of that for you! I was under the impression that Hercules Helper would automatically install ALL of the needed pieces, including git
as well as Visual Studio itself and anything else too!
Bill? (@wrljet)
Can you help Dave with this?
In the mean time, I will upload a current Windows build of the latest development version of Hercules to my web site and send you the URL.
*What's your direct email address? Mine is "fish@softdevlabs.com"_`()`_**
*`()`** Fuck the spam harvesters! All my mail goes through gmail, which does a pretty darn good job of identifying and filtering out all the garbage, so I don't mind posting my email address in a public forum any damn time I want.
Get into edit on
HERC01.TEST.CNTL
(RFE option 2) and pickFMTVTOC
, my own VTOC formatting program, to get the CCHH extents of data sets on theBLDTGT
volume. It will format the VTOC for the volume selected.
Okay.................. But what am I looking for?
You have the exact same TK3 system I do. Yes? Where is the visual evidence of the corruption? What am I looking for?
You still haven't answered that.
I did some multiple attempts to make a short z/OS, with as little data to push around, but it was all futile. Only the real job stream does recreate the problem.
The program that is involved is purely the linkage editor (binder) PGM=IEWL
, either TK3 or 2.1. As I said before, TK3 symptoms and location are different than 2.1. The 2.1 error is quite obvious - writing to wrong CCHHR per trace. The TK3 is not so clear where things go wrong. But both are very consistent.
So, the output looks something like this from FMTVTOC:
Format 1 DSCB CCHHR=0000 0001 0A +0 DS1DSNAM..... MVSSRC.BLD.LPALIB +3B DS1NOEPV..... 01 <--- in one extent +69 DS1EXT1...... 81 00 003C 0000 004F 000E <--- that is the data sets only extent
And that is where the data set that got damaged is.
The damage I think is in the directory, but I haven't quite figured out where yet.
An analysis shows a number (210) of members whose alias exist but the real is no longer present. I am going to compare the two directories - TK3 and 2.1 to see if that helps identify where the problem is since the problem doesn't on 2.1 in the LPALIB data set.
*** WARNING *** AHLACFV is an alias but has no real entry *** WARNING *** AHLDMPMD is an alias but has no real entry *** WARNING *** AHLDSP is an alias but has no real entry
You can reach me at my gmail email address davekreiss with the normal gmail stuff after.
I'm still confused about this issue.
Let's step back and start over.
You created this issue because you wanted to learn how to do a cckd
trace. Correct?
Okay. So the next logical question is, WHY did you want to do a cckd
trace?
Answer: Because you claim corruption was occurring, and you wanted to catch it.
Okay. So the next logical question becomes, how did you discover this corruption? How did you notice it? How did you detect it?
THAT is the question I am now asking!
I am not interested in the traces you've done that you claim provides evidence that Hercules "is doing the wrong thing".
I am only interested at this point, in actually seeing actual hard evidence of the corruption. Traces are not hard evidence to me. Hercules's tracing could be faulty and misleading. Hercules's CCKD logic uses multiple threads to improve efficiency. Hercules also has special logic that, when it gets very busy (i.e. "is under stress"), it delays writing tracks to the actual dasd until later when it is not so busy. Instead, it keeps them in cache, so if the same track it updated again (or a track needs to be read), it can retrieve it from cache. And of course it also has logic that when its cache becomes full (or passes a certain threshold), it begins flushing its oldest cache entries out to disk to make room for new cache entries. It is very complex.
Having looked at many traces over the years (and you've seen them too!), one thing I've noticed is that it does not report which thread is doing the tracing. It might be that what you are seeing in your ckd traces is Hercules simply doing a stressed write. That "rogue" seek to cyl 31 head 0 that you saw could have been Hercules simply flushing its cache from an old write that was done long ago in the past. Maybe. I don't know. Because it's trace messages do not include the thread id.
THAT is why I cannot fully and completely trust a CCW trace as evidence of corruption occurring.
And that is why I want to see FOR MYSELF the actual corruption. I want to physically examine the actual record(s) on the dasd that are supposedly corrupted. I want to see it with my own eyes!
I want you to tell me how YOU know "it's" corrupted. (whatever "it" happens to be).
I want to know what a good, uncorrupted data block looks like, and what a corrupted data block looks like.
I want to know basically how you DISCOVERED the corruption, way back, long ago, that caused you to embark on this journey. I want you to tell me what it was that you saw, or noticed, or discovered that caused you to say to yourself: "What?! That's not right! That's wrong! That's been corrupted!"
I want you to tell me THAT. I want to see what YOU originally saw year(s) ago that told you that corruption was occurring.
THAT is what I am wanting to know today.
Do I need to examine each member of each dataset to verify "they look okay"? And then maybe see one of them "that looks wrong"? Do I need to do that? Because I will!
We have a utility called dasdcat
that can dump or display each member of a given dataset. It reads the dataset's directory to know where the member is on disk, and reads it and then displays or dumps it. If a directory entry for a given dataset is corrupted, I would expect it to fail. Yes?
Do you understand now what I am asking for?
I am looking for a way to detect the corruption that does NOT involve looking at a CCW trace. Or a GTF trace. Or ANY trace. Instead, I want to actually SEE the corruption for myself. I want to be able to detect it when it has occurred.
Can you help me with that?
Please?
You can reach me at my gmail email address davekreiss with the normal gmail stuff after.
Email sent with download URL for Hercules version 4.7.0.10982-SDL-DEV-g81a7bf7f.
Please give it a try.
If we're LUCKY, it just might fix your problem? <me: fingers crossed>
P.S. I am a rookie when it comes to git and tried to follow the instructions for build and failed off the bat. . . . Reading instructions tells me I don't have much of anything on my Windows 11 system to do any of this - REXX or VS Studio or git. So, it isn't quite so simple to this rookie.
I was under the impression that Bill's "Hercules Helper" script was supposed to do all of that for you! I was under the impression that Hercules Helper would automatically install ALL of the needed pieces, including
git
as well as Visual Studio itself and anything else too!Bill? (@wrljet)
Can you help Dave with this?
Sure
Dave will have to show me exactly what he tried to do.
Bill
The TK3 system is slightly broken but just enter HERC01 as user id and there is no need for password. A report showing the errors are a part of the job stream that encountered the error. I enclose an example further below.
Just my editorial comment... Stuff like this that might require running several or many times is much easier to work with if JCL can just be sent in through the card reader, and not involve TSO or 3270s in any way.
Bill
So, instructions first step says:
or, if you don't have git, or simply prefer: wget https://github.com/wrljet/hercules-helper-windows/archive/master.zip
No clue where to do this command. I tried a command prompt on my windows 11 system since it looked like a command but no luck. Not sure what to do with wget. Treat me like I am in kindergarten or maybe beginner 1st grade.
OK, you can also paste that URL into a web browser and let it download it for you.
BTW that wget
suggestion was taken from the *nix instructions, without thinking on my part.
I will correct/improve that first step.
Ok @wrljet I unzipped the mentioned zip file followed instructions and did a successful build. Thanks for assistance.
@Fish-Git Took your zipped Hercules 4.7 installed and ran both 2.1 and TK3 with similar results as related to 2.1: Damage to AHELP.
I also used the above Hercules build (which I suspect is identical to your zipped Hercules) with the same 2.1 and TK3 results.
Back to TK3 damage: First, a confession. I believe the "damage" was self-inflicted on TK3. I can find no sign of a directory block being written to the wrong CCHH as in the 2.1 situation.
However, the 2.1 damage is real. I have dumped the tracks (using DFDSS in the OS) in question, and they are indeed in the wrong extent as has been described. Not sure how to get a dump of a dasd image using any Hercules utility.
So how do I know the data (directory entry) is written to the wrong CCHH?
The AHELP data set is a DSORG=PO RECFM=FB LRECL=80 "text" data set which contains MVS help members. As is case with MVS 3.8 there is little to no extra directory data (user data). This data set directory starts at CCHH 001F 0000.
The AOSB3 data set is a DSORG=PO RECFM=U load library. Load modules contain much more directory "user data". A set of these members' directory entries for this library is written to the directory for AHELP. This is identified by the member names and the load library user data in the directory entries. This directory starts at CCHH 0062 0000.
Back to traces we have seen the define extent covers the AOSB3, but yet we see in CCW trace (even though as you say these trace entries aren't reliable) showing that I/O occurred to CCHH 001F 0000, and sure enough, a physical dump show these directory entries are there via a DFDSS dump track.
I thought I could reproduce this by simply running a link edit using the same input/output outside of the SMP job. Didn't produce the update to wrong CCHH. Maybe what I suspect goes back to your comment about caching and "multi tasking" going on in Hercules.
Using TK3 seems to be out of the question, as no reproduction seems to occur there. Anyway, that is where I am at this point. Hope this clarifies your questions. Let me know if you need more info and how we can approach reproduction and or diagnosing.
The AHELP data set is a DSORG=PO RECFM=FB LRECL=80 "text" data set which contains MVS help members. As is case with MVS 3.8 there is little to no extra directory data (user data).
The AOSB3 data set is a DSORG=PO RECFM=U load library. Load modules contain much more directory "user data".
Okay........ So what you are saying is (I want to understand this!), the individual directory entries for the two different types of datasets look very different from each other. That YOU can look at a hex dump of a 256-byte directory block, and easily tell whether it is a directory block for a "text"-type dataset containing "text"-type directory entries, and a 256-byte directory block for a "load library"-type dataset containing "load library"-type directory entries. Yes? Is that what you are saying? That the directory blocks and their individual directory entries within them, look different? One type contains more/different information than the other? Yes?
That if we do a hex dump of all of the 256-byte directory blocks for the "text"-type AHELP dataset after the bug/corruption has occurred, that you can tell which of those 256-byte directory blocks are damaged/wrong. Yes? That is what you are claiming, yes?
So okay then... This is how I would like to proceed:
dasdcat
of the z/OS 2.1's AHELP dataset to list all of its directory entries (just their names) BEFORE you run your job, so we can see what members are supposed to be in that dataset: dasdcat -i Q:/TK3/Dasd/BLDDLB.cckd32 MVSSRC.BLD.AHELP/? > dasdcat-before.log
dasdpdsu
of the z/OS 2.1's AHELP dataset to "unload" all of its members to an output folder (Windows directory) of your own choosing BEFORE you run your job, so we can see what each member looks like. This should verify that each of the dataset's directory entries are all intact/accurate: dasdpdsu Q:/TK3/Dasd/BLDDLB.cckd32 MVSSRC.BLD.AHELP ascii C:/ahelp/before > dasdpdsu-before.log
cckddiag
dump of the z/OS 2.1's AHELP dataset's directory blocks (cylinder 31 head 0, Yes?) BEFORE you run your job, so we can see what "good" directory blocks should look like. cckddiag -t -x -a 31 0 Q:/TK3/Dasd/BLDDLB.cckd32 > cckddiag-before.log
Then run your job stream that causes the corruption.
Then repeat steps 1-3 afterwards, but specifying different output log files for each of them, and a different output Windows folder name for step 2 too of course.
If I am understanding you correctly, then you should be able to point me to one of the AHELP dataset's directory blocks dumped by cckddiag
in step 3, and tell me:
"See this one right here, Fish? It's WRONG! THIS directory block has been corrupted! Do you see it?"
You should be able to do that, correct?
Futhermore, we should also be able to see that the list of the members in the AHELP dataset displayed by dasdcat
in step 1 has changed. That there are directory entries (member names) listed that weren't there before, and vice-versa: that some are missing. Yes?
And finally, we should also be able to see that the actual contents of some of the members has changed as well, as "unloaded" by dasdpdsu
in step 2. Yes?
I am currently working on the changes necessary to also report the "thread id" in all of Hercules's CCKD trace messages. Once I am done with that, I will then have you rebuild your Hercules so that you can do another CCW trace for me, but this time to a trace capture file that you can then send to me, which is much more debugging-friendly than the tracing we've been doing.
But don't bother with the traces yet. Wait until I make my changes first and you rebuild Hercules. I'll let you know when they're ready so you can do that. For now, just concentrate on doing the above for me please. I'd really appreciate it, Dave. I really, really want to see this corruption for myself!
I trust you that it's actually happening.
But I want to verify it. For myself. With my own eyes.
I am finding it incredibly hard to believe that such a serious bug in Hercules -- one that causes dasd corruption! -- could remain undetected for all these years. I would think other people would be complaining left and right. I mean, that code has been exercised (executed) literally many billions of times! Perhaps even trillions!
So I want proof. Hard proof. Undeniable proof.
Once I have that, then the detailed tracing to a trace-file that I will eventually have you do for me, should provide me with enough information that I should then (hopefully!) be able to manually reproduce the problem at will on my own system.
Once I'm able to do that, once I'm able to reproduce the problem on my own system, then finding and correcting the bug should hopefully be fairly simple and straightforward.
But that of course remains to be seen. ;-)
I really appreciate your patience with me, Dave. I promise you, we will get to the bottom of whatever the frick is going on.
Or I'll die trying. One or the other. ;-)
Not sure how to get a dump of a dasd image using any Hercules utility.
cckddiag
is your best bet. Refer to step 3 in my previous reply above.
So, I have run test as prescribed with additional information. After the first two sets of diagnostics, I shutdown the OS and then performed steps 1-3 again.
Contents of the above .zip file:
RES201.txt
is the OS configuration file.prt201.txt
contains diagnostic output written to SYSOUT=A and the system log.hercules.log
is the Hercules log.events.docx
is the sequence of events and diagnostics in time order.Let me know what else we need to do; I think we are getting a little closer to what's going on.
FYI: I'm seeing a possible hole (bug) in I/O queue processing that might explain things. Maybe. Am researching...
(EDIT: the below described bug was determined to not exist. See my followup comment further below.)
Mark (@dasdman), (and anyone/everyone else too!)
Please check me on this before I spend a lot of time and effort to try and create a test program to trigger it:
I believe I may have found a hole (bug) in Hercules's I/O request handling that could allow two separate I/O requests for the same device to execute simultaneously.
The condition under which it looks like it could occur is simply the guest issuing a subsequent SSCH
before the previous one, already underway (i.e. still running/active) has completed, such that we then have two I/O requests executing simultaneously on the same device, which needless to say is not good given that processing an I/O request necessarily involves updating fields in DEVBLK
, thereby causing the two I/Os to interfere with each other.
Here's how it looks like it can easily happen under the described conditions: *`()`**
SSCH
is issued and begins processing, eventually reaching the point in channel.c
where the device's "start" function is called to process the actual CCW request:
/*-------------------------------------------------------------------*/
/* B233 SSCH - Start Subchannel [S] */
/*-------------------------------------------------------------------*/
DEF_INST( start_subchannel )
{
...
/* Start the channel program and set the condition code */
regs->psw.cc = ARCH_DEP( startio )( regs, dev, &orb );
/*-------------------------------------------------------------------*/
/* START A CHANNEL PROGRAM */
/* This function is called by the SIO and SSCH instructions */
/*-------------------------------------------------------------------*/
/* Input */
/* dev -> Device control block */
/* orb -> Operation request block */
/* Output */
/* The I/O parameters are stored in the device block, and a */
/* thread is created to execute the CCW chain asynchronously. */
/* The return value is the condition code for the SIO or */
/* SSCH instruction. */
/* Note */
/* For S/370 SIO, only the protect key and CCW address are */
/* valid, all other ORB parameters are set to zero. */
/*-------------------------------------------------------------------*/
int
ARCH_DEP(startio) (REGS *regs, DEVBLK *dev, ORB *orb)
{
...
/* Return condition code 2 if device is busy */
#if defined( OPTION_SHARED_DEVICES )
if (unlikely((dev->busy && dev->shioactive == DEV_SYS_LOCAL)
|| dev->startpending))
#else
if (unlikely((dev->busy)
|| dev->startpending))
#endif
{
RELEASE_DEVLOCK( dev );
...
return 2;
}
...
/* Schedule the I/O for execution */
rc = schedule_ioq((sysblk.arch_mode == ARCH_370_IDX) ? regs : NULL,
dev);
/*-------------------------------------------------------------------*/
/* Execute a queued I/O */
/*-------------------------------------------------------------------*/
DLL_EXPORT void* device_thread( void* arg )
{
...
while ((dev = sysblk.ioq) && !sysblk.shutdown)
{
...
dev->tid = thread_id();
...
/* Execute requested CCW chain */
call_execute_ccw_chain( sysblk.arch_mode, dev );
...
dev->tid = 0;
}
// end while ((dev=sysblk.ioq) && !sysblk.shutdown)
/*-------------------------------------------------------------------*/
/* EXECUTE A CHANNEL PROGRAM */
/*-------------------------------------------------------------------*/
void*
ARCH_DEP(execute_ccw_chain) (void *arg)
{
...
OBTAIN_DEVLOCK( dev );
...
set_subchannel_busy(dev);
dev->startpending = 0;
...
/* Call the i/o start exit */
if (dev->hnd->start)
{
RELEASE_DEVLOCK( dev );
{
(dev->hnd->start) (dev);
}
OBTAIN_DEVLOCK( dev );
}
...
And there you have it!
Don't see it? Allow me to point it out to you...
The startio
function is never detecting that the device is busy and is thus always queuing the new I/O request for processing, and the device_thread
function is never checking to see if the device already has an active request still in progress before queuing the new request either, AND the execute_ccw_chain
function is never checking if the device is already busy either, before blindly calling set_subchannel_busy
to set the dev->busy
flag and begin processing the request! (causing it to eventually reach the (dev->hnd->start)(dev)
statement and call the device's CCW processing logic AGAIN, while it's still in the middle of processing (hasn't finished processing) its previous request yet!)
Oops! :)
And it's all caused by this bug right here. This is the primary culprit (although the other players are just as guilty in my opinion):
#if defined( OPTION_SHARED_DEVICES )
if (unlikely((dev->busy && dev->shioactive == DEV_SYS_LOCAL)
|| dev->startpending))
#else
if (unlikely((dev->busy)
|| dev->startpending))
#endif
The OPTION_SHARED_DEVICES
build option is always enabled by default in featall.h
:
#if defined( OPTION_SHARED_DEVICES ) && defined( OPTION_NO_SHARED_DEVICES )
#error Either OPTION_SHARED_DEVICES or OPTION_NO_SHARED_DEVICES must be specified, not both
#elif !defined( OPTION_SHARED_DEVICES ) && !defined( OPTION_NO_SHARED_DEVICES )
// Neither is #defined. Use default settings.
#define OPTION_SHARED_DEVICES
#define FBA_SHARED
#elif defined( OPTION_NO_SHARED_DEVICES )
#undef OPTION_SHARED_DEVICES
#undef FBA_SHARED
#elif defined( OPTION_SHARED_DEVICES )
// Leave FBA_SHARED alone, either #defined or #undefined, as desired.
#endif // OPTION_SHARED_DEVICES
Thus, the test for device busy in the startio
function that actually gets built is:
if (unlikely((dev->busy && dev->shioactive == DEV_SYS_LOCAL)
|| dev->startpending))
And due to the if
statement using &&
to combine the two conditions, both conditions must of course be true in order for the code block that returns Condition Code 2 to be taken, and it never is. The condition will always be false whenever there is no shared device I/O active.
dev->shioactive
is only ever set to DEV_SYS_LOCAL
(0xffff
) whenever a shared I/O is active on the device. If there isn't any shared I/O active on the device, then dev->shioactive
is always going to be DEV_SYS_NONE
(i.e. 0).
Like I said: Oops! :)
Anyway, that's what I'm seeing. I have not run any tests yet to see whether this condition can actually occur or not. I might just be seeing things. Which is why I am asking you (or anyone else!) to please confirm my suspicions. Or deny them. Whatever.
I would like to create a test program to try and trigger this condition in order to conclusively confirm my findings and verify that the bug does indeed exist (and I will probably get around to eventually doing that), but as I said in my opening statement, I would like confirmation of my findings first. I don't want to spend the time and effort to write such a program if I don't have to. I would like a second pair of eyes to confirm what I'm seeing first.
But if you agree with what I am seeing (if you are seeing the same thing), then I most definitely will take the time to create such a test program so we can add it to our suite of Hercules QA tests.
That's it. Let me know if you're seeing the same thing as I'm seeing.
Thanks!
*`()** _The guest issuing two or more back-to-back
SSCH` (Start Subchannel) requests without waiting for the previous one to complete beforehand._
(EDIT: the bug described in the above GitHub comment was determined to not exist. See my followup comment further below.)
I know I am getting into this without knowing much about Hercules internals but 1) Looking at one of the CCW trace I see in pairs 6,954 "HHC00396I 0:3004 start i/o" eventually followed by "HHC00396I 0:3004 end". That is there is no trace entry of a start I/O issued before the previous start I/O end is traced. In other words, at no time in trace is there two start I/O active at one time. Ok the time stamps don't have enough resolution to determine order and physical order of actual trace records can be recorded out of order if there are multiple threads tracing, so CCW trace is probably not reliable for verification that there are no concurrent start I/O occurring. 2) As long as the UCB is marked busy in z/OS subsequent I/O requests are queued in the operating system off the UCB awaiting completion of the current I/O of that devices UCB. So, if the I/O is still active in Hercules but the OS is told I/O has completed by Hercules when it hasn't the OS will start the next queued I/O. 3) In the trace there are no define extent for the range of tracks updated incorrectly. So those tracks should be protected and should not have had any I/O to them.
2. As long as the UCB is marked busy in z/OS subsequent I/O requests are queued in the operating system off the UCB awaiting completion of the current I/O of that devices UCB. So, if the I/O is still active in Hercules but the OS is told I/O has completed by Hercules when it hasn't the OS will start the next queued I/O.
True. And I doubt that is happening in your case. So the unconfirmed bug that I believe I may have found in Hercules is likely not your problem. I readily admit that.
But it is a legitimate bug in Hercules that needs to be fixed IMO. It might not be your bug (the bug that is causing your corruption problem), I don't know. If an operating system does issue two back-to-back I/O requests (SSCH) to a device (which SHOULD cause the second one to complete with cc=2 busy if the first one is still in progress, but isn't), then the bug I described will occur. So it is a bug.
You need to keep in mind Dave that I'm only getting started on my bug hunt. During bug hunts, I tend to question everything. The bugs I find (if any) during these bug hunts may not necessarily have anything to do with the specific bug being hunted, but they are bugs nonetheless that need to be fixed when discovered.
So, I have run test as prescribed with additional information. After the first two sets of diagnostics, I shutdown the OS and then performed steps 1-3 again.
Contents of the above .zip file:
RES201.txt
is the OS configuration file.prt201.txt
contains diagnostic output written to SYSOUT=A and the system log.hercules.log
is the Hercules log.- Summary of
events.docx
is the sequence of events and diagnostics in time order.
You did things wrong. You can't run any of the utilities (dasdcat, dasdpdsu or cckddiag) while Hercules is still up and running and has the file opened for writing. Notice the errors that were reported by both dasdcat
and dasdpdsu
in the "before" and "after" reports:
HHC00364W 0:0000 CCKD file c:/r/BLDdlb.192: forcing check level 1 HHC00007I Previous message from function 'cckd_chkdsk' at cckdutil.c(1339) HHC00368W 0:0000 CCKD file c:/r/BLDdlb.192: free space errors detected HHC00007I Previous message from function 'cckd_chkdsk' at cckdutil.c(1774) HHC00376W 0:0000 CCKD file c:/r/BLDdlb.192: free space not rebuilt, file opened read-only HHC00007I Previous message from function 'cckd_chkdsk' at cckdutil.c(2526)
This is because Hercules was still up and running, and updated data was still sitting in Hercules's cache and hadn't been flushed out to the actual physical dasd yet. That only happens once Hercules is completely shutdown. What dasdcat
and dasdpdsu
both "saw" (and reported) was therefore NOT the dasd's actual true contents.
But the good news is yes I can see the damage/curruption.
Let me know what else we need to do; I think we are getting a little closer to what's going on.
I know we're getting close!
In fact, I think I may have already found the problem! (I think)
I am going to be building a test version of Hercules with hopefully the fix in it that I want you to try for me. I'll let you know when it's ready. In addition to the fix, I want to add some debugging code to it too before sending it to you, just in case I'm wrong about the bug. Expect an email from me with the URL of the (hopefully!) fixed version and instructions on how to run it within the next day or so.
P.S. The bug (if I'm right about it) has been in Hercules for about the past 5 years or so. It was first introduced (I think) in version 4.2.
So if you try version 4.1 and it works, but version 4.2 fails, then that's just one more tiny bit of evidence that what I found is very likely the problem.
P.P.S. All releases of SDL Hercules Hyperion can be found here:
They're listed in reverse order, from newest to oldest, with the oldest (version 4.1) at the bottom.
@davekreiss
I sent you an email. Did you get it?
Yes and I am in process of running tests between work and home interference. DaveOn Jul 13, 2023, at 4:30 PM, Fish-Git @.***> wrote: @davekreiss I sent you an email. Did you get it?
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>
I know we're getting close!
In fact, I think I may have already found the problem! (I think)
P.S. The bug (if I'm right about it) has been in Hercules for about the past 5 years or so. It was first introduced (I think) in version 4.2.
Bug found and squashed! :))
Fixed by commit bd8362828d379105f7629c3b97df000dd17d1071.
Closing issue.
Regarding my earlier comment about a possible Hercules IOQ processing/scheduling bug...
I was wrong. Testing has revealed that all is well. There is no bug.
The channel was actually doing some things I hadn't realized the first time I looked at it.
I apologize to Mark (@dasdman) and the rest of the team for unintentionally Crying Wolf.
ALL IS WELL. THERE IS NO BUG.
We return you now to your regularly scheduled program...
I am fighting a I/O problem. Running z/OS 2.1 under Hercules version 4.6.0.10941-SDL-g65c97fd6 on Windows 10 Home 64 bit 19045.3086 (though Hercules 4.5 has same difficulties).
Introduction
The program in question has a WTOR to stop program at point prior to problem.
At that point I start an GTF IO trace for the device involved.
Then I enter the Hercules command
cckd trace=200000,debug=1
. (I've also tried 190000 and 100000 and debug 1/0 in many different combinations)Then respond to the WTOR which lets the program run.
When the program hits the end of the area in code after where the IO issue has occurred another WTOR is issued.
I then stop GTF and enter Hercules command
k
to dump/display the cckd trace table.One time I received a trace that covered half the time between the WTORs, but in the current instance there is no trace output at all covering the WTORs period.
As you will see in the enclosed log, sometimes I get a trace and sometimes I get nothing. (I started
cckd
trace several times and tried manyk
commands.)So why all this process to get the trace?
The I/O problem I am trying to solve involves some form of corruption of a z/OS data set. The program being debugged is a MVS 3.8 version of SMP run assembling 1230 programs followed by SMP then linking those programs to distribution libraries. I have front ended the linkage editor with my own program which determines which link is the flawed one and issue the first WTOR upon detecting the flawed link edit. When that link edit is complete the second WTOR is issued.
The corruption is to a data set that is:
not a load library so shouldn't be referenced as part of link edit output, and:
though there is a DD statement for that data set in the JCL, SMP doesn't invoke link edit with that data set as any data set as the output data set.
Interestingly, the corruption is to a non-load library data set - AHELP, a simple 80 byte fixed block PDS containing HELP info. The corruption occurs in the directory portion of the AHELP data set, and is very consistent through all runs. It is reproducible and always occurs between the WTORs.
I have verified the GTF trace contains no define extent for where the directory blocks being overwritten. More info as to why and what was corrupted can be shown.
There are several interesting variations to the corruption:
it didn't happen years ago (can't remember when or versions of Hercules it was successful), and:
running the same process on MVS 3.8 (both TK3 and TK4-) the corruption occurs on other data sets. (complex Hercules versions here as well, and I haven't yet done testing with older Hercules versions.)
Those seem corrupted data sets are consistent, but I have concentrated on the z/OS 2.1 corruption since it easier to research using its various diagnostic tools. Also, I can't fall back to some ancient Hercules version as I am now running with CCKD64 disk, though the disk in question with the corruption is not a CCKD64 disk.
For now, all I wanted to do was correlate the
cckd
trace with the GTF trace. I wrote a program which does that, but only once did I get a matching GTF andcckd
trace. Unluckily thecckd
that trace didn't include the complete time frame between the two WTORs.What is enclosed:
Hercules log of latest attempted tracing:
Problem Summary:
So, there are two problems:
the
cckd
trace commandcorruption, but I have to this point been unable to pinpoint who is responsible: a. Hercules, or: b. whatever software running on z/OS (and MVS 3.8).
This story is a lot longer but for now I'd like to be able to get a complete
cckd
trace between the two WTORs working so I can do further analysis. I did read about the instruction trace to a file and was hoping for the same feature withcckd
but help doesn't doc that feature.