SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator
Other
240 stars 90 forks source link

HHC00310E and HHC00342E on CCKD64 disk #490

Closed davekreiss closed 1 year ago

davekreiss commented 2 years ago

Note: this issue is related to issues #187 "Hercules has crashed" and #194 "Hercules has crashed - continuation of #187", and is closely related to issue #504 "Should CCKD Garbage Collector be disabled (and cckdcomp discouraged) by default?"


Not sure what all you might need to diagnose this issue. I've included the Hercules log.

This volume had previously received the same set of errors (HHC00310E and HHC00342E), so I rebuilt the disk from scratch. The original disk was converted from non-CCKD64 to CCKD64. I recreated the new disk by creating a empty CCKD64 volume, and then used dump and restore to rebuild it.

Running cckdcdsk64 -3 -f repairs free space and disk appears non-damaged, but these errors (HHC00310E and HHC00342E) occur again.

The size of the disk is 4.92 GB. No shadows. Using latest downloadable Hercules and running on Windows 10.

Let me know what else you might need to diagnose this.

Fish-Git commented 2 years ago

Hi Dave!

May I see your Hercules configuration file please?

May I see the dasdinit64 ... that you did?

May I see the cckdcdsk64 -3 -f that you did? Was Hercules shutdown when you ran it?

Doing a dasdinit64 to create a fresh empty dasd image file most definitely should not cause cckdcdsk64 to report any errors. There should be ZERO errors on a freshly initialized dasd image file. There should also not be any errors (free space or otherwise; i.e. ZERO errors) after performing a restore either (depending of course on what type of backup/restore you are talking about).

When you say "... and then used dump and restore to rebuild it", what does that mean? What program did you use to backup your dasd? What program did you use to restore it?

Did you first repair your "broken" dasd image file (via cckdcdsk64 -3) before doing your backup? Did you do a cckdcdsk64 -3 -ro (read-only; no repairs) after doing your restore to make sure the dasd was still clean/undamaged? Doing a restore should not cause damage to occur.

What you are reporting is highly unusual. If you repaired your dasd before doing your backup (presumably while Hercules was down and not running, yes?) and then restored that data to a freshly initialized (dasdinit64) dasd image, there should not be any damage. If cckdcdsk64 reports errors after doing that, then something is very wrong somewhere. If cckdcdsk64 reports errors after rebuilding (restoring) data to a freshly dasdinit64'ed dasd image file, then Hercules is seriously broken. There is a serious bug somewhere in Hercules if what you are claiming is true.

Please do a cckdmap of your bad dasd and post the results.

(and post your Hercules configuration file too)

(and your dasdinit64 and cckdcdsk64 too please)

Thanks.

davekreiss commented 2 years ago

I think I enumerated what you wanted and included them in a 7 zip file. Let me know if you need more or I missed something. Thanks for responding.

1) Config - RES201.TXT 2) dasdinit64 - Don't recall exactly but remembered using -z 3390-17. I had saved off the result and included the resulting disk - Empty-3390-27 in 7zp file 3) cckdcdcdsk64 - I copied the unaltered mvsbld64 disk first then did the cckdcdcdsk64 which is cckdcdcsk64.txt - Yes OS is shutdown and Hercules "DOS" window is ended via QUIT. 4) Restore process: First shutdown OS. Ran cckdcdcdsk64 then reipled. (no -ro was run first) Then I cleared the old disk of work files (cuts size significantly) and ran DFDSS DUMP to AWS from the repaired disk. Then shutdown OS. Switched disk in config to new empty disk. IPLed OS - did ICKDSF INIT and did a DFDSS restore from AWS dump above. 5) cckdmap is cckdmap.txt against the original but I still have a copy before this step. 6) After the cckdcdcdsk64 I reIPLed and included a map of that volume not sure if that is of interest (using a program I wrote years ago). MVSBLD64 map.txt

So after the restore I re-IPLed after a clean shutdown of Hercules. Then IPLed again and ran a series of batch MVS 3.8 SMP jobs (runs for almost 5 hours with 35 million I/Os) that creates a large number of files on this new disk. At some point that process is done and after some idle or mostly idle time these errors started appearing with little to no activity against the newly created disk. I suspect this is recreatable and can attempt to do so if you want me to with each step clearly documented.

On Wed, Jun 22, 2022 at 3:06 AM Fish-Git @.***> wrote:

Hi Dave!

May I see your Hercules configuration file please?

May I see the dasdinit64 ... that you did?

May I see the cckdcdsk64 -3 -f that you did? Was Hercules shutdown when you ran it?

Doing a dasdinit64 to create a fresh empty dasd image file most definitely should not cause cckdcdsk64 to report any errors. There should be ZERO errors on a freshly initialized dasd image file. There should also not be any errors (free space or otherwise; i.e. ZERO errors) after performing a restore either (depending of course on what type of backup/restore you are talking about).

When you say "... and then used dump and restore to rebuild it", what does that mean? What program did you use to backup your dasd? What program did you use to restore it?

Did you first repair your "broken" dasd image file (via cckdcdsk64 -3) before doing your backup? Did you do a cckdcdsk64 -3 -ro (read-only; no repairs) after doing your restore to make sure the dasd was still clean/undamaged? Doing a restore should not cause damage to occur.

What you are reporting is highly unusual. If you repaired your dasd before doing your backup (presumably while Hercules was down and not running, yes?) and then restored that data to a freshly initialized (dasdinit64) dasd image, there should not be any damage. If cckdcdsk64 reports errors after doing that, then something is very wrong somewhere. If cckdcdsk64 reports errors after rebuilding (restoring) data to a freshly dasdinit64'ed dasd image file, then Hercules is seriously broken. There is a serious bug somewhere in Hercules if what you are claiming is true.

Please do a cckdmap of your bad dasd and post the results.

(and post your Hercules configuration file too)

(and your dasdinit64 and cckdcdsk64 too please)

Thanks.

— Reply to this email directly, view it on GitHub https://github.com/SDL-Hercules-390/hyperion/issues/490#issuecomment-1162784480, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE7FTRFY6F7H5TBOPNI7TDLVQLCOVANCNFSM5ZOQ3FGQ . You are receiving this because you authored the thread.Message ID: @.***>

Fish-Git commented 2 years ago

I think I enumerated what you wanted and included them in a 7 zip file.

Your reply was via email. There is no attachment. Please do not reply via email. Please reply directly via the GitHub Issues web page instead (i.e. https://github.com/SDL-Hercules-390/hyperion/issues/490). Thanks.

Let me know if you need more or I missed something. Thanks for responding.

See above.

See also:

https://github.com/SDL-Hercules-390/hyperion/issues/407#issuecomment-892814212= https://github.com/SDL-Hercules-390/hyperion/issues/305#issuecomment-636368013= https://github.com/SDL-Hercules-390/hyperion/issues/259#issuecomment-538589291=

Thanks.

Fish-Git commented 2 years ago

I believe the following statement might be the key clue as to the cause of the problem:

... and ran a series of batch MVS 3.8 SMP jobs (runs for almost 5 hours with 35 million I/Os) that creates a large number of files on this new disk. At some point that process is done and after some idle or mostly idle time these errors started appearing with little to no activity against the newly created disk.

I suspect that because you are creating a large number of files on this new disk (implying a lot of data blocks on a lot of tracks are being constantly rewritten/updated), there are likely a lot of cckd free spaces being created, thereby keeping the garbage collector very, very busy trying to consolidate all of this new free space, and it is somewhere during this garbage collection effort that something goes wrong. Precisely what goes wrong I do not know yet. Perhaps a field overflows. Perhaps a race condition destroys a field. I just don't know. But something apparently causes the garbage collector to go haywire.

It will take some effort but I am going to try to write a test program to try and recreate such a scenario in the hope of possibly being able to recreate the problem. I'm not sure how well the original author stress tested his garbage collection design. He may have tested it for sanity, but I'm wondering how well he stress tested it (or if if he stress tested it!). It may well have a design flaw that only occurs under very heavy load. We'll see.

In the mean time, yes, please do try and see if you can reliably recreate this problem, and then provide instructions on how to do so (keeping in mind that my z/OS knowledge/skills are very weak to non-existent!).

Thanks.

p.s. I still need the previously mentioned items. Thanks.

davekreiss commented 2 years ago

Ok, here is a zip file of the above requested info:

davekreiss commented 2 years ago

Reproduced the error after about 3 days of trying. Took a dump of Hercules as soon as I noticed the errors, but it is 12G compressed to 600M with 7 zip. Not sure if the dump would help or how to get it to you. At the time nothing had been running in OS for some time. Hercules log and cckdmap of volume with errors in enclosed zip.

Fish-Git commented 2 years ago

STATUS UPDATE:

Good news and bad news:

The bad news is I still haven't found the bug yet.  :(

CCKD code is quite complicated, so I'm having trouble clearly seeing how the garbage collection thread interacts with the writer thread, which is where I believe the problem lies. Because there were so many updates to so many tracks, there are now many, many free spaces sprinkled all throughout the dasd in question, which the garbage collector must now deal with. As a result, garbage collection for the given dasd is taking much longer than usual. It has a lot of tracks and associated L2 tables to move around.

As each are moved, new free space is created, which must then be consolidated, and it is during this moving of tracks+L2 around that I believe the problem occurs.

What I suspect is happening (unconfirmed!) is, the writer thread, in its need to find some available free space to write its updated track to, writes its track to a free space that the garbage collection thread happened to be in the middle of (or perhaps about to encounter) updating, thereby confusing the garbage collector. That is to say, the writer thread (that processes I/O requests from the guest, i.e. z/OS), writes its updated track to what it believes to be (and to what probably actually is!) a valid available free space, and then marks the old location as now being available free space. BUT... it does this while the garbage collector is in the middle of examining/considering that particular free space as a possible location in which to move its own data.

And whenever this condition occurs, it (the garbage collector) notices what it thought was a free space, is not actually a free space (because the writer thread consumed it before it could consume it). It's basically a race condition (I think) between the writer thread and the garbage collector thread choosing, by coincidence, the exact same free space. Whoever gets there and consumes it first, wins.

The GOOD news is, as best as I can tell, the condition appears to be completely benign. I'm pretty sure no file damage actually occurs as a result of this collision (race) between the two threads. Due to the locking involved (which is still somewhat unclear to me, so I am not 100% sure yet), the free spaces are properly updated in such a way as to make it impossible(?) (I think!) for the garbage collector to write its track+L2 data (that it was trying to move) to the same place where the writer thread wrote its updated track (and vice-versa too: it appears to be impossible(?) (I think! I'm still looking at it!) for the writer thread to write its updated track to the same place that the garbage collector moved (or was wanting to move) its track+l2 data to).

I say this because the moment the garbage collector detects this condition, it immediately aborts without writing anything. That is to say, when the condition is detected, it issues the error messages you are seeing and then exits. It immediately stops its garbage collection efforts for that particular dasd image (and moves on to the next dasd image file).

So the problem, at this point in time, appears to be mostly an annoyance due to the many scary messages that are issued whenever the condition is detected. And this condition will continue to be detected over and over again too, until such time as the garbage collector can finish its job for the particular dasd in question, which it of course will never be able to do, since, as soon as it starts, it immediately re-discovers the exact same condition again! That is to say, once the condition occurs for a given dasd, it then prevents the garbage collector from ever making any progress from then on for that particular dasd. Each subsequent attempt simply causes yet another immediate abort. (For that particular dasd.)

Sooo.... What does all this mean?

First, as I said, it appears to be benign (but nevertheless understandably quite annoying!). So I believe, for now, you can safely ignore the problem. As I said, as far as I can tell, no actual file damage occurs. Your dasd should still be sane. You can confirm this by running cckdcdsk64 after shutting down your guest and Hercules too (because cckdcdsk64 is an offline tool designed to be run when Hercules is not running), -or- by issuing the sfk Hercules command while Hercules is up and running. (Personally, I recommend running the offline cckdcdsk64 utility while Hercules is down.)

I'll continue digging to try and find precisely where/how this condition occurs so code can be added to prevent it from occurring. Until then however, there is something you can do to completely eliminate (or (slightly?) reduce the likelihood of) the problem from occurring, but each has their drawbacks:

  1. You can add a cckd freepend=4,gcint=60,trace=100000 statement to your configuration file to try and reduce the likelihood of the problem occurring. The problem with this approach is the problem could still occur, and when it does occur, many more messages will be printed (due to the trace=100000 option, which I need you to use so I can more clearly see WTF is going on!).

  2. You can add a cckd gcint=0 statement to your configuration file to completely disable garbage collection altogether. This is what I recommend you do for the time being.

This technique does however incur a small amount of risk. Because we are completely disabling all garbage collection, all of your compressed dasds will become more fragmented over time since free space is never being removed. Over time, your dasds will gradually grow larger and larger. How much larger is difficult to say.

The good news is however, you can easily prevent your dasds from growing larger by simply manually running convto64 (or dasdcopy64) whenever needed while Hercules is down and not running. So you can make it a temporary habit of, say, once per week, simply running either convto64 or dasdcopy64 (convto64 recommended) on all of your dasd images (or shadow files, as the case may be) before starting Hercules. This will remove all free space from the file (essentially "defragmenting" it) thereby freeing up disk space.

I'll keep this Issue open until I find the bug and fix it or die trying.  ;-)

Hope that helps!

Fish-Git commented 2 years ago

Reproduced the error after about 3 days of trying. Took a dump... [...] Hercules log and cckdmap of volume with errors in enclosed zip.

Unfortunately that does not help me much. I need to see a much larger set of cckd trace statements.

If you don't mind, I'd appreciate it if you could do the following for me:

Add the statement cckd trace=100000 to your configuration file, and when the problem triggers (i.e. when you start to see a sudden FLOOD of messages being issued), enter the command cckd trace=0. This should immediately stop the flood of messages to prevent your log file from filling up your hard drive. Then zip and post the resulting Hercules log file. That will help me the most.

I'm only interested in seeing the cckd trace messages that are issued on the very first occurrence of the problem. ALL of the trace messages. (Thus the reason for the cckd trace=100000 option.) Unfortunately however, there is currently no way to say: "Dump me the complete cckd trace table on the first occurrence of this problem but then stop!". Thus you will have to manually request that via the second manually entered cckd command.

Does that make sense? Do you understand what I would like you to do?

  1. Use cckd trace=100000 to create a large cckd trace table (so I can see everything leading up to the problem), and then:
  2. Manually enter the cckd trace=0 command to quickly turn off cckd trace message as soon as you notice the problem has occurred.

If you would do that for me, I might be able to better "see" how/where the problem is occurring (or at least see the sequence of events that lead up to the problem triggering).

If it's too much trouble, I understand. But it would really help me a LOT if you could do that for me please.

Thanks!

_(I'm still having trouble creating a test to reproduce the problem for myself, but since we know YOU are able to reproduce the problem on your system (albeit with a small bit of effort), it would be easier for you to do it for me rather than me try to reproduce it for myself. Thanks for understanding.)_

davekreiss commented 2 years ago

Ok I have your instructions and will follow them.

Let me add a task I do once in a while that seems to shrink my dasd images. I defrag the volume making the OS view of free space into few extents (usually 1 or two). Then I run a program that eventually allocates all free space in multiple data sets writing EOF markers on each track allocated. After that is done for all free space data sets on volume are deleted. OS/Hercules are shut down and cckd compress utility is run. I do this rarely now that I am on 64-bit dasd files but did do so recently.

Fish-Git commented 2 years ago

@davekreiss

Ok I have your instructions and will follow them.

Dave: any new information to report on this issue?

Let me add a task I do once in a while that seems to shrink my dasd images ... ... are shut down and cckd compress utility is run. I do this rarely now that I am on 64-bit dasd files but did do so recently.

That's fine. I see no problem with that procedure. Personally I would use convto64 instead (not cckdcomp), but that's a personal choice. To each their own.

davekreiss commented 2 years ago

I ran for days running over and over again the 18-hour job streams with the trace "cckd trace=100000" with never a hit. Tried again for days without the trace and still no hit. Frustrated and out of town I stopped trying. I have no idea what might be done to reproduce this. I guess I will just have to wait for this random event to reoccur.

Fish-Git commented 2 years ago

Oh well. I guess I'll go ahead and close this issue then. You can re-open it again if it ever occurs again. Thanks.

jreginatobr commented 2 years ago

I was receiving the messages below at 10 second intervals during many days (215 MB of log).

11:43:51 HHC00310E 0:2103 CCKD file[0] d:/zos/devices/herc2/fds002.dev9: invalid trk hdr trk -1 buf 000000000B2DFB00:7e0bd08200
11:43:51 HHC00007I Previous message from function 'cckd64_cchh' at cckddasd64.c(3039)
11:43:51 HHC00399I CCKD file: internal cckd trace
11:43:51 HHC00398I 11:43:51.538971 0:2101 cckd_read_l2ent(3260): file[0] l2[28,200] trk[7368] read_l2ent 0x2505c7 335 335
11:43:51 HHC00398I 11:43:51.538977 0:2101 cckd_gc_percolate(5399): gcperc move trk 7368 at pos 0x00000000002505c7 len 335
11:43:51 HHC00398I 11:43:51.538984 0:2101 cckd_write_trkimg(3388): file[0] trk[7368] write_trkimg len 335 buf 000000000B2E329C:0101eb0003
11:43:51 HHC00398I 11:43:51.538991 0:2101 cckd_read_l2(3001): file[0] read_l2 28 active 0 28 289
11:43:51 HHC00398I 11:43:51.538996 0:2101 cckd_write_trkimg(3403): file[0] trk[7368] write_trkimg oldl2 0x2505c7 335 335
11:43:51 HHC00398I 11:43:51.539002 0:2101 cckd_get_space(2188): get_space len 335 largest 49277 flags 0x02
11:43:51 HHC00398I 11:43:51.539008 0:2101 cckd_get_space(2299): get_space found 0x00000000004baa17 len 335 size 335
11:43:51 HHC00398I 11:43:51.539015 0:2101 cckd_write(665): file[0] fd[8] write, off 0x00000000004baa17 len 335
11:43:51 HHC00398I 11:43:51.539025 0:2101 cckd_write_l2ent(3299): file[0] l2[28,200] trk[7368] write_l2ent 0x4baa17 335 335
11:43:51 HHC00398I 11:43:51.539031 0:2101 cckd_write(665): file[0] fd[8] write, off 0x000000000000ed50 len 8
11:43:51 HHC00398I 11:43:51.539043 0:2101 cckd_rel_space(2330): rel_space offset 0x00000000002505c7 len 335 size 335
11:43:51 HHC00398I 11:43:51.539049 0:2101 cckd_read_l2ent(3242): file[0] l2[6,208] trk[1744] read_l2ent 0x3710
11:43:51 HHC00398I 11:43:51.539055 0:2101 cckd_read_l2(3001): file[0] read_l2 6 active 0 28 289
11:43:51 HHC00398I 11:43:51.539063 0:2101 cckd_read_l2(3020): l2[0,6] cache[266] hit
11:43:51 HHC00398I 11:43:51.539069 0:2101 cckd_read_l2ent(3260): file[0] l2[6,208] trk[1744] read_l2ent 0x250716 332 332
11:43:51 HHC00398I 11:43:51.539075 0:2101 cckd_gc_percolate(5399): gcperc move trk 1744 at pos 0x0000000000250716 len 332
11:43:51 HHC00398I 11:43:51.539083 0:2101 cckd_write_trkimg(3388): file[0] trk[1744] write_trkimg len 332 buf 000000000B2E33EB:0100740004
11:43:51 HHC00398I 11:43:51.539089 0:2101 cckd_read_l2(3001): file[0] read_l2 6 active 0 6 266
11:43:51 HHC00398I 11:43:51.539094 0:2101 cckd_write_trkimg(3403): file[0] trk[1744] write_trkimg oldl2 0x250716 332 332
11:43:51 HHC00398I 11:43:51.539100 0:2101 cckd_get_space(2188): get_space len 332 largest 49277 flags 0x02
11:43:51 HHC00398I 11:43:51.539106 0:2101 cckd_get_space(2299): get_space found 0x00000000004bab66 len 332 size 332
11:43:51 HHC00398I 11:43:51.539113 0:2101 cckd_write(665): file[0] fd[8] write, off 0x00000000004bab66 len 332
11:43:51 HHC00398I 11:43:51.539123 0:2101 cckd_write_l2ent(3299): file[0] l2[6,208] trk[1744] write_l2ent 0x4bab66 332 332
11:43:51 HHC00398I 11:43:51.539129 0:2101 cckd_write(665): file[0] fd[8] write, off 0x0000000000003d90 len 8
11:43:51 HHC00398I 11:43:51.539139 0:2101 cckd_rel_space(2330): rel_space offset 0x0000000000250716 len 332 size 332
11:43:51 HHC00398I 11:43:51.539146 0:2101 cckd_read_l2ent(3242): file[0] l2[23,52] trk[5940] read_l2ent 0xbf10
11:43:51 HHC00398I 11:43:51.539151 0:2101 cckd_read_l2(3001): file[0] read_l2 23 active 0 6 266
11:43:51 HHC00398I 11:43:51.539159 0:2101 cckd_read_l2(3020): l2[0,23] cache[252] hit
11:43:51 HHC00398I 11:43:51.539165 0:2101 cckd_read_l2ent(3260): file[0] l2[23,52] trk[5940] read_l2ent 0x250862 331 331
11:43:51 HHC00398I 11:43:51.539171 0:2101 cckd_gc_percolate(5399): gcperc move trk 5940 at pos 0x0000000000250862 len 331
11:43:51 HHC00398I 11:43:51.539179 0:2101 cckd_write_trkimg(3388): file[0] trk[5940] write_trkimg len 331 buf 000000000B2E3537:01018c0000
11:43:51 HHC00398I 11:43:51.539185 0:2101 cckd_read_l2(3001): file[0] read_l2 23 active 0 23 252
11:43:51 HHC00398I 11:43:51.539191 0:2101 cckd_write_trkimg(3403): file[0] trk[5940] write_trkimg oldl2 0x250862 331 331
11:43:51 HHC00398I 11:43:51.539196 0:2101 cckd_get_space(2188): get_space len 331 largest 49277 flags 0x02
11:43:51 HHC00398I 11:43:51.539202 0:2101 cckd_get_space(2299): get_space found 0x000000000054b263 len 331 size 331
11:43:51 HHC00398I 11:43:51.539209 0:2101 cckd_write(665): file[0] fd[8] write, off 0x000000000054b263 len 331
11:43:51 HHC00398I 11:43:51.539219 0:2101 cckd_write_l2ent(3299): file[0] l2[23,52] trk[5940] write_l2ent 0x54b263 331 331
11:43:51 HHC00398I 11:43:51.539225 0:2101 cckd_write(665): file[0] fd[8] write, off 0x000000000000c0b0 len 8
11:43:51 HHC00398I 11:43:51.539235 0:2101 cckd_rel_space(2330): rel_space offset 0x0000000000250862 len 331 size 331
11:43:51 HHC00398I 11:43:51.539242 0:2101 cckd_read_l2ent(3242): file[0] l2[6,124] trk[1660] read_l2ent 0x3710
11:43:51 HHC00398I 11:43:51.539247 0:2101 cckd_read_l2(3001): file[0] read_l2 6 active 0 23 252
11:43:51 HHC00398I 11:43:51.539255 0:2101 cckd_read_l2(3020): l2[0,6] cache[266] hit
11:43:51 HHC00398I 11:43:51.539261 0:2101 cckd_read_l2ent(3260): file[0] l2[6,124] trk[1660] read_l2ent 0x2509ad 151 151
11:43:51 HHC00398I 11:43:51.539267 0:2101 cckd_gc_percolate(5399): gcperc move trk 1660 at pos 0x00000000002509ad len 151
11:43:51 HHC00398I 11:43:51.539274 0:2101 cckd_write_trkimg(3388): file[0] trk[1660] write_trkimg len 151 buf 000000000B2E3682:01006e000a
11:43:51 HHC00398I 11:43:51.539280 0:2101 cckd_read_l2(3001): file[0] read_l2 6 active 0 6 266
11:43:51 HHC00398I 11:43:51.539286 0:2101 cckd_write_trkimg(3403): file[0] trk[1660] write_trkimg oldl2 0x2509ad 151 151
11:43:51 HHC00398I 11:43:51.539292 0:2101 cckd_get_space(2188): get_space len 151 largest 49277 flags 0x02
11:43:51 HHC00398I 11:43:51.539298 0:2101 cckd_get_space(2299): get_space found 0x00000000004bacb2 len 151 size 151
11:43:51 HHC00398I 11:43:51.539304 0:2101 cckd_write(665): file[0] fd[8] write, off 0x00000000004bacb2 len 151
11:43:51 HHC00398I 11:43:51.539315 0:2101 cckd_write_l2ent(3299): file[0] l2[6,124] trk[1660] write_l2ent 0x4bacb2 151 151
11:43:51 HHC00398I 11:43:51.539321 0:2101 cckd_write(665): file[0] fd[8] write, off 0x0000000000003af0 len 8
11:43:51 HHC00398I 11:43:51.539331 0:2101 cckd_rel_space(2330): rel_space offset 0x00000000002509ad len 151 size 151
11:43:51 HHC00398I 11:43:51.539337 0:2101 cckd_read_l2ent(3242): file[0] l2[36,67] trk[9283] read_l2ent 0x12710
11:43:51 HHC00398I 11:43:51.539343 0:2101 cckd_read_l2(3001): file[0] read_l2 36 active 0 6 266
11:43:51 HHC00398I 11:43:51.539350 0:2101 cckd_read_l2(3020): l2[0,36] cache[277] hit
11:43:51 HHC00398I 11:43:51.539357 0:2101 cckd_read_l2ent(3260): file[0] l2[36,67] trk[9283] read_l2ent 0x250a44 53293 53293
11:43:51 HHC00398I 11:43:51.539363 0:2101 cckd_gc_percolate(5420): gcperc moved 94080 1st 0x22336a nbr 102
11:43:51 HHC00398I 11:43:51.539377 0:2101 cckd_flush_space(2434): flush_space nbr 102
11:43:51 HHC00398I 11:43:51.539384 0:2101 cckd_flush_space(2484): rel_flush_space nbr 97 (after merge)
11:43:51 HHC00398I 11:43:51.539394 0:2102 cckd64_gc_percolate(4316): gcperc size 262144 1st 0x0 nbr 0 largest 0
11:43:51 HHC00398I 11:43:51.539410 0:2103 cckd64_gc_percolate(4316): gcperc size 2097152 1st 0x0 nbr 0 largest 0
11:43:51 HHC00398I 11:43:51.539423 0:2103 cckd64_gc_percolate(4436): gcperc selected space 0x0000000000012658 len 65536
11:43:51 HHC00398I 11:43:51.539429 0:2103 cckd64_read(457): file[0] fd[10] read, off 0x0000000000012658 len 65536
11:43:51 HHC00342E 0:2103 CCKD file[0] d:/zos/devices/herc2/fds002.dev9: offset 0x0000000000012658 unknown space 7e0bd08200
11:43:51 HHC00007I Previous message from function 'cckd64_gc_percolate' at cckddasd64.c(4511)
11:43:51 HHC00399I CCKD file: internal cckd trace

So I shutdown Hercules and ran cckdcdsk64:

D:\zos\devices\herc2>
D:\zos\devices\herc2>cckdcdsk64 -3 fds002.dev9
HHC02499I Hercules utility cckdcdsk64 - CCKD64 dasd image verification - version 4.4.1.10647-SDL-gd0ccfbc9
HHC01414I (C) Copyright 1999-2022 by Roger Bowler, Jan Jaeger, and others
HHC01417I ** The SoftDevLabs version of Hercules **
HHC01415I Build date: Jan  6 2022 at 16:32:26
HHC00379I 0:0000 CCKD file fds002.dev9: starting cckdcdsk64 level 3...
HHC00363W 0:0000 CCKD file fds002.dev9: cdevhdr inconsistencies found, code 0090
HHC00007I Previous message from function 'cckd64_chkdsk' at cckdutil64.c(1246)
HHC00368W 0:0000 CCKD file fds002.dev9: free space errors detected
HHC00007I Previous message from function 'cckd64_chkdsk' at cckdutil64.c(1697)
HHC00377I 0:0000 CCKD file fds002.dev9: free space rebuilt
HHC00380W 0:0000 CCKD file fds002.dev9: cckdcdsk64 level 3 complete; rc=1
HHC00007I Previous message from function 'main' at cckdcdsk64.c(161)

After that, I restarted Hercules again, and the messages are now gone:

12:18:05 HHC00100I Thread id 000003c4, prio 5, name 'impl_thread' started
12:18:05 HHC00100I Thread id 000006e0, prio 4, name 'logger_thread' started
12:18:05 HHC01413I Hercules version 4.4.1.10647-SDL-gd0ccfbc9
12:18:05 HHC01414I (C) Copyright 1999-2022 by Roger Bowler, Jan Jaeger, and others
12:18:05 HHC01417I ** The SoftDevLabs version of Hercules **
12:18:05 HHC01415I Build date: Jan  6 2022 at 16:32:26
12:18:05 HHC01417I Built with: Microsoft Visual Studio 2008 (MSVC 150030729 1)
12:18:05 HHC01417I Build type: Windows MSVC AMD64 host architecture build
12:18:05 HHC01417I Modes: S/370 ESA/390 z/Arch
12:18:05 HHC01417I Max CPU Engines: 64
12:18:05 HHC01417I Using   shared libraries
12:18:05 HHC01417I Using   Fish threads Threading Model
12:18:05 HHC01417I Using   Error-Checking Mutex Locking Model
12:18:05 HHC01417I With    Shared Devices support
12:18:05 HHC01417I With    Dynamic loading support
12:18:05 HHC01417I With    External GUI support
12:18:05 HHC01417I With    Partial TCP keepalive support
12:18:05 HHC01417I With    IPV6 support
12:18:05 HHC01417I With    HTTP Server support
12:18:05 HHC01417I With    sqrtl support
12:18:05 HHC01417I With    Signal handling
12:18:05 HHC01417I With    Watchdog monitoring
12:18:05 HHC01417I With    CCKD BZIP2 support
12:18:05 HHC01417I With    HET BZIP2 support
12:18:05 HHC01417I With    ZLIB support
12:18:05 HHC01417I With    Regular Expressions support
12:18:05 HHC01417I With    Object REXX support
12:18:05 HHC01417I Without Regina REXX support
12:18:05 HHC01417I With    Automatic Operator support
12:18:05 HHC01417I Without National Language Support
12:18:05 HHC01417I With    CCKD64 Support
12:18:05 HHC01417I With    Transactional-Execution Facility support
12:18:05 HHC01417I With    "Optimized" instructions
12:18:05 HHC01417I With    OPTION_USE_SKAIP_AS_LOCK
12:18:05 HHC01417I With    OPTION_SIE2BK_FLD_COPY
12:18:05 HHC01417I Machine dependent assists: cmpxchg1 cmpxchg4 cmpxchg8 cmpxchg16 hatomics=msvcIntrinsics
12:18:05 HHC01417I Running on: PIXPD-QWR-001 (Windows-6.2.9200 AMD(R) x64) LP=4, Cores=4, CPUs=4
12:18:05 HHC01417I Built with crypto external package version 1.0.0.50-geed8a86
12:18:05 HHC01417I Built with decNumber external package version 3.68.0.100-g208ee4c
12:18:05 HHC01417I Built with SoftFloat external package version 3.5.0.103-g9ba90c7
12:18:05 HHC01417I Built with telnet external package version 1.0.0.60-g524b6e8
12:18:05 HHC00018I Hercules is running in elevated mode
12:18:05 HHC02323W This build of Hercules has only partial TCP keepalive support
12:18:05 HHC00007I Previous message from function 'impl' at impl.c(1118)
12:18:05 HHC00150I Crypto module loaded (C) Copyright 2003-2016 by Bernard van der Helm
12:18:05 HHC00151I Activated facility: Message Security Assist
12:18:05 HHC00151I Activated facility: Message Security Assist Extension 1, 2, 3 and 4
12:18:05 HHC17531W REXX(OORexx) dlopen 'rexx.dll' failed: The specified module could not be found.
12:18:05 HHC00007I Previous message from function 'OORexxLoad' at hRexxapi.c(1340)
12:18:05 HHC17511E REXX() Could not enable either Rexx package
12:18:05 HHC00007I Previous message from function 'Enable' at hRexx.c(515)
12:18:05 HHC17511E REXX() Could not enable default Rexx package
12:18:05 HHC00007I Previous message from function 'InitializeRexx' at hRexx.c(262)
12:18:05 HHC17500I REXX() Mode            : Command
12:18:05 HHC17500I REXX() MsgLevel        : Off
12:18:05 HHC17500I REXX() MsgPrefix       : Off
12:18:05 HHC17500I REXX() ErrPrefix       : Off
12:18:05 HHC17500I REXX() Resolver        : On
12:18:05 HHC17500I REXX() SysPath    (18) : On
12:18:05 HHC17500I REXX() RexxPath   ( 0) :
12:18:05 HHC17500I REXX() Extensions ( 8) : .REXX;.rexx;.REX;.rex;.CMD;.cmd;.RX;.rx
12:18:05 HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=1)
12:18:05 HHC00100I Thread id 00001bb4, prio 2, name 'Processor CP00' started
12:18:05 HHC00100I Thread id 00001ddc, prio 7, name 'timer_thread' started
12:18:05 HHC00811I Processor CP00: architecture mode z/Arch
12:18:05 HHC02204I LPARNAME       set to P002
12:18:05 HHC02204I PORT           set to port=8082 auth userid<hercules> password<herc8082>
12:18:05 HHC02204I CPUMODEL       set to 1090
12:18:05 HHC02204I MODEL          set to hardware(1090) capacity(1090) perm() temp()
12:18:05 HHC02204I CPUSERIAL      set to 01B7D7
12:18:05 HHC02204I LPARNUM        set to 1
12:18:05 HHC02204I MANUFACTURER   set to IBM
12:18:05 HHC02204I PLANT          set to 82
12:18:05 HHC02204I CPUVERID       set to 00
12:18:05 HHC17003I MAIN     storage is 6G (mainsize); storage is not locked
12:18:05 HHC17003I EXPANDED storage is 0 (xpndsize); storage is not locked
12:18:05 HHC00100I Thread id 000011e4, prio 2, name 'Processor CP01' started
12:18:05 HHC00811I Processor CP01: architecture mode z/Arch
12:18:05 HHC00100I Thread id 00001784, prio 2, name 'Processor CP02' started
12:18:05 HHC00811I Processor CP02: architecture mode z/Arch
12:18:05 HHC00100I Thread id 00001e1c, prio 2, name 'Processor CP03' started
12:18:05 HHC00811I Processor CP03: architecture mode z/Arch
12:18:05 HHC00100I Thread id 00001d6c, prio 2, name 'Processor CP04' started
12:18:05 HHC00811I Processor CP04: architecture mode z/Arch
12:18:05 HHC00100I Thread id 00001ce0, prio 2, name 'Processor CP05' started
12:18:05 HHC00811I Processor CP05: architecture mode z/Arch
12:18:05 HHC02204I NUMCPU         set to 6
12:18:05 HHC00827I Processor CP00: engine 00 type 0 set: CP
12:18:05 HHC00827I Processor CP01: engine 01 type 0 set: CP
12:18:05 HHC00827I Processor CP02: engine 02 type 0 set: CP
12:18:05 HHC00827I Processor CP03: engine 03 type 0 set: CP
12:18:05 HHC00827I Processor IP04: engine 04 type 5 set: IP
12:18:05 HHC00827I Processor IL05: engine 05 type 3 set: IL
12:18:05 HHC02204I ARCHLVL        set to z/Arch
12:18:05 HHC00898I Facility( 044_PFPO ) Enabled for z/Arch
12:18:05 HHC00898I Facility( 050_CONSTR_TRANSACT ) Enabled for z/Arch
12:18:05 HHC00898I Facility( 073_TRANSACT_EXEC ) Enabled for z/Arch
12:18:05 HHC02204I CPUIDFMT       set to 1
12:18:05 HHC02204I ECPSVM         set to disabled
12:18:05 HHC02204I TZOFFSET       set to -0300
12:18:05 HHC01474I Using internal codepage conversion table default
12:18:05 HHC02204I DIAG8CMD       set to ENABLE  ECHO
12:18:05 HHC02204I SHCMDOPT       set to ENABLE  DIAG8
12:18:05 HHC01802I HTTP server using root directory d:/zos/hercules/html/
12:18:05 HHC02204I ROOT           set to d:/zos/hercules/html/
12:18:05 HHC01807I HTTP server signaled to start
12:18:05 HHC01042I 0:000C COMM: device bound to socket 3505
12:18:05 HHC00100I Thread id 000004d0, prio 4, name 'http_server' started
12:18:05 HHC01802I HTTP server using root directory d:/zos/hercules/html/
12:18:05 HHC01803I HTTP server waiting for requests on port 8082
12:18:05 HHC00100I Thread id 0000180c, prio 4, name 'socket_thread' started
12:18:05 HHC00100I Thread id 00001430, prio 4, name 'console_connect' started
12:18:05 HHC01024I Waiting for console connections on port 3272
12:18:05 HHC04100I TunTap64.dll version 3.7.0.5530 initiated
12:18:05 HHC04100I TunTap64.dll version 3.7.0.5530 initiated
12:18:05 HHC00901I 0:2580 LCS: Interface tap0, type TAP opened
12:18:05 HHC00942I CTC: lcs interface tap0 using mac 02:00:5E:A8:46:FE
12:18:05 HHC00921I CTC: lcs device port 00: tuntap Multicast assist enabled
12:18:05 HHC00935I CTC: lcs device port 00: manual Checksum Offload enabled
12:18:05 HHC00414I 0:2100 CKD file d:/zos/devices/herc2/mvscat.dev3: model 3390-3 cyls 3339 heads 15 tracks 50085 trklen 56832
12:18:05 HHC00414I 0:2101 CKD file d:/zos/devices/herc2/mvspag.dev3: model 3390-3 cyls 3339 heads 15 tracks 50085 trklen 56832
12:18:05 HHC00470I 0:2102 CKD64 file d:/zos/devices/herc2/fds001.dev3: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:05 HHC00470I 0:2103 CKD64 file d:/zos/devices/herc2/fds002.dev9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:05 HHC00470I 0:2104 CKD64 file d:/zos/devices/herc2/mvsp22.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:05 HHC00470I 0:2106 CKD64 file d:/zos/devices/herc2/com001.zos3: model 3390-3 cyls 3339 heads 15 tracks 50085 trklen 56832
12:18:05 HHC00470I 0:2107 CKD64 file d:/zos/devices/herc2/ada001.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:05 HHC00470I 0:2108 CKD64 file d:/zos/devices/herc2/ada825.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:05 HHC00470I 0:2109 CKD64 file d:/zos/devices/herc2/ctm001.dev9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:05 HHC00470I 0:210A CKD64 file d:/zos/devices/herc2/stor01.wrk3: model 3390-3 cyls 3339 heads 15 tracks 50085 trklen 56832
12:18:05 HHC00470I 0:210B CKD64 file d:/zos/devices/herc2/stor02.wrk3: model 3390-3 cyls 3339 heads 15 tracks 50085 trklen 56832
12:18:05 HHC00470I 0:210C CKD64 file d:/zos/devices/herc2/stor03.wrk3: model 3390-3 cyls 3339 heads 15 tracks 50085 trklen 56832
12:18:05 HHC00470I 0:210D CKD64 file d:/zos/devices/herc2/stor04.wrk3: model 3390-3 cyls 3339 heads 15 tracks 50085 trklen 56832
12:18:05 HHC00470I 0:2110 CKD64 file d:/zos/devices/herc2/work01.wrk3: model 3390-3 cyls 3339 heads 15 tracks 50085 trklen 56832
12:18:05 HHC00470I 0:2111 CKD64 file d:/zos/devices/herc2/work02.wrk3: model 3390-3 cyls 3339 heads 15 tracks 50085 trklen 56832
12:18:05 HHC00470I 0:2123 CKD64 file d:/zos/devices/herc2/d3res1.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:05 HHC00470I 0:2125 CKD64 file d:/zos/devices/herc2/d3res2.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:06 HHC00470I 0:2126 CKD64 file d:/zos/devices/herc2/d3sys1.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:06 HHC00470I 0:2127 CKD64 file d:/zos/devices/herc2/d3uss1.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:06 HHC00470I 0:2128 CKD64 file d:/zos/devices/herc2/d3uss2.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:06 HHC00470I 0:2124 CKD64 file d:/zos/devices/herc2/d4res1.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:06 HHC00470I 0:2135 CKD64 file d:/zos/devices/herc2/d4res2.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:06 HHC00470I 0:2136 CKD64 file d:/zos/devices/herc2/d4sys1.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:06 HHC00470I 0:2137 CKD64 file d:/zos/devices/herc2/d4uss1.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:06 HHC00470I 0:2138 CKD64 file d:/zos/devices/herc2/d4uss2.zos9: model 3390-9 cyls 10017 heads 15 tracks 150255 trklen 56832
12:18:06 HHC17736W TXF: TIMERINT 50 is too small; using default of 400 instead
12:18:06 HHC02260I Script 1: begin processing file hercules2.rc
12:18:06 HHC00007I Previous message from function 'txf_set_timerint' at transact.c(2554)
12:18:06 HHC01603I loadparm 210024M1    #z/OS V2R4
12:18:06 HHC02204I LOADPARM       set to 210024M1
12:18:06 HHC01603I ipl 2124         #z/OS V2R4
12:18:06 HHC00100I Thread id 00000470, prio 2, name 'rubato_thread' started
12:18:06 HHC00811I Processor CP00: architecture mode ESA/390
12:18:06 HHC00811I Processor CP01: architecture mode ESA/390
12:18:06 HHC00811I Processor CP02: architecture mode ESA/390
12:18:06 HHC00811I Processor CP03: architecture mode ESA/390
12:18:06 HHC00811I Processor IP04: architecture mode ESA/390
12:18:06 HHC00811I Processor IL05: architecture mode ESA/390
12:18:06 HHC02264I Script 1: file hercules2.rc processing ended
12:18:06 HHC00814I Processor CP00: SIGP Set architecture mode            (12) CP00, PARM 00000001: CC 0
12:18:06 HHC00811I Processor CP00: architecture mode z/Arch
12:18:06 HHC00107I Starting thread cckd_ra(), active=0, started=0, max=2
12:18:06 HHC00100I Thread id 00001fb4, prio 4, name 'cckd_ra thread 1' started
12:18:06 HHC00107I Starting thread cckd_ra() from cckd_ra(), active=1, started=1, max=2
12:18:06 HHC00100I Thread id 000009d0, prio 4, name 'cckd_ra thread 2' started
12:18:06 HHC00801I Processor CP00: Operand exception code 0015 ilc 4
12:18:06 HHC02324I CP00: PSW=0400200080000000 0000000020000178 INST=B2342000     STSCH 0(2)                   store_subchannel
12:18:06 HHC02326I CP00: V:0000000020012998:R:000000007FFF3998:K:06=00000000 00000000 00000000 00000000  ................
12:18:06 HHC02269I CP00: R0=00000000200129D4 R1=0000000000050000
12:18:06 HHC02269I CP00: R2=0000000020012998 R3=0000000000000025
12:18:06 HHC02269I CP00: R4=0000000020012998 R5=0000000000000000
12:18:06 HHC02269I CP00: R6=000000002000F250 R7=0000000000000003
12:18:06 HHC02269I CP00: R8=00000000FFFFFFF9 R9=0000000020012000
12:18:06 HHC02269I CP00: RA=0000000000000005 RB=00000000200007A4
12:18:06 HHC02269I CP00: RC=00000000A0000000 RD=00000000200006EC
12:18:06 HHC02269I CP00: RE=00000000A0000172 RF=00000000FFFFFFF8
12:18:06 HHC02271I CP00: C0=0000000000000200 C1=0000000000004003
12:18:06 HHC02271I CP00: C2=0000000000000000 C3=0000000000000000
12:18:06 HHC02271I CP00: C4=0000000000000000 C5=0000000000000000
12:18:06 HHC02271I CP00: C6=00000000FE000000 C7=0000000000000000
12:18:06 HHC02271I CP00: C8=0000000000000000 C9=0000000000000000
12:18:06 HHC02271I CP00: CA=0000000000000000 CB=0000000000000000
12:18:06 HHC02271I CP00: CC=0000000000000000 CD=0000000000000000
12:18:06 HHC02271I CP00: CE=00000000C2000000 CF=0000000000000000
12:18:06 HHC00801I Processor CP00: Operand exception code 0015 ilc 4
12:18:06 HHC02324I CP00: PSW=0400200080000000 0000000020000178 INST=B2342000     STSCH 0(2)                   store_subchannel
12:18:06 HHC02326I CP00: V:0000000020012998:R:000000007FFF3998:K:06=00000000 00000000 00000000 00000000  ................
12:18:06 HHC02269I CP00: R0=00000000200129D4 R1=0000000000070000
12:18:06 HHC02269I CP00: R2=0000000020012998 R3=0000000000000025
12:18:06 HHC02269I CP00: R4=0000000020012998 R5=0000000000000000
12:18:06 HHC02269I CP00: R6=000000002000F250 R7=0000000000000003
12:18:06 HHC02269I CP00: R8=00000000FFFFFFF9 R9=0000000020012000
12:18:06 HHC02269I CP00: RA=0000000000000007 RB=00000000200007A4
12:18:06 HHC02269I CP00: RC=00000000A0000000 RD=00000000200006EC
12:18:06 HHC02269I CP00: RE=00000000A0000172 RF=00000000FFFFFFF8
12:18:06 HHC02271I CP00: C0=0000000000000200 C1=0000000000004003
12:18:06 HHC02271I CP00: C2=0000000000000000 C3=0000000000000000
12:18:06 HHC02271I CP00: C4=0000000000000000 C5=0000000000000000
12:18:06 HHC02271I CP00: C6=00000000FE000000 C7=0000000000000000
12:18:06 HHC02271I CP00: C8=0000000000000000 C9=0000000000000000
12:18:06 HHC02271I CP00: CA=0000000000000000 CB=0000000000000000
12:18:06 HHC02271I CP00: CC=0000000000000000 CD=0000000000000000
12:18:06 HHC02271I CP00: CE=00000000C2000000 CF=0000000000000000
12:18:07 HHC00814I Processor CP00: SIGP Unassigned                       (14) CP00, PARM 0000000064000000: CC 1 status 00000002
12:18:16 HHC00801I Processor CP00: Operation exception code 0001 ilc 4
12:18:16 HHC02324I CP00: PSW=0404000080000000 0000000000005B08 INST=B2AF0000     ????? ,                      ?
12:18:16 HHC02326I CP00: V:0000000000000000:R:0000000000000000:K:0E=000A0000 000130E1 00000000 00000000  ................
12:18:16 HHC02326I CP00: V:0000000000000000:R:0000000000000000:K:0E=000A0000 000130E1 00000000 00000000  ................
12:18:16 HHC02269I CP00: R0=FFFFFFFF00000000 R1=000000007FFFF000
12:18:16 HHC02269I CP00: R2=0000000000FD5740 R3=DBFE3B9E012BEA30
12:18:16 HHC02269I CP00: R4=0000000000000000 R5=0000000001D95418
12:18:16 HHC02269I CP00: R6=0000000000000000 R7=0000000000FD5740
12:18:16 HHC02269I CP00: R8=0000000001D95418 R9=0000000000FD5740
12:18:16 HHC02269I CP00: RA=0000000000005B0C RB=FFFFFFFF00000100
12:18:16 HHC02269I CP00: RC=0000000000005960 RD=0000000000005F28
12:18:16 HHC02269I CP00: RE=0000000001D95418 RF=0000000000000011
12:18:16 HHC02271I CP00: C0=00800002CE98EE20 C1=000000007CD68007
12:18:16 HHC02271I CP00: C2=000000007FEFE100 C3=0000000180000001
12:18:16 HHC02271I CP00: C4=0000000100000001 C5=000000007FEFE200
12:18:16 HHC02271I CP00: C6=00000000FE000000 C7=000000007CD68007
12:18:16 HHC02271I CP00: C8=0000000000000000 C9=0000000000000000
12:18:16 HHC02271I CP00: CA=0000000000000000 CB=0000000000000000
12:18:16 HHC02271I CP00: CC=0000000000000000 CD=000000007CD68007
12:18:16 HHC02271I CP00: CE=00000000C00FF069 CF=0000000000000000
12:18:16 HHC00801I Processor CP00: Specification exception code 0006 ilc 4
12:18:16 HHC02324I CP00: PSW=0404000080000000 0000000000005E32 INST=B2650000     SVS   0,0                    set_vector_summary
12:18:16 HHC02326I CP00: V:0000000000000000:R:0000000000000000:K:0E=000A0000 000130E1 00000000 00000000  ................
12:18:16 HHC02326I CP00: V:0000000000000000:R:0000000000000000:K:0E=000A0000 000130E1 00000000 00000000  ................
12:18:16 HHC02269I CP00: R0=FFFFFFFF00000000 R1=0000000000000001
12:18:16 HHC02269I CP00: R2=0000000000FD5CA0 R3=DBFE3B9E012BEA30
12:18:16 HHC02269I CP00: R4=0000000000FD5740 R5=0000000000FD5740
12:18:16 HHC02269I CP00: R6=0000000002642210 R7=0000000002642210
12:18:16 HHC02269I CP00: R8=0000000001D95418 R9=0000000000FD5740
12:18:16 HHC02269I CP00: RA=0000000000005E3A RB=FFFFFFFF00000100
12:18:16 HHC02269I CP00: RC=0000000000005960 RD=0000000000005F28
12:18:16 HHC02269I CP00: RE=0000000080005BFE RF=0000000000FD5740
12:18:16 HHC02271I CP00: C0=00800002CF98EE20 C1=000000007CD68007
12:18:16 HHC02271I CP00: C2=000000007FEFE100 C3=0000000180000001
12:18:16 HHC02271I CP00: C4=0000000100000001 C5=000000007FEFE200
12:18:16 HHC02271I CP00: C6=00000000FE000000 C7=000000007CD68007
12:18:16 HHC02271I CP00: C8=0000000000000000 C9=0000000000000000
12:18:16 HHC02271I CP00: CA=0000000000000000 CB=0000000000000000
12:18:16 HHC02271I CP00: CC=0000000000000000 CD=000000007CD68007
12:18:16 HHC02271I CP00: CE=00000000C00FF069 CF=0000000000000000
12:18:16 HHC00006I SCLP console interface active
12:18:20 HHC01603I exit
12:18:20 HHC01420I Begin Hercules shutdown
12:18:20 HHC01423I Calling termination routines
12:18:20 HHC00101I Thread id 000004d0, prio 4, name 'http_server' ended
12:18:20 HHC00101I Thread id 0000180c, prio 4, name 'socket_thread' ended
12:18:20 HHC00101I Thread id 00001bb4, prio 2, name 'Processor CP00' ended
12:18:20 HHC00811I Processor CP01: architecture mode z/Arch
12:18:20 HHC00101I Thread id 000011e4, prio 2, name 'Processor CP01' ended
12:18:20 HHC00811I Processor CP02: architecture mode z/Arch
12:18:20 HHC00101I Thread id 00001784, prio 2, name 'Processor CP02' ended
12:18:20 HHC00811I Processor CP03: architecture mode z/Arch
12:18:20 HHC00101I Thread id 00001e1c, prio 2, name 'Processor CP03' ended
12:18:20 HHC00811I Processor IP04: architecture mode z/Arch
12:18:20 HHC00101I Thread id 00001d6c, prio 2, name 'Processor CP04' ended
12:18:20 HHC00811I Processor IL05: architecture mode z/Arch
12:18:20 HHC00101I Thread id 00001ce0, prio 2, name 'Processor CP05' ended
12:18:20 HHC00101I Thread id 00001430, prio 4, name 'console_connect' ended
12:18:20 HHC00417I 0:2100 CKD file d:/zos/devices/herc2/mvscat.dev3: cache hits 165, misses 30, waits 0
12:18:20 HHC00333I 0:2101   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:20 HHC00335I 0:2101 ------------------------------------------------------------------------
12:18:20 HHC00336I 0:2101 [*] 32 00342113759 000% 0081    0000005 0000054 0000039 0000002  0000007
12:18:20 HHC00338I 0:2101 d:/zos/devices/herc2/mvspag.dev3
12:18:20 HHC00339I 0:2101 [0] 32 00342113759 000% 0081 rw 0000005 0000054 0000039
12:18:20 HHC00333I 0:2102   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:20 HHC00335I 0:2102 ------------------------------------------------------------------------
12:18:20 HHC00336I 0:2102 [*] 64 00374543690 000% 0000    0000001 0000000 0000001 0000000  0000001
12:18:20 HHC00338I 0:2102 d:/zos/devices/herc2/fds001.dev3
12:18:20 HHC00339I 0:2102 [0] 64 00374543690 000% 0000 rw 0000001 0000000 0000001
12:18:20 HHC00333I 0:2103   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:20 HHC00335I 0:2103 ------------------------------------------------------------------------
12:18:20 HHC00336I 0:2103 [*] 64 04776612693 045% 0001    0000001 0000000 0000001 0000000  0000001
12:18:20 HHC00338I 0:2103 d:/zos/devices/herc2/fds002.dev9
12:18:20 HHC00339I 0:2103 [0] 64 04776612693 045% 0001 rw 0000001 0000000 0000001
12:18:20 HHC00417I 0:2104 CKD file d:/zos/devices/herc2/mvsp22.zos9: cache hits 0, misses 1, waits 0
12:18:20 HHC00333I 0:2106   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:20 HHC00335I 0:2106 ------------------------------------------------------------------------
12:18:20 HHC00336I 0:2106 [*] 64 00145042681 000% 0007    0000001 0000000 0000001 0000000  0000001
12:18:20 HHC00338I 0:2106 d:/zos/devices/herc2/com001.zos3
12:18:20 HHC00339I 0:2106 [0] 64 00145042681 000% 0007 rw 0000001 0000000 0000001
12:18:20 HHC00333I 0:2107   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:20 HHC00335I 0:2107 ------------------------------------------------------------------------
12:18:20 HHC00336I 0:2107 [*] 64 00394276207 000% 0000    0000001 0000000 0000001 0000000  0000001
12:18:20 HHC00338I 0:2107 d:/zos/devices/herc2/ada001.zos9
12:18:20 HHC00339I 0:2107 [0] 64 00394276207 000% 0000 rw 0000001 0000000 0000001
12:18:20 HHC00333I 0:2108   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:20 HHC00335I 0:2108 ------------------------------------------------------------------------
12:18:20 HHC00336I 0:2108 [*] 64 00761574206 000% 0004    0000001 0000000 0000001 0000000  0000001
12:18:20 HHC00338I 0:2108 d:/zos/devices/herc2/ada825.zos9
12:18:20 HHC00339I 0:2108 [0] 64 00761574206 000% 0004 rw 0000001 0000000 0000001
12:18:20 HHC00333I 0:2109   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:20 HHC00335I 0:2109 ------------------------------------------------------------------------
12:18:20 HHC00336I 0:2109 [*] 64 00169429151 000% 0000    0000001 0000000 0000001 0000000  0000001
12:18:20 HHC00338I 0:2109 d:/zos/devices/herc2/ctm001.dev9
12:18:20 HHC00339I 0:2109 [0] 64 00169429151 000% 0000 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:210A   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:210A ------------------------------------------------------------------------
12:18:21 HHC00336I 0:210A [*] 64 00469159606 000% 0000    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:210A d:/zos/devices/herc2/stor01.wrk3
12:18:21 HHC00339I 0:210A [0] 64 00469159606 000% 0000 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:210B   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:210B ------------------------------------------------------------------------
12:18:21 HHC00336I 0:210B [*] 64 00550405066 000% 0000    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:210B d:/zos/devices/herc2/stor02.wrk3
12:18:21 HHC00339I 0:210B [0] 64 00550405066 000% 0000 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:210C   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:210C ------------------------------------------------------------------------
12:18:21 HHC00336I 0:210C [*] 64 00468643967 006% 2367    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:210C d:/zos/devices/herc2/stor03.wrk3
12:18:21 HHC00339I 0:210C [0] 64 00468643967 006% 2367 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:210D   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:210D ------------------------------------------------------------------------
12:18:21 HHC00336I 0:210D [*] 64 00472518267 000% 0000    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:210D d:/zos/devices/herc2/stor04.wrk3
12:18:21 HHC00339I 0:210D [0] 64 00472518267 000% 0000 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:2110   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:2110 ------------------------------------------------------------------------
12:18:21 HHC00336I 0:2110 [*] 64 00000771927 000% 0000    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:2110 d:/zos/devices/herc2/work01.wrk3
12:18:21 HHC00339I 0:2110 [0] 64 00000771927 000% 0000 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:2111   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:2111 ------------------------------------------------------------------------
12:18:21 HHC00336I 0:2111 [*] 64 00000770314 000% 0000    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:2111 d:/zos/devices/herc2/work02.wrk3
12:18:21 HHC00339I 0:2111 [0] 64 00000770314 000% 0000 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:2123   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:2123 ------------------------------------------------------------------------
12:18:21 HHC00336I 0:2123 [*] 64 09387791738 022% 0001    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:2123 d:/zos/devices/herc2/d3res1.zos9
12:18:21 HHC00339I 0:2123 [0] 64 09387791738 022% 0001 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:2125   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:2125 ------------------------------------------------------------------------
12:18:21 HHC00336I 0:2125 [*] 64 09752824904 022% 0001    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:2125 d:/zos/devices/herc2/d3res2.zos9
12:18:21 HHC00339I 0:2125 [0] 64 09752824904 022% 0001 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:2126   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:2126 ------------------------------------------------------------------------
12:18:21 HHC00336I 0:2126 [*] 64 07208631891 000% 0000    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:2126 d:/zos/devices/herc2/d3sys1.zos9
12:18:21 HHC00339I 0:2126 [0] 64 07208631891 000% 0000 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:2127   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:2127 ------------------------------------------------------------------------
12:18:21 HHC00336I 0:2127 [*] 64 09560076992 022% 0001    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:2127 d:/zos/devices/herc2/d3uss1.zos9
12:18:21 HHC00339I 0:2127 [0] 64 09560076992 022% 0001 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:2128   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:2128 ------------------------------------------------------------------------
12:18:21 HHC00336I 0:2128 [*] 64 09560039339 022% 0001    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:2128 d:/zos/devices/herc2/d3uss2.zos9
12:18:21 HHC00339I 0:2128 [0] 64 09560039339 022% 0001 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:2124   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00334I 0:2124                                                      readaheads   misses
12:18:21 HHC00335I 0:2124 ------------------------------------------------------------------------
12:18:21 HHC00336I 0:2124 [*] 64 09449480576 023% 0001    0000503 0000000 0000006 0000748  0000856
12:18:21 HHC00337I 0:2124                                                         0000395  0000040
12:18:21 HHC00338I 0:2124 d:/zos/devices/herc2/d4res1.zos9
12:18:21 HHC00339I 0:2124 [0] 64 09449480576 023% 0001 rw 0000503 0000000 0000006
12:18:21 HHC00333I 0:2135   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:2135 ------------------------------------------------------------------------
12:18:21 HHC00336I 0:2135 [*] 64 09930644242 023% 0001    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:2135 d:/zos/devices/herc2/d4res2.zos9
12:18:21 HHC00339I 0:2135 [0] 64 09930644242 023% 0001 rw 0000001 0000000 0000001
12:18:21 HHC00333I 0:2136   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:21 HHC00335I 0:2136 ------------------------------------------------------------------------
12:18:21 HHC00336I 0:2136 [*] 64 09374426762 023% 0001    0000001 0000000 0000001 0000000  0000001
12:18:21 HHC00338I 0:2136 d:/zos/devices/herc2/d4sys1.zos9
12:18:22 HHC00339I 0:2136 [0] 64 09374426762 023% 0001 rw 0000001 0000000 0000001
12:18:22 HHC00333I 0:2137   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:22 HHC00335I 0:2137 ------------------------------------------------------------------------
12:18:22 HHC00336I 0:2137 [*] 64 09645495587 023% 0007    0000001 0000000 0000001 0000000  0000001
12:18:22 HHC00338I 0:2137 d:/zos/devices/herc2/d4uss1.zos9
12:18:22 HHC00339I 0:2137 [0] 64 09645495587 023% 0007 rw 0000001 0000000 0000001
12:18:22 HHC00333I 0:2138   32/64       size free  nbr st   reads  writes l2reads    hits switches
12:18:22 HHC00335I 0:2138 ------------------------------------------------------------------------
12:18:22 HHC00336I 0:2138 [*] 64 09630896427 023% 0005    0000001 0000000 0000001 0000000  0000001
12:18:22 HHC00338I 0:2138 d:/zos/devices/herc2/d4uss2.zos9
12:18:22 HHC00339I 0:2138 [0] 64 09630896427 023% 0005 rw 0000001 0000000 0000001
12:18:22 HHC00101I Thread id 000009d0, prio 4, name 'cckd_ra thread 2' ended
12:18:22 HHC00101I Thread id 00001fb4, prio 4, name 'cckd_ra thread 1' ended
12:18:22 HHC00101I Thread id 00001790, prio 4, name 'cckd_gcol' ended
12:18:22 HHC00101I Thread id 0000195c, prio 1, name 'cckd_writer thread 1' ended
12:18:22 HHC01427I Main storage released
12:18:22 HHC01427I Expanded storage released
12:18:22 HHC01422I Configuration released
12:18:22 HHC00101I Thread id 00001ddc, prio 7, name 'timer_thread' ended
12:18:22 HHC00101I Thread id 000003c4, prio 5, name 'panel_display' ended
12:18:22 HHC00101I Thread id 00000470, prio 2, name 'rubato_thread' ended

Now I've restarted Hercules with my MVS.

Let's wait and see what will happen now.

João

Fish-Git commented 2 years ago

João,

Did you save a copy of "fds002.dev9" before you fixed it via cckdcdsk64?

I would VERY MUCH like to take a look at it! If you still have it (the broken copy before you fixed it), I'd like you to upload it to my FTP web site, so I can download it and look at it.

Do you still have it?

jreginatobr commented 2 years ago

Yes, I can get it from the last backup.

Send me your FTP data

Fish-Git commented 2 years ago

João Reginato (@jreginatobr) wrote:

Send me your FTP data

My primary FTP site is:

The incoming FTP folder you should upload your data to:

PLEASE NOTE:  The incoming folder is marked write-only. It does not have any 'read' permissions. After uploading your file, you will not be able to list the directory contents, since listing a directory's contents requires read permission, which the directory does not have.

No one will be able to download anything from that directory either, even if they know the filename, since downloading also requires read permission.

Only myself has full access to the directory (since I am the administrator), so it is completely secure. Your data should be safe.

_(Besides, I'm not interested in your data anyway! I'm only interested in how it is organized within your dasd image.)_

Thanks.

p.s. The directory can only hold about 16GB of data, so if your file is larger than that, please compress it before uploading it.

jreginatobr commented 2 years ago

The file was restored and is ready to be sent.

jreginatobr commented 2 years ago

Sending fds002.rar right now.

Please take a look to see if it is working fine

jreginatobr commented 2 years ago

The file is there

Fish-Git commented 2 years ago

João,

FYI: I would very much appreciate it if you would not respond/reply to GitHub Issues via email. (*)

_I would much prefer that you instead respond/reply directly via the GitHub Issues web page itself:_

_When you reply directly via their web page, I can make minor edits to your reply so it is more readable (prettier) by editing the fonts being used, etc._

When you reply via email however, I cannot edit your reply, so oftentimes it is much harder (more difficult) to read.

It is up to you whether or not you want to take the time to reply via their web page or continue to reply via email, but I would much rather that you reply directly via their web page instead.

Thanks

(*) GitHub does not support formatting of email replies, making it impossible for me to fix the formatting of a person's reply for readability. Thank you for understanding.

Fish-Git commented 2 years ago

The file is there

Thank you!

I have downloaded it and will now begin my analysis/debugging. Thanks again.

Fish-Git commented 2 years ago

The file is there

Thank you!

I have downloaded it and will now begin my analysis/debugging. Thanks again.

I was able to recreate the problem using your provided dasd image. Thank you!

I am now working on a fix...

jreginatobr commented 2 years ago

Great. Let’s wait a fix

Fish-Git commented 2 years ago

I was able to recreate the problem using your provided dasd image. Thank you!

I am now working on a fix...

I lied/misspoke.   :(

While the dasd image you provided does indeed trigger the problem, I have thus far been unable to determine why/how it occurs. That is to say, I have been unable to determine what sequence of events caused your dasd image to reach the state that it is currently in.

You see, the problem is, your dasd image is simply post mortem evidence that the problem has occurred.

The bug -- whatever it is -- has (had) already occurred, and your dasd image only provides after-the-fact evidence of the bug having already occurred, but otherwise provides ZERO evidence of precisely HOW it occurred (i.e. it provides zero evidence regarding the sequence of events that obviously occurred that lead up to the problem and thus to the dasd image's current state).

I have been trying like hell these past few weeks to try and recreate the problem and thus far have been unable to do so. Nothing I try causes the problem to occur.   :(

And without a reproducible test case (i.e. without the ability to recreate the problem on demand), I am therefore unfortunately unable to fix it.   :(

At this point in time, the best we can hope for (for the time being) is a temporary workaround to prevent any further potential damage from occurring as well as the prevention of the continuous stream of never ending error messages that also occurs whenever the problem gets triggered (i.e. the "band-aid" or "sticking plaster" approach, which I personally hate doing), which I already made earlier. (Refer to my July 5, 2022 series of commits, 2be8c9e9da1827d30886f395bf38f92b9579f4b6, 7ff011c4e643a35a7caee3e1f2a278ff09a3aad7, 92ea395f117acb6a33437be51512baab49ce0257 and 777a71c0eeb0a5b5acf61af1c416d99e11c47d77.)

I will continue scratching my head on this one in the hope of an eventual "Ah-HA!" or "Eureka!" moment, but I certainly won't be holding my breath in the mean time.   :(

jreginatobr commented 2 years ago

Ok Fish, I understand it.

Anyway, thank you very much for your efforts

Fish-Git commented 1 year ago

Closing as "Unknown" (cause unknown).

Discussion opened (#504) regarding sanity of CCKD Garbage Collection altogether, thereby possibly rendering ultimate cause/fix moot (depending on outcome of discussion of course).