SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

FORCE parameter on 'sf' command does not work correctly #636

Closed JimDooey closed 5 months ago

JimDooey commented 6 months ago

When merging shadow files using the FORCE parameter (sf- * force), the base file(s) gets corrupted and an IPL error will occur. Shadow files are deleted.

The problem is repeatable.

The guest was OS/390. The host was Windows 11 Home Edition (64-bit). The Hercules version was HHC01413I Hercules version 4.6.0.10941-SDL-g65c97fd6.

My Hercules configuration and log files are attached below:

Fish-Git commented 6 months ago

Thanks, Jim. I'm on it!

Fish-Git commented 6 months ago

Okay, something is not right.  :(

According to the configuration file you provided, you defined 14 dasds: 0A80 - 0A8D.

But according to the Hercules log you provided, only 13 dasds were opened: 0A80 - 0A8C.

Something is amiss.  :(
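
The mismatch is easy to check with a quick calculation. The following is a minimal Python sketch, assuming the device addresses form contiguous hexadecimal ranges exactly as quoted above (the helper name `device_range` is made up for illustration and is not part of Hercules):

```python
def device_range(first, last):
    """Return every 4-digit hex device address from first to last, inclusive."""
    return [f"{addr:04X}" for addr in range(int(first, 16), int(last, 16) + 1)]

# Ranges taken from the comment above: configuration file vs. Hercules log.
defined = device_range("0A80", "0A8D")
opened = device_range("0A80", "0A8C")

# Devices that appear in the configuration but were never opened in the log.
missing = sorted(set(defined) - set(opened))

print(len(defined), len(opened), missing)
```

Under those assumptions the configuration defines 14 devices, the log opens 13, and the one address left over is 0A8D.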

Fish-Git commented 6 months ago

Question:  Before you attempted this backward merge, had your Hercules / guest crashed at any point previously, or had your system unexpectedly lost power or anything? Perhaps at some point in the past?

And if so, did you responsibly run a cckdcdsk on all of your dasds and shadow files too afterwards, to ensure none of them were damaged before starting Hercules for the first time afterwards?

Because if any of the previous shadows or base images for any of your dasds was in a damaged state before you attempted your merge, then it should hardly come as any surprise that Hercules would have trouble performing the merge and that the end result would be a damaged volume.

Fish-Git commented 6 months ago

Okay, never mind my previous question above. I was able to recreate your problem, and you are absolutely 100% correct: there is definitely a bug somewhere in the sf-* force command.

I started out with 3 sets of shadow files and after doing my first forced merge, I then had 2 sets. After another merge, I had only 1 set. Each of those merges ran clean. (No error messages)

But then on the final merge that should be merging the last shadow file back into the base image, I got all kinds of errors (well, warnings actually, not errors) about cdevhdr inconsistencies and free space errors:

HHC00363W 0:0A90 CCKD file Q:/CCKD64/zOS 2.5C (ADCD)/c5imf1.cckd64: cdevhdr inconsistencies found, code 0001
HHC00364W 0:0A90 CCKD file Q:/CCKD64/zOS 2.5C (ADCD)/c5imf1.cckd64: forcing check level 1
HHC00368W 0:0A90 CCKD file Q:/CCKD64/zOS 2.5C (ADCD)/c5imf1.cckd64: free space errors detected

So there's definitely a bug somewhere!
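
For anyone who wants to see at a glance which volumes drew these warnings in a long Hercules log, here is a minimal Python sketch. It assumes every warning line has the same layout as the three samples above (message ID, device, "CCKD file", file name, then the message text); the regular expression and the function name `warnings_by_device` are illustrative only and are not part of Hercules.

```python
import re
from collections import defaultdict

# Pattern derived only from the three sample lines above (an assumption,
# not an official description of the Hercules message format).
WARNING_RE = re.compile(
    r"^(?P<msgid>HHC0036[348]W)\s+"
    r"(?P<device>\d+:[0-9A-Fa-f]{4})\s+"
    r"CCKD file (?P<file>.+?): (?P<text>.+)$"
)

def warnings_by_device(lines):
    """Group the IDs of matching warning messages by device."""
    found = defaultdict(list)
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:
            found[match.group("device")].append(match.group("msgid"))
    return dict(found)

sample = [
    "HHC00363W 0:0A90 CCKD file Q:/CCKD64/zOS 2.5C (ADCD)/c5imf1.cckd64: cdevhdr inconsistencies found, code 0001",
    "HHC00364W 0:0A90 CCKD file Q:/CCKD64/zOS 2.5C (ADCD)/c5imf1.cckd64: forcing check level 1",
    "HHC00368W 0:0A90 CCKD file Q:/CCKD64/zOS 2.5C (ADCD)/c5imf1.cckd64: free space errors detected",
]

print(warnings_by_device(sample))
```

Volumes that never appear in the result merged without any of these three warnings, which makes it easy to compare them against the volumes that were actually written to.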

JimDooey commented 6 months ago

The log file submitted is from September, 2023. The configuration file submitted is a current version. Another DASD volume was created since September, 2023.

No power loss or Hercules crash was noted. In fact, this error has occurred on other OSes at various times, which is the reason for the two configuration files, one with “ro” and another without.

Fish-Git commented 6 months ago

Okay, this might be a clue (for me, not you!): it appears that some of the volumes' shadow files got merged just fine (without any warnings being issued for them)  (I noticed the same thing in your log file too), and it looks like they might be for volumes that weren't ever written to. That is to say, I suspect the problem -- whatever it is and wherever it is -- only occurs on those volumes which were updated (written to). That narrows my bug hunt a wee bit.

Fish-Git commented 6 months ago

Okay, I ran a cckdcdsk -3 -ro ... (read-only, no repair, i.e. only report any errors you find) on all of my dasds, and none of them had any errors.
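
A read-only check like this could be scripted across many images. The following is a rough Python sketch, not a tested procedure: it assumes `cckdcdsk` is on the PATH and takes the image file name as its last argument, and the directory and the `*.cckd64` glob pattern are hypothetical. Only the `-3 -ro` options come from the comment above.

```python
import subprocess
from pathlib import Path

def build_check_command(image):
    """Build the read-only, report-only check command for one image file."""
    return ["cckdcdsk", "-3", "-ro", str(image)]

def check_images(directory, pattern="*.cckd64"):
    """Run the check on every matching file and collect the utility's output.

    The output is returned as-is for a human to read; no assumption is made
    here about what cckdcdsk prints or what its exit code means.
    """
    results = {}
    for image in sorted(Path(directory).glob(pattern)):
        completed = subprocess.run(
            build_check_command(image), capture_output=True, text=True
        )
        results[image.name] = completed.stdout + completed.stderr
    return results
```

Shadow files would need their own pattern or an explicit list, since their naming depends on the configuration.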

So the "errors" (warnings) that are being issued might not actually be true errors. They might just be a side effect of the final merge process that shouldn't actually be issued at all. That is to say, the warnings might be being issued prematurely, before the merge process has fully completed. I'm not 100% sure about that though, but that's what it seems at this point in my investigation based on the evidence from my test.

I then IPLed my system (again, without doing any type of "repair" on any of the supposedly damaged volumes), and the system came up just fine. There was nothing unusual or out of the ordinary during the IPL. I had no problems whatsoever.

So as I said, the warning messages might be bogus. But again, that's preliminary. I'll have to investigate further.

Why your IPL didn't work, I don't know. Mine worked just fine.

Fish-Git commented 5 months ago

Jim, (@JimDooey)

In your problem report you said:

> ... the base file(s) gets corrupted and an IPL error will occur.

Are you simply presuming that? Or does an IPL error actually occur for you?

If an IPL error actually occurs, may I see the evidence for that?  (Both an OS/390 log for a good IPL as well as for the one that fails, as well as the Hercules log file for the one that fails.)

Because as I've stated above, the problem appears to be totally cosmetic. There does not appear to be any actual file damage to any of the dasds, and subsequent IPLs of the guest seem to work just fine!

Please confirm!

In the meantime, I am going to see if I can find where these [apparently bogus] warnings are coming from and add code to prevent their issuance under the described circumstances.

Thanks.

JimDooey commented 5 months ago

> Are you simply presuming that? Or does an IPL error actually occur for you?

 

[Screenshot 2024-02-26 193741]

 

> If an IPL error actually occurs, may I see the evidence for that? (Both an OS/390 log for a good IPL as well as for the one that fails, as well as the Hercules log file for the one that fails.)

Following is the failed IPL log file. Note there are TWO sets of shadow files:

Do you want a good IPL log file BEFORE the "sf- force" command? If so, here is the log with the two sets of shadow files BEFORE a "sf- force" command.

Fish-Git commented 5 months ago

> Following is the failed IPL log file. Note there are TWO sets of shadow files:

Why? Why are there two sets of shadow files? I thought you merged the second set into the first set? Yes? And then afterwards, merged that one remaining set back into the base? Yes? How did you get a second set? Did you do a sf+* at some point?

I also noticed that you did your IPL immediately after doing the merge. I myself always power off (exit from Hercules) after doing the merge, and THEN power back on again (start Hercules) and do the IPL. I've never tried IPLing immediately after doing a merge. Maybe that's where the problem is? Maybe that's why it's been working for me? Hmmm... I'll have to try it your way to see what happens.

p.s. Why did you close this issue? It's not finished yet!

JimDooey commented 5 months ago

On Feb 26, 2024, at 9:44 PM, Fish-Git wrote:

> Why? Why are there two sets of shadow files? I thought you merged the second set into the first set? Yes? And then afterwards, merged that one remaining set back into the base? Yes? How did you get a second set? Did you do a sf+* at some point?

I was reading another post related to shadow files and thought I would try to create a second set of shadow files by using the sf + * command. Remember, the problem log I first submitted was from September of last year. You now have a current one.

> I also noticed that you did your IPL immediately after doing the merge. I myself always power off (exit from Hercules) after doing the merge, and THEN power back on again (start Hercules) and do the IPL.

The documentation doesn’t specify that I have to shut down Hercules and restart. I use HercGui and feel it unnecessary to quit and restart just to make it work.

> p.s. Why did you close this issue? It's not finished yet!

I didn’t mean to close the issue. I hit the wrong button on GitHub.

Jim Snellen

Fish-Git commented 5 months ago

https://github.com/SDL-Hercules-390/hyperion/issues/572#issuecomment-1589444421

Fish-Git commented 5 months ago

> > I also noticed that you did your IPL immediately after doing the merge. I myself always power off (exit from Hercules) after doing the merge, and THEN power back on again (start Hercules) and do the IPL.

> The documentation doesn’t specify that I have to shut down Hercules and restart. I use HercGui and feel it unnecessary to quit and restart just to make it work.

Well, unfortunately, that is apparently a requirement. I will update the documentation to mention this requirement, and I will also update the code itself to prevent the problem.

As far as the seemingly "bogus" error (warning) messages that are being issued on the final merge (i.e. when merging the last/final shadow back into the base image), I'm afraid I don't have an answer. I've been trying for the past day to determine why they are being issued, but have unfortunately been unsuccessful.  :(

I think I'm going to give up for now, and simply advise that when merging shadows back into the base image (which is not advisable in the first place!), you might see some error/warning messages being issued as a result. These should be considered an unpreventable but benign side effect of such a merge. No file damage has actually occurred.

With that in mind, I will be closing this issue once I commit my changes in the next day or so.

p.s. Have you verified that your OS/390 guest IPLs just fine if you recycle Hercules (exit from Hercules and restart it again) before attempting your IPL after doing the merge? As I said, it works fine for me, but fails just like it does for you if I immediately try to IPL without recycling Hercules first.

Fish-Git commented 5 months ago

> With that in mind, I will be closing this issue once I commit my changes in the next day or so.

Fix committed (4f5188643de71608225a53dec3617fc26e340d95), documentation updated. Closing issue.