SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator
Other
240 stars 90 forks source link

Hercules sf- command issues #616

Closed oldbuzzard54 closed 7 months ago

oldbuzzard54 commented 7 months ago

I have encountered issues with using the Hercules sf- command. I am using Windows 10 and the SDL 4.x Hyperion version of Hercules:

Hercules version 4.6.0.10941-SDL-g65c97fd6
Build date: Jun  8 2023 at 15:48:12

I also had a printer spool called "Lister" capturing the printer output.

Attached further below is a zip file which contains the log files and start-herc.bat.

No OS was IPLed when I issued an sf-* with Lister on. I received a similar message to the following for many shadowed files:

13:32:10 HHC01603I sf-*
13:32:10 HHC00321I 0:0390 CCKD file: merging shadow files...
13:32:10 HHC00324E 0:0390 CCKD file[1] shadow_dasd/tk5res_1.390: shadow file not merged: file[0] cannot be opened read-write
13:32:10 HHC00333I 0:0390   32/64       size free  nbr st   reads  writes l2reads    hits switches
13:32:10 HHC00335I 0:0390 ------------------------------------------------------------------------
13:32:10 HHC00336I 0:0390 [*] 32 00036246473 004% 0096    0000000 0000000 0000000 0000000  0000000
13:32:10 HHC00338I 0:0390 dasd/tk5res.390
13:32:10 HHC00339I 0:0390 [0] 32 00035284301 003% 0092 ro 0000000 0000000 0000000
13:32:10 HHC00340I 0:0390 shadow_dasd/tk5res_*.390
13:32:10 HHC00341I 0:0390 [1] 32 00000962172 045% 0004 rw 0000000 0000000 0000000

But later on:

13:32:10 HHC00321I 0:0243 CCKD file: merging shadow files...
13:32:10 HHC00325I 0:0243 CCKD file[1] c:/Users/eglis/HERCULES/mvs38j-Shared/shadow/evol00_1.243: shadow file successfully merged
13:32:10 HHC00333I 0:0243   32/64       size free  nbr st   reads  writes l2reads    hits switches
13:32:10 HHC00335I 0:0243 ------------------------------------------------------------------------
13:32:10 HHC00336I 0:0243 [*] 32 00010207819 007% 0008    0000000 0000000 0000000 0000000  0000000
13:32:10 HHC00338I 0:0243 c:/Users/eglis/HERCULES/mvs38j-Shared/dasd/evol00.243
13:32:10 HHC00339I 0:0243 [0] 32 00010207819 007% 0008 rw 0000000 0000000 0000000
13:32:10 HHC00340I 0:0243 c:/Users/eglis/HERCULES/mvs38j-Shared/shadow/evol00_*.243

After I restarted Hercules and no OS was IPLed, I issued the sf-* again:

13:40:22 HHC01603I sf-*
13:40:22 HHC00321I 0:0390 CCKD file: merging shadow files...
13:40:22 HHC00325I 0:0390 CCKD file[1] shadow_dasd/tk5res_1.390: shadow file successfully merged
13:40:22 HHC00333I 0:0390   32/64       size free  nbr st   reads  writes l2reads    hits switches
13:40:22 HHC00335I 0:0390 ------------------------------------------------------------------------
13:40:22 HHC00336I 0:0390 [*] 32 00035284301 001% 0116    0000000 0000000 0000000 0000000  0000000
13:40:22 HHC00338I 0:0390 dasd/tk5res.390
13:40:22 HHC00339I 0:0390 [0] 32 00035284301 001% 0116 rw 0000000 0000000 0000000
13:40:22 HHC00340I 0:0390 shadow_dasd/tk5res_*.390

But later on:

13:40:22 HHC00321I 0:0243 CCKD file: merging shadow files...
13:40:22 HHC00323E 0:0243 CCKD file[0] c:/Users/eglis/HERCULES/mvs38j-Shared/dasd/evol00.243: cannot remove base file

Yes the shadow was removed earlier but I found the message misleading and/or confusing. I am trying to merge shadows not remove base files.

The zip file contains the full text that the above examples were extracted from as well as the .bat files used to start hercules. The full config files are also included.

Fish-Git commented 7 months ago

Wow! This "TK" (TurnKey?) thing is quite complicated! It's going to take me a while to break it down so I can see what the heck is going on!   _(Hasn't anyone heard of the KISS Principle before?! Sheesh!)_

Two things I noticed right away was, you failed to provide a copy of the scripts directory. It looks like the startup invokes a bunch of .rc scripts that presumably issue some Hercules commands. For completeness, I'd like to also see them too. Could you upload them too please? Thanks.

Second, it looks like this TK5 product is doing something I've never seen before, which may well be the root cause(*) of your problem:

        It's using, of all things, shared shadow files!!

In all my days of working with Hercules (23 years and counting) I've NEVER seen anyone do that before! That doesn't necessarily mean you can't do it of course. I don't know. I'm not sure. I'm just saying I've never seen it before. It might be okay. It might not be. I just don't know! I need to do some more research on the subject.

That's all for now. I'm just getting started and as I said, TK5 seems to be a very complicated package/product, so it's going to be difficult to determine where the bug is and thus is going to take me a while to figure it all out.

In any case, my preliminary assessment of the situation would be this is a simple case of PEBKAC: you (or the packager/designer of the TK5 product) are simply not using Hercules correctly. But as I said, that's just a preliminary (first best guess) assessment. More research is definitely needed before the final cause can be determined!

IN THE MEAN TIME, I'd appreciate your providing the missing scripts directory/files I mentioned. Just for completeness. Thanks.

More later as I work through this mess...


*`()`** But that remains to be seen of course. I need some time to think about this more and do some experimentation.

Fish-Git commented 7 months ago

Note: the "PEBKAC" label is only temporary/preliminary. If the problem turns out to be an actual Hercules bug or need for enhancement, I will of course remove that label and assign a more appropriate one.

oldbuzzard54 commented 7 months ago

Wow! This "TK" (TurnKey?) thing is quite complicated! It's going to take me a while to break it down so I can see what the heck is going on! (Hasn't anyone heard of the KISS Principle before?! Sheesh!)

The "structure" of the config file goes all the way back to TK3.

Second, it looks like this TK5 product is doing something I've never seen before, which may well be the root cause(*) of your problem)

Maybe the use of the word shared is confusing. I used to run TK3, TK4 and TK5. I created my own 3350 volumes so I could have access to all my datasets no matter which TK version I was using. I used shared to indicate the same DASD was available to any TK version I ran. I never had more than one instance of Hercules running at the same time on the same PC.

Two things I noticed right away was, you failed to provide a copy of the scripts directory. It looks like the startup invokes a bunch of .rc scripts that presumably issue some Hercules commands. For completeness, I'd like to also see them too. Could you upload them too please? Thanks.

Fish-Git commented 7 months ago

Helpful off-topic hint/trick: whenever you reply to a GitHub Issue, selected the other person's text that you wish to reply to, and press the 'r' key on your keyboard. GitHub will then automatically insert that text into your reply preceded with the standard "quote" character, which is a > sign.

Fish-Git commented 7 months ago

Maybe the use of the word shared is confusing. I used to run TK3, TK4 and TK5. I created my own 3350 volumes so I could have access to all my datasets no matter which TK version I was using. I used shared to indicate the same DASD was available to any TK version I ran. I never had more than one instance of Hercules running at the same time on the same PC.

(emphasis added)

Ah. OK. That makes sense. That's good to know. One less thing to worry about. Thanks.

Fish-Git commented 7 months ago

The "structure" of the config file goes all the way back to TK3.

Doesn't matter. It's still way too complicated IMHO. With so many pieces spread out all over the place, it makes the product much too fragile and orders of magnitude more difficult to easily see what's going on and in what sequence.

If I were doing it, I would have just coded several different config files, each with everything needed within them, and simply chosen ONE of them for Hercules to actually use. Same idea with the .rc script files too.

Yes, there would obviously be some redundancy/duplication of effort involved in maintaining that much smaller set of files of course, thereby slightly increasing the maintenance effort, but not to any unmanageable level IMO. And doing so would certainly greatly reduce the product's number of "moving parts", thereby decreasing the product's fragility too, while also greatly increasing the ease at which it can be debugged as a result.

And reducing the debugging effort (ease at which the product can be debugged) as well as its overall fragility (number of moving parts that can end up causing bugs needing to be debugged) is a Very Good Thing(tm) in my professional opinion.

But then I guess maybe that's just the bias brought about from almost 50 years of experience on the subject unfairly clouding my judgement?   >;-)

oldbuzzard54 commented 7 months ago

Doesn't matter. It's still way too complicated IMHO. With so many pieces spread out all over the place, it makes the product much too fragile and orders of magnitude more difficult to easily see what's going on and in what sequence.

I fully believe in the KISS method. Back when I first started with TK3, I literally went through the config file and "cut and pasted" it into one file to try to simplify it. It was taking too long so I abandoned the job.

oldbuzzard54 commented 7 months ago

I wrote a "quick and dirty" program in Visual Basic to merge all the config files in the sequence they are processed. I attached the output of this program. Hopefully, this will help you. Only the conditional includes were bypassed.

Fish-Git commented 7 months ago

Only the conditional includes were bypassed.

No good.

It's missing the volumes evol00 through evol003 from local_conf\01. It might be missing other dasd volumes too.

Fish-Git commented 7 months ago

Something very WEIRD is going on in Hercules!

I'm seeing the messages:

13:32:10 HHC00321I 0:0248 CCKD file: merging shadow files...
13:32:10 HHC00323E 0:0248 CCKD file[0] dasd/page00.248: cannot remove base file

in your log file, but the only place I am seeing dasd page00 defined anywhere is in include file conf\tk5.cnf where it's defined as:

0248 3350 dasd/page00.248

There's no shadow file defined for this volume! So why does Hercules think there's a shadow file for it that needs to be merged?!

The same thing occurs for several other dasds as well: the config does not specify any shadow file specification for the dasd yet Hercules for some unknown reason seems to think one exists that needs to be merged! It's happening for the "cbt" dasds defined in the include file conf\cbt_dasd.cnf too, which are defined as:

0340 3350 dasd/cbt000.340 fakewrite
0341 3350 dasd/cbt001.341 fakewrite
0342 3350 dasd/cbt002.342 fakewrite
0343 3350 dasd/cbtcat.343 fakewrite

WTF is going on?!

There seems to be a serious bug somewhere in Hercules's shadow file handling!   :(

All of the other shadow file related errors you are seeing are all completely legitimate however.  For example, the very first error for dasd 390 (tk5res) is reported with messages:

13:32:10 HHC00321I 0:0390 CCKD file: merging shadow files...
13:32:10 HHC00324E 0:0390 CCKD file[1] shadow_dasd/tk5res_1.390: shadow file not merged: file[0] cannot be opened read-write
13:32:10 HHC00333I 0:0390   32/64       size free  nbr st   reads  writes l2reads    hits switches
13:32:10 HHC00335I 0:0390 ------------------------------------------------------------------------
13:32:10 HHC00336I 0:0390 [*] 32 00036246473 004% 0096    0000000 0000000 0000000 0000000  0000000
13:32:10 HHC00338I 0:0390 dasd/tk5res.390
13:32:10 HHC00339I 0:0390 [0] 32 00035284301 003% 0092 ro 0000000 0000000 0000000
13:32:10 HHC00340I 0:0390 shadow_dasd/tk5res_*.390
13:32:10 HHC00341I 0:0390 [1] 32 00000962172 045% 0004 rw 0000000 0000000 0000000

As you can see, this particular dasd does indeed have a shadow file (and only 1 shadow file!), but its base file file is indeed marked ro (read-only) too. Thus this shadow file cannot be merged since there is no previous r/w shadow file or r/w base file image either. This is a legitimate user error (i.e. PEBKAC). Now whether this particular user error is the fault of you the user (for not understanding how shadow files work) or the fault of the package designer (for largely the same thing) is debatable. But in any case Hercules is working correctly in these particular cases. There is no bug for these particular errors.

But for the others ones I mentioned further above? There's definitely a bug somewhere in Hercules! It shouldn't be trying to do a merge of a non-existent shadow file! It obviously got confused somehow.

HOW it got confused I don't know yet. I'll have to figure that out. And whether or not your "lister" thingy has anything to do with it or not, I don't know that yet either. Personally I can't see how it possibly could.

But part of Hercules (a subset of shadow file processing) does indeed appear to be very sick, unfortunately.   :(

More news as it happens...

Fish-Git commented 7 months ago

Added "BUG" label (since there obviously is one somewhere!), but kept the "PEBKAC" label since the user and/or package designer also doesn't appear to be using shadow files correctly either. More on that later.

atncsj6h commented 7 months ago

0248 3350 dasd/page00.248

just curious... does the current hercules still support the old format for attaching devices ???

too much curious I wrote a one liner hercules configuration and it is still supported

Fish-Git commented 7 months ago

does the current hercules still support the old format for attaching devices ???

I wasn't aware there were multiple formats! What is the "old format"?!

atncsj6h commented 7 months ago

What is the "old format

the format witout the attach

Fish-Git commented 7 months ago

0248 3350 dasd/page00.248

does the current hercules still support the old format for attaching devices ???

I wasn't aware there were multiple formats! What is the "old format"?!

the format witout the attach

You mean just typing in a device statement from the command line without any attach command preceding it?

herc =====>   0248 3350 dasd/page00.248

AFAIK that has never been a valid command format.

That format has only ever been valid as a device statement in a configuration file, but it has (AFAIK) never been a valid Hercules command. I just tried it with Hercules 3.x and it was, as expected, summarily rejected as an invalid command, just as it is was for current Hercules 4.x as well.

Fish-Git commented 7 months ago

0248 3350 dasd/page00.248

just curious... does the current hercules still support the old format for attaching devices ???

too much curious I wrote a one liner hercules configuration and it is still supported

WHAT is still supported?!   (And what version of Hercules did you test with?)

atncsj6h commented 7 months ago

I was not thinking about adding a device from the command line I was thinking only about defining a device in the hercules configuration file

Fish-Git commented 7 months ago

I was thinking only about defining a device in the hercules configuration file

Oh. I see.

Well I wouldn't call that an "old" format. To the contrary, I would call it a "new" format.

Originally, Hercules's configuration file could only contain configuration file statements and nothing else. Then at some point (during 4.x development I think; I can't remember exactly when), every Hercules configuration file statement (except for device statements, obviously) were also considered to be (and processed as) valid Hercules commands as well.

This allowed you to conveniently do such things as enable tracing or debugging or other things by simply inserting the command at the appropriate location in your configuration file, thus relieving you from having to enter such command into a separate .rc file (or enter them into a script file and manually enter the script command once Hercules was done coming up).

Thus, while previously, the attach command was only valid when entered via the Hercules command line (i.e. it was an invalid configuration file statement), now it could also be used in the configuration file as well.

So allowing the use of the attach command in a configuration file is not what I would call "old format". To the contrary, I would consider it to be more of a "new format".

Fish-Git commented 7 months ago

(Followup from earlier comment):

Something very WEIRD is going on in Hercules! ... But part of Hercules (a subset of shadow file processing) does indeed appear to be very sick, unfortunately.   :(

Thankfully, it wasn't as sick as I originally feared. There was only a minor bug in the shadow file removal processing that was failing to check whether the given dasd image actually had any shadow files that needed to be merged. (Doh!) It was only checking if the image as a compressed image (i.e. CCKD file), but wasn't checking whether there were any shadow files for it.

This was actually what I would consider to be a minor (but nonetheless confusing!) bug insomuch as the shadow file removal function was being unnecessarily called as a result when it really shouldn't have been, resulting in the confusing (unexpected) error message. No actual harm was being done since the shadow file removal function was just bailing out with an error message without actually doing anything. It was just confusing, that's all.

This has now been fixed.

Additionally, I also noticed (and fixed) another minor issue (and corrected the documentation) for when the fakewrite option is specified for a dasd image. After closely reviewing the code, it became obvious that fakewrite was only meant to be an additional option for the readonly option. That is to say, the way the code was written, fakewrite was only considered to be valid when the readonly option was also specified. The two options go hand-in-hand.

While readonly may be specified without the fakewrite option also being specified, the reverse is not true: fakewrite can never be specified without the readonly option also being specified. The fakewrite option is actually a "fine tuning" of the readonly option, and is not a stand alone option by itself.

This requirement was not being enforced in the processing of the dasd options, resulting in messages HHC00403I or HHC00476I never being issued with the "with fake writing" notification phrase. And neither flag (fakewrite nor readonly) was being reported in the query device status string either, thereby leaving the user completely in the dark regarding whether a given dasd's readonly and fakewrite options were being recognized and honored or not.

Both of these issues have now been corrected by commits 2cb97a858702361bab2e7bfd2c04a64cd3240070 and 29fe55732c79a7df2fe9e0eca24026d19e3f3c00.

I believe these changes resolve this issue. If they don't, it can always be opened again (by either you or myself).

Thanks.