SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

Hercules took a dump #592

Closed SE20225 closed 1 year ago

SE20225 commented 1 year ago

Hercules issued a dump.

Occurred with /PB03:

20:19:14 HHC01413I Hercules version 4.6.0.10941-SDL-g65c97fd6

Running on:

20:19:14 HHC01417I Running on: T480S (Windows-6.2.9200 Intel(R) x64) LP=8, Cores=4, CPUs=1

The MVS involved is a TK4- system regenerated to include FEATURES=ACR and CRH, that is, support for multiprocessing and Channel Reconfiguration Hardware.

The config deck was modified to specify the majority of DASD devices like:

0F40 3350 dasd/work00.140
0:0140 3350 localhost::0F40
1:0140 3350 localhost::0F40

but I had some devices defined with mismatching device types, so Hercules issued messages like this for them:

20:21:14 HHC00704S 0:0152 Shared: remote device 0F52 is a 3330

when 3350 was specified on the 0: and 1: statements. So they did not enter the configuration and the IPL that followed was meaningless anyway.
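A mismatch like this can be caught before starting Hercules. The following is only a minimal sketch in Python, not anything Hercules itself provides. It assumes one device per statement, written in the forms shown above (`devnum type file` and `n:devnum type host::devnum`), and it does not handle device ranges or any other argument syntax. The function name `find_type_mismatches` and the local `0F52` statement in the sample are hypothetical. The latter is inferred from the HHC00704S message.

```python
import re

# Device statement: optional "n:" prefix, device number, device type, first argument.
STMT = re.compile(r"^(?:(\d+):)?([0-9A-Fa-f]{1,4})\s+(\S+)\s+(\S+)")
# The "host::devnum" argument form used on the 0: and 1: statements in this thread.
REMOTE = re.compile(r"^[^:]+::([0-9A-Fa-f]{1,4})$")


def find_type_mismatches(config_text):
    """Return (statement, declared type, remote device, remote type) per mismatch."""
    local_types = {}  # device number -> device type, for file-backed statements
    remote_refs = []  # (statement label, declared type, referenced device number)
    for line in config_text.splitlines():
        m = STMT.match(line.strip())
        if not m:
            continue  # comments, blank lines, and non-device statements
        prefix, devnum, devtype, arg = m.groups()
        devnum = devnum.upper().zfill(4)
        r = REMOTE.match(arg)
        if r:
            label = f"{prefix}:{devnum}" if prefix is not None else devnum
            remote_refs.append((label, devtype, r.group(1).upper().zfill(4)))
        else:
            local_types[devnum] = devtype
    return [
        (label, devtype, target, local_types[target])
        for label, devtype, target in remote_refs
        if target in local_types and local_types[target] != devtype
    ]


if __name__ == "__main__":
    sample = """\
0F52 3330 dasd/example.152
0:0152 3350 localhost::0F52
"""  # hypothetical statements; the file name is made up
    for label, declared, target, actual in find_type_mismatches(sample):
        print(f"{label}: declared {declared}, but {target} is a {actual}")
```

The check only compares device types within a single config file, so it says nothing about devices served by a different Hercules instance.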

The Hercules dump was issued a few seconds later.

After I corrected the config deck, no dump occurred and the system did come up (slowly, but that may be a different problem).

So this problem is reported only because Hercules took a dump. It does not disturb me at all.

Attachments (Log, Config and the dump file):

andersedlund @ telia.com

Fish-Git commented 1 year ago

Hi Anders!

Unfortunately I was unable to determine the cause of your crash. Windows debugger analysis of your crash dump places the crash at a location in Hercules code that does not make any sense. I can see no problem with the Hercules code at the location where the crash is supposedly occurring.

So I would like to try to reproduce the crash myself.

Is this crash easily reproducible? Are you able to easily reproduce the crash on demand? Whenever you want?

If so, how may I do that? What do I need to do?

Do I need to use TK4-? Is that a requirement? Can the crash be reproduced using a different guest operating system? (Such as DOS/VS or VM370 or z/OS or z/VM, etc?) In other words, is it just the bad Shared Device configuration statement that causes the crash? Or something else?

Thanks.

SE20225 commented 1 year ago

I tried 5 times with the badly written/typed config but the system came up, or at least started to initialize, every time, so I think we can close this problem record. We cannot expect to be able to recreate it.

It leaves one question though:

You suggested that I achieve multiple paths by coding like:

0:0140 3350 dasd/work00.140
1:0140 3350 localhost::0140

This is not trouble-free, although I have not yet reported my findings. I have therefore tried a setup with more 'symmetric' qualities like:

0F44 3350 dasd/pub010.241
0:0144 3350 localhost::0F44
1:0144 3350 localhost::0F44

So far I have only tried a few volumes like this without any problems.
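With a couple of dozen volumes, writing these three-line groups by hand invites exactly the kind of typo described earlier. As a rough illustration only, a small Python helper could generate them. The function name `symmetric_paths` and its parameters are hypothetical, and the output merely mirrors the three statements shown above. It is not an officially documented way to define multiple paths.

```python
def symmetric_paths(base_devnum, guest_devnum, devtype, image,
                    prefixes=(0, 1), host="localhost"):
    """Build one file-backed statement plus one shared statement per prefix.

    base_devnum:  device number of the file-backed definition (e.g. 0x0F44)
    guest_devnum: device number used on the prefixed statements (e.g. 0x0144)
    """
    # The base device owns the image file.
    lines = [f"{base_devnum:04X} {devtype} {image}"]
    # Every prefixed statement refers back to the same base device.
    for prefix in prefixes:
        lines.append(
            f"{prefix}:{guest_devnum:04X} {devtype} {host}::{base_devnum:04X}"
        )
    return lines


if __name__ == "__main__":
    for line in symmetric_paths(0x0F44, 0x0144, "3350", "dasd/pub010.241"):
        print(line)
```

Because the device type is passed only once per volume, the base and shared statements cannot disagree about it.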

The original dump actually occurred on the very first attempt at starting Hercules with the majority of devices (a couple of dozen) defined like this. BUT, I noticed that the establishment of the config becomes noticeably slow as a result. It progresses at one device every 10 seconds. This is clearly visible in the log provided with this trouble report.

Is this to be expected with this type of dasd config? If so, I can of course live with it as long as it only hits while the config is established. Otherwise, what can I do to collect information to have it improved? As long as only one or two devices were defined like this, I never noticed the delay!
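One way to put numbers on the delay is to measure the gaps between consecutive timestamped messages in the Hercules log. The Python sketch below assumes every message line starts with `HH:MM:SS` followed by an `HHCnnnnnX` message ID, as in the lines quoted in this report. The function name `slow_gaps` and the 5-second default threshold are arbitrary choices for illustration.

```python
import re
from datetime import datetime

# "HH:MM:SS HHCnnnnnX ..." as seen in the messages quoted in this report.
LOG_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) (HHC\d{5}[A-Z])\b")


def slow_gaps(log_text, threshold_seconds=5):
    """Return (gap in seconds, line) for each message that follows a long pause."""
    results = []
    prev_time = None
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines without a timestamp and message ID
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        if prev_time is not None:
            gap = (t - prev_time).total_seconds()
            if gap < 0:
                gap += 24 * 60 * 60  # the log ran past midnight
            if gap >= threshold_seconds:
                results.append((gap, line.strip()))
        prev_time = t
    return results


if __name__ == "__main__":
    import sys

    # Usage: python slow_gaps.py hercules.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for gap, line in slow_gaps(f.read()):
            print(f"{gap:6.0f}s before: {line}")
```

The timestamps have one-second resolution, so this only shows where the pauses fall, not what Hercules is doing during them.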

Anders Edlund

Fish-Git commented 1 year ago

> I tried 5 times with the badly written/typed config but the system came up, or at least started to initialize, every time, so I think we can close this problem record. We cannot expect to be able to recreate it.

Good enough. I will close this issue as "UNKNOWN" ("Unresolved. It might be a bug. It might not. We don't know. We couldn't reproduce it.")

> It leaves one question though:
>
> You suggested that I achieve multiple paths by coding like:
>
> 0:0140 3350 dasd/work00.140
> 1:0140 3350 localhost::0140
>
> This is not trouble-free, although I have not yet reported my findings.

It was just a suggestion.  :)

I have never tried to define a multiple-path configuration before (I never had the need and wouldn't know how to configure any guest I'm familiar with to expect/use multiple paths!), so I made an educated guess as to how I thought it might be achieved. I obviously guessed wrong.  :)

I would like to know what your findings were though. I would like to update our documentation on how it can be (should be) done. If you could provide your findings in this area, I would greatly appreciate it. Thanks.

> I have therefore tried a setup with more 'symmetric' qualities like:
>
> 0F44 3350 dasd/pub010.241
> 0:0144 3350 localhost::0F44
> 1:0144 3350 localhost::0F44

Yes, I noticed that in your provided "MVS_CONFMP.cnf" file. That seems to make more sense than my original ill-thought suggestion.

> So far I have only tried a few volumes like this without any problems.

That's good information to know! And that's precisely the type of information I would appreciate your providing to us. I'm not aware of anyone ever having tried to define a multi-path Hercules configuration before. You're the first! So any information you could provide to us on this topic would be greatly appreciated! Thanks.

> BUT, I noticed that the establishment of the config becomes noticeably slow as a result. It progresses with one device every 10 seconds. This is clearly visible in the log provided with this trouble report.

Hmmm. I hadn't noticed it before, but yes, I can definitely see that now. Interesting!

> Is this to be expected with this type of dasd config?

I don't know. It could be. I'll have to look into it.

> If so, I can of course live with it as long as it only hits while the config is established. Otherwise, what can I do to collect information to have it improved?

I'm not sure. As I said, I'll have to look into it and get back to you. I might be able to reproduce that particular aspect of the problem myself. I don't know. I'll have to give it a try. A quick grep of the Hercules source code, however, reveals it may be a minor bug in our shared dasd handling logic. But as I said, I'll have to look into that and get back to you. Thanks for mentioning it. I was so focused on the actual error messages themselves that I hadn't even noticed it!

Even though I am going to be closing this GitHub Issue (per your request), I will let you know what my research into this 10-second-config-initialization-slowness problem reveals via another comment (*) to this thread. Cool?

(*) One can continue to post comments to a GitHub Issue even though the Issue has been closed.

Fish-Git commented 1 year ago

Closed per user request.