SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

Guest IPL under z/VM causes FRE016 Hard Abend #590

Closed: Peter-J-Jansen closed this issue 6 months ago

Peter-J-Jansen commented 1 year ago

It is known that even the current development version of SDL Hyperion still exhibits problems, but so far these have been intermittent, mostly random, and not reproducible. That changed when z/VM 7.3 came along.

IPL-ing z/OS 2.5 second level under z/VM 7.3 causes a z/VM ABEND FRE016, and this is perfectly reproducible.

The ABEND does not occur when z/OS runs 2nd level under z/VM 7.2, or 1st level under any z/VM. 1st level z/OS 2.5 also runs just fine (although z/OSMF needs the Vector Facility to be turned off beforehand; the MACHMIG VEF statement in SYS1.IPLPARM is no longer honored but is silently ignored instead).

This is the last screen seen on the 2nd level z/OS console:

   IFB086I LOGREC DATA SET IS SYS1.S0W1.LOGREC
   ISG313I SYSTEM IS INITIALIZING IN GRS NONE MODE.  RING OR STAR CONFIGURATION
    KEYWORDS IN GRSCNF00 ARE IGNORED.
   IAR040I REAL STORAGE AMOUNTS:
     TOTAL AVAILABLE ONLINE: 10G
       LFAREA LIMIT FOR xM, xG, OR xT      : 6G
       LFAREA LIMIT FOR SUM OF 1M= AND 2G= : 4915M
       LFAREA LIMIT FOR 2GB PAGES FOR 2G=  : 0 (NOT SUPPORTED)
   IAR048I LFAREA WAS NOT SPECIFIED WHICH RESULTED IN 0 1MB PAGES AND 0 2GB
    PAGES.
   IAR013I NO STORAGE IS RECONFIGURABLE
   IEA940I THE FOLLOWING PAGE DATA SETS ARE IN USE:
           PLPA ........... - SYS1.S0W1.PLPA.PAGE
           COMMON ......... - SYS1.S0W1.COMMON.PAGE
           LOCAL .......... - SYS1.S0W1.LOCALA.PAGE
           LOCAL .......... - SYS1.S0W1.LOCALB.PAGE
   CSV410I APF FORMAT IS NOW DYNAMIC

An analysis using VMDUMPTL shows:

>>> symptom
Symptom Record for Incident DDBA0D9A 8CBDBSYM

TOD Clock . . DDBA0D9A8CBDB058         Date. . . . . 08/10/23
Time Zone . . 02.00.00                 Time. . . . . 14:16:30.206939
CPU model . . 8561                     Base SCP. . . 5741
CPU Serial. . 01168A                   NodeID. . . . EULER73
Dump Name . . FRE016 DUMP0001 X1       Dump Type . . CPDUMP
Comp ID . . . 5741A09                  Ver/Rel/Mod . V07R03M0
Dump format . 64-BIT
------------------------------------------------------------
Primary Symptom Strings
              PIDS/5741A0902           (Component ID)
              AB/SFRE016               (Abend Code)
              RIDS/FRE                 (Failing Module)
              REGS/0FA98               (Register/PSW Info)
------------------------------------------------------------
Section 5 Data:
              USERID DUMPED: SYSTEM
              DUMP RECEIVER: OPERATNS
              SPOOLID: 0012
------------------------------------------------------------
Last trace entry on abending processor
1FC951C0 14:16:30  FRE016 Hard Abend svc 00 at HCPFRE+1400 opsw 04042000_002EBD00 svcilc 0002
------------------------------------------------------------
Abend  Description
FRE016 The control block being returned to the free storage manager has had its header or trailer (or both) overlaid.
>>> trace for 2
1FC951C0 14:16:30  FRE016 Hard Abend svc 00 at HCPFRE+1400 opsw 04042000_002EBD00 svcilc 0002
1FC951A0 14:16:30  Release 16 dw (RCW) at 00FBE798 by _UNTFR+24C vmdbk 02788000 PJZOS250
>>> d 7be790.8
_007BE790 +0000 00000000 00000000 *........*
>>> d 7be798.90
_007BE798 +0000 00000000 00000000 00000000 00000000 *................*
_007BE7A8 +0010 00000000 00000000 007FFFFF FF000000 *........."......*
_007BE7B8 +0020 00000000 00010003 00008000 00000003 *................*
_007BE7C8 +0030 FF000000 00000000 00000000 00C3C6E2 *.............CFS*
_007BE7D8 +0040 00020000 00000040 00000000 00280000 *....... ........*
_007BE7E8 +0050 00000000 00000000 00000000 00000000 *................*
_007BE7F8 +0060 00000000 00000000 00000000 00000000 *................*
_007BE808 +0070 00000000 00000000 0000FFFE 00000000 *................*
_007BE818 +0080 00000000 00000207 02040A00 000009FF *................*
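The abend description above says FRE016 is raised when the header or trailer of a control block being returned to the free storage manager has been overlaid. As a hedged illustration of that general guard-word technique (not z/VM's actual HCPFRE logic, whose block layout is not shown here), the toy Python pool below brackets each block with a header and a trailer doubleword and verifies both on release. The class, method, and exception names and the guard format are all hypothetical.

```python
import struct

DW = 8  # one doubleword = 8 bytes, the unit the trace's "Release 16 dw" uses


class FreeStorageOverlay(Exception):
    """Hypothetical stand-in for an FRE016-style hard abend."""


class GuardedPool:
    """Toy free-storage manager (hypothetical, for illustration only).

    Each block is bracketed by a header and a trailer doubleword holding a
    marker plus the block size; both are checked when the block is released.
    """

    def __init__(self, size_bytes: int) -> None:
        self.mem = bytearray(size_bytes)
        self.next_free = 0  # simple bump allocator; enough for the sketch

    def _guard(self, size_dw: int) -> bytes:
        # 4-byte marker + 4-byte big-endian size in doublewords = one doubleword
        return struct.pack(">4sI", b"GARD", size_dw)

    def obtain(self, size_dw: int) -> int:
        """Return the address of the usable area of a new block."""
        total = (size_dw + 2) * DW
        if self.next_free + total > len(self.mem):
            raise MemoryError("pool exhausted")
        hdr = self.next_free
        addr = hdr + DW
        trl = addr + size_dw * DW
        self.mem[hdr:hdr + DW] = self._guard(size_dw)
        self.mem[trl:trl + DW] = self._guard(size_dw)
        self.next_free += total
        return addr

    def release(self, addr: int, size_dw: int) -> None:
        """Verify both guards before accepting the block back."""
        expected = self._guard(size_dw)
        trl = addr + size_dw * DW
        hdr_ok = bytes(self.mem[addr - DW:addr]) == expected
        trl_ok = bytes(self.mem[trl:trl + DW]) == expected
        if not (hdr_ok and trl_ok):
            bad = [n for n, ok in (("header", hdr_ok), ("trailer", trl_ok)) if not ok]
            raise FreeStorageOverlay(f"block at {addr:08X}: {' and '.join(bad)} overlaid")
        # Zero the guards so that a second release of the same block is caught too
        self.mem[addr - DW:addr] = bytes(DW)
        self.mem[trl:trl + DW] = bytes(DW)


if __name__ == "__main__":
    pool = GuardedPool(4096)
    a = pool.obtain(16)
    pool.release(a, 16)  # guards intact: block accepted

    b = pool.obtain(16)
    pool.mem[b - DW:b] = bytes(DW)  # simulate a stray store zeroing the header
    try:
        pool.release(b, 16)
    except FreeStorageOverlay as e:
        print("abend:", e)  # prints: abend: block at 00000098: header overlaid
```

Note that, in this sketch, the check only fires when the damaged block is released, not at the moment the overlay happens, so the code that performed the stray store is no longer on the stack when the error is reported.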

Any suggestions on how to proceed with debugging Hercules would be appreciated.

Cheers,

Peter

Fish-Git commented 1 year ago

Peter: may we see your 1st level and 2nd level directory statements for your z/OS guest please, as well as your 1st level directory statements for your 2nd level z/VM too?

It would also be nice to see your Hercules configuration file (and log file!), please.

Also, how did you create your 2nd level z/VM's dasds? Are they simply copies of your 1st level z/VM's dasds?

Basically, I need a little bit more information on how to reproduce your problem. I will of course attempt to reproduce it myself in my own way, but in case I can't, it would be nice to document precisely how you are able to reproduce the problem on your system.

Thanks!

Peter-J-Jansen commented 1 year ago

I've replied to Fish off-list.

Cheers,

Peter

Fish-Git commented 1 year ago

FYI:

I finally managed to set things up to try IPLing z/OS under 2nd level z/VM, and it failed while I wasn't looking, so I don't know whether it was a "FRE016 Hard Abend" or not.  :(

I've been incredibly busy looking into many other things besides this issue, so I have not had time to try it again yet. But I definitely will.

I will note however, that the abend, whatever it was, occurred on the 2nd level z/VM, not the 1st level z/VM, so I'm not sure how to go about trying to capture and/or research that. My z/VM debugging skills are virtually non-existent.  :(

What's worse, if this is indeed a Hercules bug (and I have no doubt that it is), I am just as perplexed as Peter as to how to go about trying to debug/capture it!! Especially given that it seems to only happen on the second level z/VM system and not the first level!  :`(

Any ideas anyone? I think researching this issue might be out of my league.  :(

Peter-J-Jansen commented 6 months ago

My test setups no longer exhibit this problem. That is not to say that I no longer experience any Hercules problems, such as an occasional crash that leaves no message whatsoever in the Hercules log. But the perfectly reproducible FRE016 from August last year no longer occurs. In fact, z/VM (7.2 or 7.3) with z/OS 2.5 running in a virtual machine under it at times runs quite stably for hours or even days.

I therefore propose to close this issue. When I have had the time to document the remaining Hercules problems I have experienced, I'll open a fresh issue for them.

Thanks,

Peter

Fish-Git commented 6 months ago

> I therefore propose to close this issue. When I have had the time to document the remaining Hercules problems I have experienced, I'll open a fresh issue for them.

Sounds good to me! Closing!