SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

CTCE links dying with VM/Passthrough (PVM) #670

Open HackerSmacker opened 4 months ago

HackerSmacker commented 4 months ago

Hi folks,

I've run into some hot water with a recent breaking change in Hercules, around the 4.6 mark, that seems to have broken PVM. I have tested the following configurations of PVM and found that only SOME work:

PVM 2.1 (1993):

PVM 2.1 (1998):

Specifically, VM/ESA 2.4 (my hub node) can talk to any non-XA VM (so, any 370-type VM). This pattern seems consistent -- any non-XA version of VM can talk to any other non-XA version of VM or to an XA version of VM, but two XA versions of VM cannot talk to each other. There are no protocol differences between the versions of PVM I used -- I only used different versions to try to gain more "period-accurateness", since I am somewhat lacking in versions of PVM (I only have 5 versions, 2 of which cannot talk to the other 3). I think this may be related to issue #640 (https://github.com/SDL-Hercules-390/hyperion/issues/640). I recall being able to revive the links in the past by recreating the devices, but it was not a permanent fix.

Version info:

HHC01413I Hercules version 4.7.0.11119-SDL-gf7d2360a
HHC01414I (C) Copyright 1999-2024 by Roger Bowler, Jan Jaeger, and others
HHC01417I ** The SDL 4.x Hyperion version of Hercules **
HHC01415I Build date: Jul  2 2024 at 15:01:43
HHC01417I Built with: GCC 13.2.1 20230801
HHC01417I Build type: GNU/Linux x86_64 host architecture build
HHC01417I Running on: server1 (Linux-6.6.8 x86_64) MP=32
HHC01417I Built with crypto external package version 1.0.0.52-ga5096e5
HHC01417I Built with decNumber external package version 3.68.0.102-g3aa2f45
HHC01417I Built with SoftFloat external package version 3.5.0.105-g4b0c326
HHC01417I Built with telnet external package version 1.0.0.63-g729f0b6

The link devices are defined as such, for example:

# VM/ESA 2.4
0441    CTCE    3501 127.0.0.1 3502

# z/VM 6.2
0441    CTCE    3502 127.0.0.1 3501

The device was initialized with CP SET RDEVICE 441 TYPE CTCA beforehand, though the autosense detects the correct device type.
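For anyone reproducing this setup, here is a minimal sketch of a sanity check that the two CTCE statements above really are mirror images of each other. It assumes each statement has the form "devnum CTCE lport raddr rport", which is only my reading of the two lines shown; the function and field names (parse_ctce, ports_mirror, lport, rport) are made up for illustration and are not part of Hercules.

```python
def parse_ctce(line):
    """Split a 'devnum CTCE lport raddr rport' statement into named fields.

    The field order is an assumption taken from the two example lines.
    """
    devnum, devtype, lport, raddr, rport = line.split()
    if devtype.upper() != "CTCE":
        raise ValueError(f"not a CTCE statement: {line!r}")
    return {
        "devnum": devnum,
        "lport": int(lport),
        "raddr": raddr,
        "rport": int(rport),
    }


def ports_mirror(a, b):
    """True when each side's local port is the other side's remote port."""
    return a["lport"] == b["rport"] and a["rport"] == b["lport"]


# The two statements from the configs above
vmesa = parse_ctce("0441    CTCE    3501 127.0.0.1 3502")
zvm = parse_ctce("0441    CTCE    3502 127.0.0.1 3501")

print(ports_mirror(vmesa, zvm))
```

A False result would point at a simple port mismatch rather than anything in the CTCE logic itself.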

Fish-Git commented 2 months ago

@HackerSmacker: Have you tried using the 4.8 'develop' branch of Hercules yet? Does the problem exist there too? Or does it only fail with version 4.7? Some minor(?) changes were made to the CTCE logic since 4.7 was released that only exist in version 4.8-DEV, so you might want to give 4.8 a try.

If 4.8 still fails the same way, then we'll obviously have to dig into your issue a little deeper.

Thanks.

Fish-Git commented 2 months ago

Also, a SIE fix was recently made to 4.8-DEV (it fixed a problem with VM/ESA 2.4), which might also impact what you're doing, so again, please give our 4.8 'develop' branch a try and let us know whether it works any better or not. Thanks.

Peter-J-Jansen commented 2 months ago

@HackerSmacker: Issue #640 is indeed the latest CTCE fix that may be helpful to you. I suggest you build the 4.8 development branch at commit a291e7e9 (or later) and try that. If it still does not work, the Hercules logs from both sides would be needed to research the problem. Thanks.

HackerSmacker commented 2 months ago

Awesome, I'll give it a roll soon. I've got a few things to hammer out and test along with a VTAM CTCA timeout issue (this only happens with VTAM 3.3 on VM/SP or VSE/SP). I'll return with some test results in a few hours!

HackerSmacker commented 2 months ago

I've compiled it, and it's running a few different versions of VM. I'll chime back in tomorrow with test results for PVM, VTAM, RSCS, and TSAF (whether or not the links die). I'm testing VM/ESA 1.1, 1.2, 2.1, 2.4, z/VM 4.4, 5.3, and 6.4.

HackerSmacker commented 2 months ago

Okay, I've let it run for about a day, and, RSCS/PVM/VTAM are rock-solid (so far, this might change later), but, TSAF (at least, on z/VM 4.4 and VM/ESA 2.4) still shows no hope. I've read through https://github.com/SDL-Hercules-390/hyperion/issues/640 but I'm still getting that dreaded SET_370_MODE error:

02:26:39 ATSL1Y795I Retry limit exceeded on unit 0E50 SET_370_MODE
02:26:39 ATSL1Y708E An attempt to reset link 0E50 has failed
02:26:39 ATSMRX520I Synchronization is now NORMAL

The Herc console (with ctc debug on e50) reports the following:

HHC05079I 0:0E50 CTCE: -> 0:0E51 #0011 cmd=RST=00 xy=aa->Aa l=0000 k=0F500510              w=0,r=0 SENSE=4100 CLEAR                                   
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0012 cmd=RST=00 xy=aa->aa l=0000 k=0F500513              w=0,r=0 SENSE=4100 HALT                                    
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0013 cmd=NOP=03 xy=aa->Aa l=0001 k=0F510411 Stat=0C CC=0 w=0,r=0 SENSE=4100                                         
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0014 cmd=NOP=03 xy=aa->aa l=0001 k=0F510416 Stat=0C CC=0 w=0,r=0 SENSE=4100                                         
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0015 cmd=SEM=C3 xy=an->an l=0001 k=0F5104D7 Stat=0C CC=0 w=0,r=0 SENSE=4100                                         
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0016 cmd=WRT=01 xy=an->an l=03FC k=A8A1E217 Stat=02 CC=1 w=0,r=0 SENSE=4100                                         
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0017 cmd=WRT=01 xy=an->an l=0034 k=26376CD9 Stat=02 CC=1 w=0,r=0 SENSE=4100

The other side (VM/ESA 2.4) shows the same behavior... I'll continue to look into it; I got interrupted by a 4-hour gap while I was configuring it, so I do not recall whether I ever saw the link come up.
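When comparing traces like the one above from both sides, it can help to pull out just the command, status, and CC fields. Below is a rough sketch that does this with two regular expressions. The patterns are based only on the HHC05079I lines pasted here (whitespace collapsed), not on any documented format, and the names (HEAD, STATUS, parse, nonzero_cc) are hypothetical. It makes no claim about what a given Stat or CC value means for CTCE.

```python
import re

# Patterns inferred from the pasted debug lines; not an official format.
HEAD = re.compile(r"#(?P<seq>[0-9A-F]{4}) cmd=(?P<cmd>\w+)=(?P<op>[0-9A-F]{2})")
STATUS = re.compile(r"Stat=(?P<stat>[0-9A-F]{2}) CC=(?P<cc>\d)")


def parse(line):
    """Return seq/cmd/op/stat/cc for one trace line, or None if it does not match."""
    head = HEAD.search(line)
    if head is None:
        return None
    rec = head.groupdict()
    status = STATUS.search(line)
    # Lines such as the RST entries carry no Stat/CC fields at all.
    rec["stat"] = status["stat"] if status else None
    rec["cc"] = int(status["cc"]) if status else None
    return rec


def nonzero_cc(lines):
    """Keep only the records that report a CC other than 0."""
    return [r for r in map(parse, lines) if r and r["cc"]]


SAMPLE = [
    "HHC05079I 0:0E50 CTCE: -> 0:0E51 #0011 cmd=RST=00 xy=aa->Aa l=0000 k=0F500510 w=0,r=0 SENSE=4100 CLEAR",
    "HHC05079I 0:0E50 CTCE: -> 0:0E51 #0015 cmd=SEM=C3 xy=an->an l=0001 k=0F5104D7 Stat=0C CC=0 w=0,r=0 SENSE=4100",
    "HHC05079I 0:0E50 CTCE: -> 0:0E51 #0016 cmd=WRT=01 xy=an->an l=03FC k=A8A1E217 Stat=02 CC=1 w=0,r=0 SENSE=4100",
    "HHC05079I 0:0E50 CTCE: -> 0:0E51 #0017 cmd=WRT=01 xy=an->an l=0034 k=26376CD9 Stat=02 CC=1 w=0,r=0 SENSE=4100",
]

for rec in nonzero_cc(SAMPLE):
    print(rec["seq"], rec["cmd"], rec["stat"], rec["cc"])
```

On the sample lines this singles out the two WRT entries, the only ones in the paste with CC=1.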

HackerSmacker commented 2 months ago

Alrighty... I'm a few days in and there haven't been any issues at all. That fix definitely did something, but I'm still at a loss for TSAF; I suspect it's user error on my end, though.