SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator
Other
240 stars 90 forks source link

IFI004 triggered by changing the CTCI definition #375

Closed gjcullen closed 3 years ago

gjcullen commented 3 years ago

I have just migrated a zVM54 image from Hercules available as an rpm under Fedora Linux 31 to a very recent Hyperion version (Hercules version 4.4.9999.0-SDL-g3062c6ea (4.4.9999.0) ) that I freshly downloaded, compiled and linked, from GitHub. Thank you.

I as yet have experienced only a single problem with the change. When TCPIP under zVM54 is brought down using the external interruption command:

#CP EXTERNAL

the zVM54 abnormally terminates with an IFI004 and system dump. After a number of experiments I have found that the problem can be triggered by the Hercules configuration definition of the Channel-to-Channel Adapter line (CTCI).

When the line in the configuration reads:

E200.2 CTCI /dev/net/tun 1500 192.168.100.1  192.168.100.2  255.255.255.0

I will enjoy connectivity across the emulated CTC with the outside network, but do get the IFI004 Abend when TCPIP is shutdown.

When the line in the configuration reads:

E200.2 3088 CTCI /dev/net/tun 1500 192.168.100.1  192.168.100.2  255.255.255.0

I will have NO connectivity across the CTC, but I will have no problems during the TCPIP shutdown.

The problem is easily reproducible.

Other Information:

HHC01603I version
HHC01413I Hercules version 4.4.9999.0-SDL-g3062c6ea (4.4.9999.0)
HHC01414I (C) Copyright 1999-2020 by Roger Bowler, Jan Jaeger, and others
HHC01417I ** The SoftDevLabs version of Hercules **
HHC01415I Build date: Mar  3 2021 at 11:28:04
HHC01417I Built with: GCC 9.3.1 20200408 (Red Hat 9.3.1-2)
HHC01417I Build type: GNU/Linux x86_64 host architecture build
HHC01417I Modes: S/370 ESA/390 z/Arch

Any help or suggestions are appreciated

Respectfully, Geoff

Fish-Git commented 3 years ago

When TCPIP under zVM54 is brought down using the external interruption command:

#CP EXTERNAL

I myself have never had the need to specifically shutdown only(?) TCPIP, so I'm not familiar with this technique. Which z/VM userid do you enter this command in? What z/VM userid do you need to logon to, to enter this command? Operator? TCPIP? TCPMAINT?

gjcullen commented 3 years ago

I employ the zVM Programmable Operator (PROP) for automation. The TCPIP machine is accessible through Secondary Console Image Facility (SCIF), Both PROP and SCIF have been part of VM and zVM for a long time.

To bring down TCPIP you issue a command from an authorized user: #CP SMSG OPERATOR CMD NETSTAT CP EXTERNAL. This is usually done in an automated fashion. You could also issue it manually, or to simplify, just logon to the TCPIP machine itself and issue #CP EXT to cause an external interruption.

(The reason that I am shutting down only TCPIP is for isolation of the problem).

Geoff

Fish-Git commented 3 years ago

Geoff,    (@gjcullen)

I did some tests on both z/VM 7.1 using QETH/OSA (not CTCI) as well as on z/VM 6.3 using LCS as well as using CTCI (all on Windows(*)), and in none of my tests did z/VM crash. TCPIP appeared to be shutdown as expected and the system keep running just fine. There was no IFI004 crash.

So I was unable to recreate your problem with the z/VM systems I have. (I do not have z/VM 5.4 so I am unable to test with your exact (or semi-exact) environment.)

I will have to think about what type of tracing/debugging you might be able to do to help us try and determine where things are going wrong, but in the mean time is there any way for you to upgrade to a more recent version of z/VM? Or, barring that, have you checked to see if maybe there are some APAR(s)(?) (fixes) for z/VM 5.4 that maybe you're missing?

I'll keep this issue open for a little longer to give me time to maybe come up with a plan of attack, but at the moment it looks like the ultimate resolution might end up being "Unknown".   :(

Sorry!   :(

(But give me a few days to think about things. There might be some debugging/information gathering we might be able to do. I don't know yet.)


_(*) I have Linux (KDE Neon) installed in a VMware virtual machine on my Windows system, but I only use it to make sure Hercules builds and runs okay on Linux. I don't really use it for any serious Linux testing since not only do I know very little about Linux (I'm a Windows person, not a Linux person), but in all my years as a Hercules developer, I have unfortunately never been able to get networking working correctly on any of my Linux systems._   :(

Fish-Git commented 3 years ago

Here are 3270 screen grab snippets along with my comments for the tests I performed:


z/VM 7.1 using QETH/OSA:

    CP EXT
    09:48:44 DTCSHU004E
    09:48:44 DTCSHU009E 1 active client, with 1 connection in use.
    09:48:44 DTCSHU010E I will delay shutting down for 30 seconds, so that
    09:48:44 DTCSHU011E RSTs and shutdown notifications may be delivered.
    09:48:44 DTCSHU012E If you wish to shutdown immediately, without warning,
    09:48:44 DTCSHU013E type #CP EXT again.
    09:48:44 DTCSHU014I
    09:48:44 DTCSTM119I Server Telnet closed down. Bye.

    CP DISC
    DISCONNECT AT 09:49:00 PST MONDAY 03/08/21

    Press enter or clear key to continue

(I probably should have waited longer (for the 30 seconds to expire) before disconnecting! Sorry!)

    09:47:53 GRAF  0460 RECONNECT  TCPIP    USERS = 10    BY IBMVM1
    09:49:00 GRAF  0460 DISCONNECT TCPIP    USERS = 10
    09:49:14 USER DSC   LOGOFF AS  TCPIP    USERS = 9
    09:49:14 OSA   0600 DETACHED TCPIP    0600 BY TCPIP
    09:49:14 OSA   0601 DETACHED TCPIP    0601 BY TCPIP
    09:49:14 OSA   0602 DETACHED TCPIP    0602 BY TCPIP

No z/VM 7.1 crash. No IFI004. System still running just fine!


z/VM 6.3 using LCS:

No z/VM 6.3 crash. No IFI004. System still running just fine!


z/VM 6.3 using CTCI:

    CP EXT
                                  ***** 03/08/21 *****
    11:19:30 DTCSHU004E
    11:19:30 DTCSHU009E 2 active clients, with 2 connections in use.
    11:19:30 DTCSHU010E I will delay shutting down for 30 seconds, so that
    11:19:30 DTCSHU011E RSTs and shutdown notifications may be delivered.
    11:19:30 DTCSHU012E If you wish to shutdown immediately, without warning,
    11:19:30 DTCSHU013E type #CP EXT again.
    11:19:30 DTCSHU014I
    11:19:30 DTCSTM119I Server Telnet closed down. Bye.
    USER DSC   LOGOFF AS  FTPSERVE USERS = 11
    11:20:00 DTCCTC082E CTCA shutting down:
    11:20:00 DTCPRI385I    Device CTCDEV1:
    11:20:00 DTCPRI386I       Type: CTC, Status: Ready
    11:20:00 DTCPRI387I       Envelope queue size: 0
    11:20:00 DTCPRI388I       Address: 0E20
    11:20:00 DTCPRI280I ShutDown at 76.775 seconds
    DTCRUN1014I Server ended normally at 11:20:00 on 8 Mar 2021 (Monday)
    DTCRUN1019I Server will not be logged off because you are connected
    CON FILE 0001 SENT TO   TCPMAINT RDR AS  0043 RECS 0132 CPY  001 T NOHOLD NOKEEP
    Ready; T=1.14/1.34 11:20:00

No z/VM 6.3 crash. No IFI004. System still running just fine!


The Hercules logs for each test were unremarkable as well. Nothing unusual. No crash. No IFI004, and after waiting for a minute or three, z/VM's shutdown proceeded normally as well. Nothing unusual.

As I said, this is all under Windows. Maybe Linux tuntaps behave differently? I don't know. As I explained I am unable to perform any type of networking tests on my Linux VMware system.

I still have not figured out what type of debugging/tracing can be done to help determine why this is happening for you. I personally suspect it might be a bug in your z/VM 5.4. Where did you get it? Have you looked for any maintenance that can be applied?

Fish-Git commented 3 years ago

When the line in the configuration reads:

E200.2 3088 CTCI /dev/net/tun 1500 192.168.100.1  192.168.100.2  255.255.255.0

I will have NO connectivity across the CTC, but I will have no problems during the TCPIP shutdown.

This is hardly surprising. Device type 3088 is unknown/invalid:

19:03:52.520 HHC01462E 0:0E20 devtype 3088 not recognized
19:03:52.520 HHC00007I Previous message from function 'attach_device' at config.c(1300)

As a result, device 0E20 and 0E21 were never added to your Hercules configuration and thus your z/VM system did not have any type of networking device available to it, thereby causing TCPIP to never get started. Thus it comes to no surprise that you had no networking connectivity as well as no problems shutting down TCPIP since it was never started in the first place!

How did you configure your z/VM's networking? May I see your PROFILE TCPIP and SYSTEM DTCPARMS files?

Have you (with the possible help from someone else) tried analyzing the z/VM dump? Does it reveal anything unusual?

I'm stuck!   :(

jeff-snyder commented 3 years ago

Hi Fish,

I've run into this or a very similar issue as well. If I run VM/ESA 2.4.0 with an OS/390 2.10 guest and I shut down TCPIP normally (i.e. #CP EXT) in VM/ESA, no problem, but if I shutdown TCPIP in the OS/390 guest (P TCPIP), VM/ESA blows up with the IFI004.

My Hercules config statement for the OS/390 TCPIP CTC looks like: 5EE.2 CTCI -s 255.255.255.0 192.168.1.105 192.168.1.32 # TCPIP to LAN I have confirmed that IP works to/from OS/390 before the failure. If I run OS/390 outside of VM/ESA, it appears to shut down normally. I don't see any abends or Hercules issues.

I am running a recent build of Hercules on Windows 10. I downloaded it on 2/21. HHC01413I Hercules version 4.4.9999.0-SDL (4.4.9999.0)

What can I provide to help resolve this issue?

gjcullen commented 3 years ago

Fish,

I have gotten closer to the problem area, with more details (this time repeating the same problems with a zVM7.1 image:

The abend is triggered, not by the shutdown of TCPIP stack, but by the stopping of the Channel-to-Channel connection between zVM TCPIP and the Linux host system. I am beginning to think at TUN/TAP interaction. Below are the CTC definitions and the console logs of the TCPIP shutdown sequences.

I will continue to narrow the problem. I appreciate your time on this matter. Respectfully, Geoff


Hercules Configuration snippet defining the CTC connection:

E200.2 CTCI /dev/net/tun 1500  192.168.100.3  192.168.100.2  255.255.255.0

zVM TCPIP Configuration snippet defining the CTC connection:

; -----------------------------------------------------------------------
  DEVICE  CTCDEV1    CTC      E200
  LINK       CTCLINK1   CTC      0    CTCDEV1
;
HOME
; IP Address      Subnet Mask     Link Name
; --------------- --------------- ------------
  192.168.100.3   255.255.255.0   CTCLINK1
;
  PRIMARYINTERFACE  CTCLINK1

GATEWAY
;
; (IP) Network     Subnet                First                    Link               MTU
; Address            Mask                   Hop                     Name             Size
; -----------             -------------            ---------------          ----------           -----
  192.168.100.2 255.255.255.0   =                        CTCLINK1      1500
  ;
  DEFAULTNET                    192.168.100.2   CTCLINK1   1500

  START CTCDEV1

Directory entry for TCPIP:

IDENTITY TCPIP XXXXXXXX 128M 256M BG   
  INCLUDE TCPCMSU                                               
   BUILD ON * USING SUBCONFIG TCPIP-1            
   IUCV ALLOW                                                         
   IUCV ANY PRIORITY                                              
   IUCV *CCS PRIORITY MSGLIMIT 255                                
   IUCV *VSWITCH MSGLIMIT 65535                                   
   MACHINE XA                                                          
   OPTION QUICKDSP SVMSTAT MAXCONN 1024 DIAG98 APPLMON
   SHARE RELATIVE 3000                                                 
   DEDICATE E200 E200                                                   
   DEDICATE E201 E201                                                   
   CONSOLE 0009 3215 T OPERATOR 

zVM Programmable Operator log snippet showing network shutdown:

---------------------------------------                                        

Commencing VM Network Shutdown sequence                                         

----------------------------------------                                        
RSCS    : 20:05:00 DMTCMX005I Location CPZVM71B(OPERATOR) executing: SHUTDOWN   
RSCS    : 20:05:00 DMTNCR570I Link CPZVM54A now set to deactivate               
RSCS    : 20:05:00 DMTPRD779I TCP Port redirector stopped                       
RSCS    : 20:05:00 DMTTNE183I Link CPZVM54A session terminated                  
RSCS    : 20:05:00 DMTMAN002I Link CPZVM54A deactivated                         
RSCS    : 20:05:00 DMTNCR570I Link CPZVM71A now set to deactivate               
RSCS    : 20:05:00 DMTTNE183I Link CPZVM71A session terminated                  
RSCS    : 20:05:00 DMTMAN002I Link CPZVM71A deactivated                         
RSCS    : 20:05:00 DMTMAN100I RSCS Networking terminated                        
RSCS    : CONNECT= 00:00:22 VIRTCPU= 000:00.09 TOTCPU= 000:00.19                
RSCS    : LOGOFF AT 20:05:01 EST TUESDAY 03/09/21                               
20:05:01 USER DSC   LOGOFF AS  RSCS     USERS = 26                              
GCS     : USER DSC   LOGOFF AS  RSCS     USERS = 26                             
VM TCP/IP Netstat Level 710       TCP/IP Server Name: TCPIP                     
TCPIP   : 20:05:01 DTCMON442I CP EXT issued by OPERATOR; results pending        
TCPIP   : 20:05:01 DTCMON443I CP EXT issued by OPERATOR; command successful     
TCPIP   :                               ***** 03/09/21 *****                    
TCPIP   : 20:05:01 DTCSHU004E                                                   
TCPIP   : 20:05:01 DTCSHU009E 6 active clients, with 2 connections in use.      
TCPIP   : 20:05:01 DTCSHU010E I will delay shutting down for 30 seconds, so that
TCPIP   : 20:05:01 DTCSHU011E RSTs and shutdown notifications may be delivered. 
TCPIP   : 20:05:01 DTCSHU012E If you wish to shutdown immediately, without warni
TCPIP   : 20:05:01 DTCSHU013E type #CP EXT again.                               
TCPIP   : 20:05:01 DTCSHU014I                                                   
TCPIP   : 20:05:01 DTCSTM119I Server Telnet closed down. Bye.                   
CP return code = 0                                                              
VM TCP/IP Netstat Level 710       TCP/IP Server Name: DTCVSW1                   
CP return code = 0                                                              
VM TCP/IP Netstat Level 710       TCP/IP Server Name: DTCVSW2                   
CP return code = 0                                                              
GCS     : CONNECT= 00:00:23 VIRTCPU= 000:00.00 TOTCPU= 000:00.03                
GCS     : LOGOFF AT 20:05:01 EST TUESDAY 03/09/21                               
20:05:01 USER DSC   LOGOFF AS  GCS      USERS = 25                              
Command complete                                                                
PROP200H: The status of RSCS is changed from UP to QUIESCING                    
HCPMSG045E RSCS not logged on                                                   
PROP200H: The status of RSCS is changed from QUIESCING to DOWN                  
HCPMSG045E RSCS not logged on                                                   
PROP200H: The status of TCPIP is changed from UP to QUIESCING                   
20:05:01 USER DSC   LOGOFF AS  DTCVSW1  USERS = 24                              
20:05:01 USER DSC   LOGOFF AS  DTCVSW2  USERS = 23                              
20:05:01 USER DSC   LOGOFF AS  FTPSERVE USERS = 22                              
TCPIP   : USER DSC   LOGOFF AS  FTPSERVE USERS = 22                             
TCPIP   : 20:05:31 DTCCTC082E CTCA shutting down:                               
TCPIP   : 20:05:31 DTCPRI385I    Device CTCDEV1:                                
TCPIP   : 20:05:31 DTCPRI386I       Type: CTC, Status: Ready                    
TCPIP   : 20:05:31 DTCPRI387I       Envelope queue size: 0                      
TCPIP   : 20:05:31 DTCPRI388I       Address: E200                               

IFI004 ABEND occurs after this point


TCPIP console log snippet showing TCPIP shutdown:

20:04:39 DTCCTC095E Sense id: FF 30 88 for device E200
20:04:39 DTCCTC095E Sense id: FF 30 88 for device E201
20:04:39 DTCSTM213I Telnet server: Global connection to *CCS CP System Service established
20:04:39 DTCSTM216I Telnet server: First line of *CCS logo is: z/VM ONLINE
         --CPZVM71B--PRESS BREAK KEY TO BEGIN SESSION
-------------------------------------------------
   Telnet and the Local VM Network is UP and READY
 -------------------------------------------------

20:05:01 DTCMON442I CP EXT issued by OPERATOR; results pending
20:05:01 DTCMON443I CP EXT issued by OPERATOR; command successful
                              ***** 03/09/21 *****
20:05:01 DTCSHU004E
20:05:01 DTCSHU009E 6 active clients, with 2 connections in use.
20:05:01 DTCSHU010E I will delay shutting down for 30 seconds, so that
20:05:01 DTCSHU011E RSTs and shutdown notifications may be delivered.
20:05:01 DTCSHU012E If you wish to shutdown immediately, without warning,
20:05:01 DTCSHU013E type #CP EXT again.
20:05:01 DTCSHU014I
20:05:01 DTCSTM119I Server Telnet closed down. Bye.
gjcullen commented 3 years ago

Fish, I created an OBEYFILE for TCPIP to surgically stop the channel-to-channel adapter. Now with TCPIP in the initialized and fully "up" state (as opposed to shutting down), if I execute the OBEYFILE, an IFI004 CP Abend will immediately follow. Same on both zVM54 and zVM71.

Geoff

Fish-Git commented 3 years ago

Okay, I managed to modify my existing z/VM 7.1 system to use CTCI, and once again, I experienced zero problems shutting down TCPIP. No IFI004. I even re-started TCPIP by logging completely off of the TCPIP user ID and then logging back on again, and sure enough, TCPIP networking came up just fine and everything worked.

Again, this is on Windows, where networking is handled completely differently than on Linux, so I can only guess that whatever the problem is that's causing your IFI004. it's unique to Linux tuntap only. Someone else more experienced with Linux tuntap networking and Hercules is going to have to take over from here. I don't think there's anything more I can do.   :(


Here's my Hercules CTCI statement:

0E20.2  CTCI  -m 02-00-5E-A8-00-47  192.168.0.71  192.168.0.107

Here's my TCPMAINT's PROFILE TCPIP and SYSTEM DTCPARMS files:

(profile tcpip):

00001 ; ----------------------------------------------------------------------
00002 ; - PROFILE TCPIP created by DTCIPWIZ EXEC on 11 Jun 2020
00003 ; - Configuration program run by MAINT710 at 18:21:02
00004 ; ----------------------------------------------------------------------
00005 ;   %%File Origin Indicator - DO NOT REMOVE OR ALTER the next line%%
00006 ;   %%TCPIP%%PROFILE%%STCPIP%%
00007 ; ----------------------------------------------------------------------
00008 OBEY
00009 OPERATOR TCPMAINT MAINT MPROUTE REXECD SNMPD SNMPQE LDAPSRV MAINT710
00010 ENDOBEY
00011 ; ----------------------------------------------------------------------
00012 PORT
00013 ; 20   TCP FTPSERVE  NOAUTOLOG ; FTP Server
00014 ; 21   TCP FTPSERVE            ; FTP Server
00015   23   TCP INTCLIEN            ; TELNET Server
00016 ; 25   TCP SMTP                ; SMTP Server
00017 ; 111  TCP PORTMAP             ; Portmap Server
00018 ; 111  UDP PORTMAP             ; Portmap Server
00019 ; 161  UDP SNMPD               ; SNMP Agent
00020 ; 162  UDP SNMPQE              ; SNMPQE Agent
00021 ; 389  TCP LDAPSRV             ; LDAP Server
00022 ; 389  UDP LDAPSRV             ; LDAP Server
00023 ; 512  TCP REXECD              ; REXECD Server (REXEC)
00024 ; 514  TCP REXECD              ; REXECD Server (RSH)
00025 ; 520  UDP MPROUTE   NOAUTOLOG ; Multiple Protocol Routing Server
00026 ; 608  TCP UFTD                ; UFT Server
00027 ; 636  TCP LDAPSRV             ; LDAP Server (Secure)
00028 ; 636  UDP LDAPSRV             ; LDAP Server (Secure)
00029 ; 2049 UDP VMNFS               ; NFS Server
00030 ; 2049 TCP VMNFS     NOAUTOLOG ; NFS Server
00031 ; ----------------------------------------------------------------------
00032 DEVICE CTCDEV  CTC   0E20
00033 LINK   CTCLINK CTC 0 CTCDEV
00034 ; (End DEVICE and LINK statements)
00035 ; ----------------------------------------------------------------------
00036 ; ----------------------------------------------------------------------
00037 HOME
00038 192.168.0.71 255.255.0.0 CTCLINK
00039 ; (End HOME Address information)
00040 ; ----------------------------------------------------------------------
00041 GATEWAY
00042 ; Network       Subnet          First           Link             MTU
00043 ; Address       Mask            Hop             Name             Size
00044 ; ------------- --------------- --------------- ---------------- -----
00045 DEFAULTNET                      192.168.0.99    CTCLINK          1492
00046 ; (End GATEWAY Static Routing information)
00047 ; ----------------------------------------------------------------------
00048 START CTCDEV
00049 ; (End START statements)
00050 ; ----------------------------------------------------------------------

(system dtcparms):

00001 .**********************************************************************
00002 .* SYSTEM DTCPARMS created by DTCIPWIZ EXEC on 11 Jun 2020
00003 .* Configuration program run by MAINT710 at 18:20:48
00004 .**********************************************************************
00005 :nick.TCPIP    :type.server
00006                :class.stack
00007                :attach.0E20-0E21

Here's me issuing CP EXT from the TCPIP user id to shutdown TCPIP networking:

CP EXT
                              ***** 03/10/21 *****
17:08:20 DTCSHU004E
17:08:20 DTCSHU009E 3 active clients, with 1 connection in use.
17:08:20 DTCSHU010E I will delay shutting down for 30 seconds, so that
17:08:20 DTCSHU011E RSTs and shutdown notifications may be delivered.
17:08:20 DTCSHU012E If you wish to shutdown immediately, without warning,
17:08:20 DTCSHU013E type #CP EXT again.
17:08:20 DTCSHU014I
17:08:20 DTCSTM119I Server Telnet closed down. Bye.
17:08:50 DTCCTC082E CTCA shutting down:
17:08:50 DTCPRI385I    Device CTCDEV:
17:08:50 DTCPRI386I       Type: CTC, Status: Ready
17:08:50 DTCPRI387I       Envelope queue size: 0
17:08:50 DTCPRI388I       Address: 0E20
17:08:50 DTCPRI280I ShutDown at 197.838 seconds
DTCRUN1014I Server ended normally at 17:08:50 on 10 Mar 2021 (Wednesday)
DTCRUN1019I Server will not be logged off because you are connected
CON FILE 0001 SENT TO   TCPMAINT RDR AS  0036 RECS 0100 CPY  001 T NOHOLD NOKEEP
Ready; T=2.31/2.66 17:08:51

logoff

Here's what appears on the operator console:

17:08:12 GRAF  0460 RECONNECT  TCPIP    USERS = 10    BY IBMVM1
17:10:34 GRAF  0460 LOGOFF AS  TCPIP    USERS = 9
17:10:34 CTCA  0E20 DETACHED TCPIP    0E20 BY TCPIP
17:10:34 CTCA  0E21 DETACHED TCPIP    0E21 BY TCPIP

  _Again, no IFI004 at all. System stays up and running just fine._


Here's me logging onto user id TCPIP again to restart TCPIP:

DMSWSP100W Shared Y-STAT not available
DMSACP723I D (198) R/O
DMSACP723I E (591) R/O
DMSACP723I F (592) R/O
Ready; T=0.05/0.07 17:12:26
DTCRUN1022I Console log will be sent to default owner ID: TCPMAINT
DTCRUN1046I Using TCP/IP data file: TCPIP DATA F1
DTCRUN1046I Using 'server' definition (TCPIP) from file: SYSTEM DTCPARMS D1
DTCRUN1046I Using 'class' definition (STACK) from file: IBM DTCPARMS E1
DTCRUN1096I STORAGE = 128M
DTCRUN1038I Server is not configured to support secure connections
DTCRUN1021R To cancel TCP/IP Stack startup, type any non-blank character and
            press ENTER. To continue startup, just press ENTER.

DTCRUN1011I Server started at 17:12:30 on 10 Mar 2021 (Wednesday)
DTCRUN1011I Running server command: TCPIP
DTCRUN1011I No parameters in use
DTCRUN1048I SLVL service information for: TCPIP MODULE E2
DTCRUN1048I CMHOSTN  ZVM620    CMPRCOM  PI40702   CMTCPPR  PI21053
DTCRUN1048I CMUNTOK  ZVM620    FPIPDOW  PH01192   FPI6DOW  PH01192
DTCRUN1048I FPI6UP   PI08319   FPNOTIF  PH01192   FPPARSE  ZVM630
DTCRUN1048I FPQDIO   PI73016   FPQUEUE  PH01192   FPROUND  PH01192
DTCRUN1048I FPSCHED  PH01192   FPSOCKRE PH01192   FPTCPDOW PH01192
DTCRUN1048I FPTCPREQ PH01192   FPTCPUP  PH01192   FPTIMER  ZVM620
DTCRUN1048I FPTOCLAW PQ95126   FPTOCTC  PQ70168   FPTOPC3  PQ59871
DTCRUN1048I FPTREE   ZVM620    FPUTIL   PH01192   F6TCPUP  PH01192
DTCRUN1048I TCARP    ZVM710    TCCONFIG ZVM630    TCCONSI  ZVM620
DTCRUN1048I TCDIP98  PQ19769   TCFPSM   ZVM620    TCFR182  ZVM620
DTCRUN1048I TCINT    MT06108   TCIPDOW  ZVM630    TCIPUP   PI08319
DTCRUN1048I TCMIB    PI99184   TCMON    PH01192   TCMPRIO  ZVM620
DTCRUN1048I TCNED    PI08319   TCNOTIF  ZVM630    TCOFFLOP PQ21340
DTCRUN1048I TCPARSE  PI99184   TCPDOWN  ZVM620    TCPERUP  PQ49456
DTCRUN1048I TCPEXITS ZVM620    TCPIP    PI99184   TCPIPIN  ZVM710
DTCRUN1048I TCPKT    ZVM620    TCPREQU  PH00032   TCPRINT  PH01192
DTCRUN1048I TCPSSL   PH01192   TCPUP    PH01192   TCQDIO   PI21053
DTCRUN1048I TCQUEUE  PH01192   TCROUND  ZVM620    TCSOCKRE PI97021
DTCRUN1048I TCSYSIO  PQ51738   TCTOATM  ZVM620    TCTOCLAW PQ95126
DTCRUN1048I TCTOCTC  ZVM620    TCTOHPPI PQ51891   TCTOIUC  ZVM630
DTCRUN1048I TCTOOSD  PI52850   TCTOPC3  PQ66267   TCTREEP  PI68525
DTCRUN1048I TCUDPRE  ZVM620    TCUDPUP  ZVM630    TCUTIL   PI99184
DTCRUN1048I TCVMSUB  ZVM620    TNLDSFP  PI38762   TNSTMAS  PI99184
DTCRUN1048I TNTOCP   PI25555   TNTOTCP  PI99184   T6PREQU  PM96884
DTCRUN1048I T6PSSL   PI97021   T6SOCKRE ZVM650
DTCTCP001I z/VM TCP/IP Level 710
                              ***** 03/10/21 *****
17:12:30 DTCIPI008I    Initializing... TCPIP MODULE E2 dated 03/13/19 at 16:49
17:12:30 DTCIPI052I TCP/IP Module Load Address:  00BD1000
17:12:30 DTCIPI009I Devices will use diagnose 98 real channel program support
17:12:30 DTCIPI012I TCP/IP running under z/VM system
17:12:30 DTCIPI005I Trying to open TCPIP TCPIP *
17:12:30 DTCIPI005I Trying to open ZVM71 TCPIP *
17:12:30 DTCIPI005I Trying to open PROFILE TCPIP *
17:12:30 DTCIPI006I Using profile file PROFILE TCPIP D1 dated 03/10/21 at 16:33
17:12:31 DTCFPM004I FPSM is available with 0 pages allocated
17:12:31 DTCFPM005I FPSM may allocate an additional 23348 pages out of 25943
17:12:31 DTCCTC080I CTCA initializing:
17:12:31 DTCPRI385I  Device CTCDEV:
17:12:31 DTCPRI386I     Type: CTC, Status: Not started
17:12:31 DTCPRI387I     Envelope queue size: 0
17:12:31 DTCPRI388I     Address: 0E20
17:12:32 DTCPUP096I Precedence designations will be overridden
17:12:32 DTCPDO083I Will honor ICMP redirects
17:12:32 DTCPDO226I Will honor ICMPv6 redirects
17:12:32 DTCPDO150I Support for IPv4 Path MTU Discovery is disabled
17:12:32 DTCIPU044I IP forwarding is enabled
17:12:32 DTCUDP010I Limit on incoming UDP datagram queue size enabled
17:12:53 DTCIPI035I TCP/IP will not dump if a program check occurs
17:12:53 DTCIPI036I Access to TCP/IP services is not restricted
17:12:53 DTCIPI033I Only users in the obey list and users who have a port explic
itly reserved may use ports 1-1023
17:12:53 DTCIPI038I Internal consistency checking is disabled
17:12:53 DTCIPI039I Support for RFC 1323 is enabled
17:12:53 DTCSSL037I Local connections will not be protected by SSL
17:12:53 DTCIPI023I TCP-IP initialization complete.
17:12:53 DTCSTM237I Telnet server: Using port 23
17:12:53 DTCSTM298I Telnet server: Line mode input will be presented when reques
ted
17:12:53 DTCSTM296I Telnet server: Erase All Unprotected (EAU) data will be tran
smitted
17:12:53 DTCSTM239I Telnet server: No inactivity timeout
17:12:53 DTCSTM240I Telnet server: Every 600 seconds a timing mark option packe
 will be sent.
17:12:53 DTCSTM299I Telnet server: EOJTIMEOUT is 120 seconds
17:12:53 DTCSTM269I Telnet server: SCANInterval is 60 seconds
17:12:53 DTCSTM255I Telnet server: Suppress-Go-Ahead enabled
17:12:53 DTCSTM291I Telnet server: TN3270E is enabled
17:12:53 DTCOTC050I ConnectExit is disabled for TN3270E sessions
17:12:53 DTCSTM256I Telnet server: Line-mode terminal names will be prefixed wit
h "TCPIP"
17:12:53 DTCSTM263I Telnet server: Will use logical device addresses in the rang
e 00000000 through 00000FFF
17:12:53 DTCSTM305I Telnet server: Secure Connections are NOT ALLOWED
17:12:53 DTCSTM313I Telnet server: TLSLABEL is ignored when the Secure Connectio
ns value is NEVER.
17:12:53 DTCSTM243I ******************************************
17:12:53 DTCSTM244I Log of IBM TCP/IP Telnet Server Users started on 03/10/21 at
 17:12:53
17:12:53 DTCSTM245I
17:12:53 DTCCTC095E Sense id: FF 30 88 for device 0E20
17:12:53 DTCSTM213I Telnet server: Global connection to *CCS CP System Service e
stablished
17:12:53 DTCCTC095E Sense id: FF 30 88 for device 0E21
17:12:53 DTCSTM216I Telnet server: First line of *CCS logo is: z/VM ONLINE
         --ZVM71   --PRESS BREAK KEY TO BEGIN SESSION

CP DISC
DISCONNECT AT 17:13:55 PST WEDNESDAY 03/10/21

Press enter or clear key to continue

  _And again, no IFI004 at all. System stays up and running just fine, and all networking functionality appears to be fully functional and once again working fine._

Fish-Git commented 3 years ago

Geoff,

THREE things:

  1. I noticed your Directory entry (USER DIRECT file) for IDENTITY TCPIP, you have two "dedicate" statements for your Hercules CTCI device:
    IDENTITY TCPIP XXXXXXXX 128M 256M BG   
    INCLUDE TCPCMSU                                               
    BUILD ON * USING SUBCONFIG TCPIP-1            
    IUCV ALLOW                                                         
    IUCV ANY PRIORITY                                              
    IUCV *CCS PRIORITY MSGLIMIT 255                                
    IUCV *VSWITCH MSGLIMIT 65535                                   
    MACHINE XA                                                          
    OPTION QUICKDSP SVMSTAT MAXCONN 1024 DIAG98 APPLMON
    SHARE RELATIVE 3000                                                 
    DEDICATE E200 E200
    DEDICATE E201 E201
    CONSOLE 0009 3215 T OPERATOR 
    

    I don't believe these are needed. Instead, I believe the devices are automatically attached to user TCPIP via the SYSTEM DTCPARMS file's ":attach." statement. (Refer to my SYSTEM DTCPARMS file in my previous comment further above)

Have you tried removing them?  

  1. I noticed your PROFILE TCPIP file has an extra statement between the GATEWAY statement and the DEFAULTNET statement that my own PROFILE TCPIP file does not have:

    [...]
    GATEWAY
    ;
    ; (IP) Network     Subnet                First                    Link               MTU
    ; Address            Mask                   Hop                     Name             Size
    ; -----------             -------------            ---------------          ----------           -----
    192.168.100.2 255.255.255.0   =                        CTCLINK1      1500
    ;
    DEFAULTNET                    192.168.100.2   CTCLINK1   1500
    
    START CTCDEV1
    

In my own PROFILE TCPIP file, the DEFAULTNET statement immediately follows my GATEWAY statement, without any statements in between them (other than comments of course!).

Have you tried removing that extra unneeded statement?  

  1. You said:

... if I execute the OBEYFILE, an IFI004 CP Abend will immediately follow. Same on both zVM54 and zVM71.

I am not familiar with this.

Can you please post a copy of this OBEYFILE thing of yours along with instructions on how I should use it? Thanks!

That's all for now.

Fish-Git commented 3 years ago

Something else:

Do you have any type of OSTAILOR statement in your Hercules configuration file? Is it set to VM? Or some other value? It should be OSTAILOR VM.

To be safe, may I see your ENTIRE Hercules configuration file please?

May I also see your ENTIRE Hercules logfile when the IFI004 occurs?

Please upload or attach them both to your next GitHub Issue comment please. Thanks.

  Another thing:

Have you tried entering the Hercules commands t+E200 and t+E201 immediately before you try shutting down TCPIP? (i.e. immediately before you do whatever it is that causes your IFI004?) This will create a Hercules trace of all I/O to your CTCI device that might provide a clue as to what may be going wrong.

Thanks.

That's all I can think of right now.

gjcullen commented 3 years ago

Fish, I will get all the information together and send you. I may not get to the task for a few days. Geoff

Fish-Git commented 3 years ago

No hurry.

gjcullen commented 3 years ago

Fish,

Here is the information you requested. I am using a zVM5.4 image:

  1. I use DEDICATE statements in the VM Directory for the CTCA instead of using DTCPARMS so that the TCPIP machine can run without the need of A and B privilege classes.

  2. The statement in the TCPIP config between GATEWAY and DEFAULTNET is to support the routing to the outside world, this one is for the first hop.

  3. The OBEY file is named KILLTCP TCPIP (fm). Its contents are STOP CTCDEV1. It resides on TCPMAINT's 592 disk. It is activated by NETSTAT KILLTCP TCPIP B from a user which is authorized in the TCPIP config to issue an OBEYFILE. The filemode is based upon the way that the TCPIP minidisks are accessed.

  4. OSTAILOR is set to VM

  5. The Hercules configuration follows and the log will be uploaded:

#--------------------------------------------------------
# CPU Configuration   (for zVM5.4)
#----------------------------------------------------------
ARCHLVL         z/ARCH          # Set Architecture level
FACILITY        ENABLE bit44
FACILITY        ENABLE BIT45
FACILITY        ENABLE BIT37
#FACILITY       ENABLE SIGP_SETARCH_S370
FACILITY        QUERY ALL

#CAPPING        OFF
#CPUIDFMT       BASIC
CPUMODEL        789A            # CPU Model Identification (NETID)
CPUSERIAL       002855          # CPU Serial Identification
CPUVERID        00              # CPU Version Identification
#DEFSTORE       MAIN 2G         # Define storage
MODEL           EMULATOR        # STSI returned model
PLANT           ZZ              # STSI returned plant
MANUFACTURER    HRC             # STSI returned manufacturer
LPARNAME        HERCULES        # DIAG 204 returned lparname
ENGINES         CP              # CPU types
MAINSIZE        2048M           # Main storage size in megabytes
XPNDSIZE        0               # Expanded storage size in megabytes
MAXCPU          1               # Maximum number of CPUs
NUMCPU          1               # Number of CPUs
MAXRATES        1440            # Maximum mips/sio rate
ECPSVM          NO              # VM Assist : NO or Level (20 recommend)
LPARNUM         00              # LPAR ID
TIMERINT        50              # Interval Timer update in microsec
TODDRAG         1.0000          # TOD clock drag factor
#----------------------------------------------------------
# OS Tailoring
#----------------------------------------------------------
PGMPRDOS        LICENSED        # Allow
LOADPARM        001F
OSTAILOR        VM              # OS tailoring
SYSEPOCH        1900            # Base year for initial TOD clock
TZOFFSET        +0000           # Using UTC (GMT)
#IODELAY        1000            # I/O Delay
#LOGOPTS        TIMESTAMP       # Timestamp all log messages
#----------------------------------------------------------
# Hercules Service Processor and Hercules Application Window
#----------------------------------------------------------

# MODPATH /usr/local/lib/hercules # Where to search for modules
# LDMOD dyninst tcpip # Modules to be loaded

#----------------------------------------------------------
# Hercules Service Processor and Hercules Application Window
#----------------------------------------------------------
CODEPAGE        default # CodePage conversion table
SHCMDOPT        DISABLE NODIAG8         # No shell commands from 
#DIAG8CMD       ENABLE ECHO                 # OS may issue commands via DIAG 8
DIAG8CMD        DISABLE NOECHO

# HTTPPORT 8081 noauth userid password # HTTP server port
# HTTPROOT /usr/local/share/hercules/ # HTTP root directory
PANOPT          RATE=400       # Panel refresh rate in msec

#----------------------------------------------------------
# Hercules 3270 Consoles
#----------------------------------------------------------
CNSLPORT        3270           # TCP port number to which consoles connect
CONKPALV        (3,1,10)       # Console recovery periods.
#----------------------------------------------------------
# Advanced Hercules Performance Tailoring
#
# +----------------------------------------------------------+
# | Caution: Modification of these parameters may adversely |
# | affect the performance of the host system |
# | and/or Hercules. |
# +----------------------------------------------------------+
#----------------------------------------------------------
# HERCPRIO 5                    # Hercules process runs at Normal priority
# CPUPRIO 2                     # CPU thread(s) run at Low priority
# SRVPRIO 4                     # TCP/IP Server Thred(s), HTTP, CONSOLE, etc.
# DEVPRIO 8                     # Device thread(s) run at Below Normal priority
# TODPRIO -20                   # TOD Clock and timer thread are Time Critical

#----------------------------------------------------------
# Integrated Hercules I/O Controller
#

# DEVTMAX 8              # Device threads: -1=exit on complete, 0=unlimited
# SHRDPORT      3990            # TCP port number for sharing DASD images  on
# this instance of Hercules (inactive)

# .-----------------------Device number
# | .-----------------Device type
# | | .---------File name and parameters
# | | |
# V V V
# ---- ---- --------------------
0000 SYSG SYSGCONS
#0009 3215-C / noprompt
000C 3505 /home/maint/hercules/reader00C.txt ascii trunc
000D 3525 /home/maint/hercules/punch00D.txt ascii
000E 1403 /home/maint/hercules/print00E.txt crlf
001F 3270
0020 3270
0021 3270
0022 3270
0023 3270
0024 3270
0025 3270
0026 3270
0027 3270
0028 3270
0029 3270
002A 3270
002B 3270
002C 3270
002D 3270
002E 3270
002F 3270

0380 3490 /home/maint/hercules/3490.TAPE0.aws
0381 3490 /home/maint/hercules/3490.TAPE1.aws
#0580 3490 /cdrom/tapes/*
#0581 3490 /home/maint/Downloads/cpdvd/540vm.ins
#0582 3490 /home/maint/Downloads/cpdvd/
#0583 3490 /home/maint/Downloads/zvm54rsu.iso

0123 3390    /run/media/maint/VMDASD/540RES.0123
0124 3390    /run/media/maint/VMDASD/540SPL.0124
0125 3390    /run/media/maint/VMDASD/540PAG.0125
0126 3390    /run/media/maint/VMDASD/540USR.0126
0127 3390    /run/media/maint/VMDASD/540WK1.0127
0128 3390    /run/media/maint/VMDASD/54TDSK.0128

#E200-E201 CTCI /dev/net/tun 1500 192.168.100.1  192.168.100.2  255.255.255.0
E200.2 CTCI /dev/net/tun 1500 192.168.100.1  192.168.100.2  255.255.255.0
  1. Log showing the IFI004 CP Abend uploaded as zvm54cpa.log:

Respectfully, Geoff

gjcullen commented 3 years ago

Will be sending a hercules trace of E200-E201 CTCA pair.

Geoff

Fish-Git commented 3 years ago

As I recall, I had much trouble with the z/VM 5.3 Evaluation Edition when 2GB or more of MAINSIZE was specified in Hercules. I had to reduce my Hercules MAINSIZE to a value less than 2GB (e.g. 2000MB or 1900MB, etc):

MAINSIZE        1900M           # Main storage size in megabytes

Have you tried that yet?

Also, it there a reason why you're using CTCI instead of LCS or QETH/OSA for your networking? Just curious! I've personally always used OSA for all of my networking needs, including z/VM. Sometimes I use LCS, but I rarely use CTCI. Have you tried using LCS yet? Does it behave any better?

  1. I use DEDICATE statements in the VM Directory for the CTCA instead of using DTCPARMS so that the TCPIP machine can run without the need of A and B privilege classes.

Have you tried adding privilege classes A and B to TCPIP? As far as I know ABG are the standard default privilege classes for TCPIP. I personally would be very reluctant to remove privilege classes from users which IBM feels needs them. <shrug>

  1. The statement in the TCPIP config between GATEWAY and DEFAULTNET is to support the routing to the outside world, this one is for the first hop.

Isn't your DEFAULTNET statement accomplishing the same thing? I mean, I could understand your argument if the IP addresses (i.e. networks, i.e. subnets) were different (thereby requiring special routes), but since they're the same I don't think it's needed. My system doesn't have any such statement(s) and my z/VM system can reach the outside world just fine. <shrug>

I also noticed in the Hercules log you attached:

18:38:34 HHC00109E set_thread_priority( 5 ) failed: Operation not permitted
18:38:34 HHC00007I Previous message from function 'impl' at impl.c(988)
18:38:34 HHC00110W Defaulting all threads to priority 0
18:38:34 HHC00007I Previous message from function 'impl' at impl.c(991)
18:38:34 HHC00100I Thread id 00007f893987b740, prio 0, name 'impl_thread' started
18:38:34 HHC90020W 'hthread_setschedparam()' failed at loc=impl.c:1010: rc=22: Invalid argument
18:38:34 HHC00007I Previous message from function 'hthread_set_thread_prio' at hthreads.c(1170)
 . . .
18:38:34 HHC00018W Hercules is NOT running in elevated mode
18:38:34 HHC00007I Previous message from function 'impl' at impl.c(1045)
18:38:34 HHC00150I Crypto module loaded (C) Copyright 2003-2016 by Bernard van der Helm
18:38:34 HHC00151I Activated facility: Message Security Assist
18:38:34 HHC00151I Activated facility: Message Security Assist Extension 1, 2, 3 and 4
18:38:34 HHC90020W 'hthread_setschedparam()' failed at loc=impl.c:431: rc=22: Invalid argument
18:38:34 HHC00007I Previous message from function 'hthread_set_thread_prio' at hthreads.c(1170)
18:38:34 HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
18:38:34 HHC90020W 'hthread_setschedparam()' failed at loc=cpu.c:2328: rc=22: Invalid argument
18:38:34 HHC00007I Previous message from function 'hthread_set_thread_prio' at hthreads.c(1170)
18:38:34 HHC00100I Thread id 00007f893932b700, prio 0, name 'Processor CP00' started
18:38:34 HHC90020W 'hthread_setschedparam()' failed at loc=timer.c:192: rc=22: Invalid argument
 . . .

On Linux, Hercules requires root in order to be able to properly set the dispatching priority of its internal threads. Is there a reason why you're not starting Hercules with elevated (root) privileges?

As I understand it, doing so should be safe since Hercules only uses it when needed (such as when setting thread priorities) but otherwise runs without root privileges. (It drops root almost as soon as it starts and only regains it again for those brief moments when it needs it, such as when setting thread priorities.)

These are just some things to try. You don't have to implement all of them. The one I would personally suggest trying first is reducing your Hercules MAINSIZE setting to 1900MB (i.e. something less than 2GB). I'm hoping that might resolve your problem altogether.

If that doesn't work, then you might try changing your networking device to LCS instead. That also might very well resolve your problem.

And finally, I don't recall asking this before (and please forgive me for asking it, but I'm just naturally incredibly curious!):

I there a good reason WHY you need to be stopping/re-starting TCPIP?!?

I mean, to me, once the system comes up and networking is initialized, why f**k with it?   :)

P.S. What type of machine is a model 789A?!  I've never heard of such a thing before!  What is it?  Have you tried specifying a z800 or z900, etc, CPU model number instead?

Fish-Git commented 3 years ago

P.P.S  The FACILITY statements you have in your configuration file are not needed. Facilities 37, 44 and 45 are all already enabled by default. They are accomplishing nothing.

Fish-Git commented 3 years ago

Another suggestion: rather than defining each 3270 device individually in your configuration file, you might condier using the device number range/count syntax instead, e.g.:

001F-002F  3270

Or:

001F     3270
0020.16  3270

That won't fix your problem of course, but it will make maintaining your configuration file a little bit easier.

Fish-Git commented 3 years ago

Will be sending a hercules trace of E200-E201 CTCA pair.

Still waiting.   (NO RUSH! Just thought I'd mention it since it's been a few days since you first wrote that.)

Also, have you had time to try any of my suggested experiments yet?

  1. Reducing Hercules MAINSIZE to something less than 2GB? (e.g. 1900M)
  2. Adding privilege classes A and B back to user TCPIP? (i.e. setting it to ABG like it should be?)
  3. Removing your (IMO) unneeded statement in your TCPIP PROFILE?
  4. Starting Hercules as root? (i.e. sudo)?
  5. Switching over to using LCS devices for networking instead of CTCI devices?

Does doing any of the above resolve your IFI004 problem?

Fish-Git commented 3 years ago

Geoff,     (@gjcullen)

Upon reviewing this issue, I happened to notice that I only tested the #CP EXT technique of stopping TCPIP (which worked fine), but I had not yet tried your "OBEY" file technique (which I admit to not being familiar with).

So I decided to give it a try, but was unfortunately unable to do so due to an apparent z/VM (configuration? installation?) error.

In an earlier comment, you said:

  1. The OBEY file is named KILLTCP TCPIP (fm). Its contents are STOP CTCDEV1. It resides on TCPMAINT's 592 disk. It is activated by NETSTAT KILLTCP TCPIP B from a user which is authorized in the TCPIP config to issue an OBEYFILE. The filemode is based upon the way that the TCPIP minidisks are accessed.

So the first thing I of course did, was to try to logon to TCPMAINT. But I couldn't! Each time I would try, I would get the following error:

Enter one of the following commands:

   LOGON userid             (Example:  LOGON VMUSER1)
   DIAL userid              (Example:  DIAL VMUSER2)
   MSG userid message       (Example:  MSG VMUSER2 GOOD MORNING)
   LOGOFF
logon tcpmaint by ibmvm1
ENTER PASSWORD  (IT WILL NOT APPEAR WHEN TYPED):

z/VM Version 7 Release 1.0, Service Level 1901 (64-bit),
built on IBM Virtualization Technology
There is no logmsg data
FILES: 0038 RDR,   NO PRT,   NO PUN
LOGON AT 05:48:19 PDT SATURDAY 03/27/21
DMSACC724I 19E replaces Y (19E)
DMSACP723I Y (19E) R/O
DMSPAR280E Application     not active
 05:48:19  * MSG FROM TCPMAINT: DMSINS2563T RTNLOAD failed for VMMTLIB
HCPGIR450W CP entered; disabled wait PSW 000A0000 80021528

And trying to re-ipl CMS does no good either:

ipl cms
DMSACC724I 19E replaces Y (19E)
DMSACP723I Y (19E) R/O
DMSPAR280E Application     not active
 06:27:35  * MSG FROM TCPMAINT: DMSINS2563T RTNLOAD failed for VMMTLIB
HCPGIR450W CP entered; disabled wait PSW 000A0000 80021528

  I find this to be quite interesting, and very possibly a strong clue(?) perhaps as to where/what the actual root problem might be? *`()`**

So I now have to ask the obvious question: Does this same thing happen to you? Are you able to logon to your TCPMAINT userid on your own z/VM system? (e.g. z/VM 7.1, but knowing whether or not the problem also occurs on z/VM 5.4 too would also be interesting to know.)

Because without being able to logon to TCPMAINT, I am unable to try your "OBEY" file technique of shutting down TCPIP.

Sorry!   :(

  *`()`** Unfortunately however, being very unskilled at modern z/VM, I have no freaking clue as to where the problem might be nor how to go about fixing it. Maybe you or someone else more skilled with modern z/VM can figure it out?

Fish-Git commented 3 years ago

UPDATE:

Apparently I am unable to logon to ANY z/VM userid at all!! So far I've tried MAINT710, plain old MAINT, and even ZOS, and they ALL encounter the same error, dropping me into a disabled wait.   :(

Fish-Git commented 3 years ago

Another quick update:

I did some quick Googling and found the following:

Does this make sense to anyone?

mcisho commented 3 years ago

Sounds to me like one of your VM disk images has become corrupted, in particular the image containing the S-disk minidisk. I would suggest restoring your VM system.

Fish-Git commented 3 years ago

Sounds to me like one of your VM disk images has become corrupted ...

I find that to be highly unlikely since I use shadow files, starting each test from the exact same known-to-be-stable(?) starting point each time.

Nevertheless, since I admit to having never actually tested logging onto any z/VM userid after having modified my existing LCS configuration to use CTCI instead, I suppose there is the possibility that I missed something (did something wrong) and it has always been broken. *`()`**

So I'll revert back to my known working LCS configuration and make the same changes again as I did last time to convert it into a CTCI configuration instead, but this time verifying afterwards that logging onto z/VM userids still works.

Thanks for the feedback.

  *`()` _This is an ADCD version of z/VM 7.1 that came pre-configured for OSA after all, and I admit to never having ever tried to manually change it to using a completely different networking device before. As I recall, whenever I originally installed my z/VM 6.3 system (which was not an ADCD), I had to specify at install time** what type of networking device I wanted to use, thus leading me to believe that one cannot simply switch to using a completely different networking device without knowing exactly what one was doing (i.e. without manually modifying a lot of rather gnarly TCPIP related files spread all over the place)._

Rather, what I did with my z/VM 6.3 system, is install it specifying CTCI as my network device, verify it worked okay, then save those shadow files and then reset back to the original virgin uninstalled state and start another brand new install again from the very beginning, this time specifying LCS as my networking device, saving those shadow files, etc, for each type of installation type that z/VM supported (i.e. CTCI, LCS, OSA in level-2 mode and OSA in level-3 mode). I guess I should have used my existing z/VM 6.3 CTCI system for my tests rather than to have tried manually re-configuring my existing pre-install z/VM 7.1 system like I did. That was probably my mistake. I'll try to recreate Geoff's crash using my existing z/VM 6.3 CTCI system instead this time. In retrospect it was a mistake for me to try and do it with my z/VM 7.1 system.

Fish-Git commented 3 years ago

Okay, z/VM 6.3 works fine using CTCI.

I logged onto user TCPMAINT and created a "KILLTCP TCPIP F" file (vaddr 592 was my F drive) consisting of a single line:

STOP CTCDEV1

But when I try using it (in this case from userid MAINT630), I get an error:

msg * xxx
 11:12:51  * MSG FROM MAINT630: XXX
Ready; T=0.01/0.01 11:12:51
q disk *
LABEL  VDEV M  STAT   CYL TYPE BLKSZ   FILES  BLKS USED-(%) BLKS LEFT  BLK TOTAL
MNT191 191  A   R/W   175 3390 4096       47        411-01      31089      31500
MNT5E6 5E6  B   R/W     9 3390 4096      134       1304-80        316       1620
MNT2CC 2CC  C   R/W    10 3390 4096        5        184-10       1616       1800
MNT51D 51D  D   R/W    26 3390 4096      297       1821-39       2859       4680
PMT551 551  E   R/W    40 3390 4096       10        119-02       7081       7200
MNT190 190  S   R/O   207 3390 4096      701      18090-49      19170      37260
TCM592 592  T   R/O   140 3390 4096      858      11816-47      13384      25200
MNT193 193  W   R/O   500 3390 4096     1280      42715-47      47285      90000
MNT19D 19D  X/X R/O   292 3390 1024      404      61228-42      83312     144540
MNT19E 19E  Y/S R/O   500 3390 4096     1123      30403-34      59597      90000
PMTCF0 CF0  Z   R/W   120 3390 4096        3         23-01      21577      21600
Ready; T=0.01/0.01 11:12:56
type killtcp tcpip t

STOP CTCDEV1

Ready; T=0.01/0.01 11:13:07
netstat killtcp tcpip t
DTCNET103E Invalid parameter: KILLTCP
Ready(00001); T=0.03/0.04 11:13:19
netstat obey killtcp tcpip t
VM TCP/IP Netstat Level 630       TCP/IP Server Name: TCPIP

RDR FILE 0032 SENT FROM TCPIP    PUN WAS 0013 RECS 0006 CPY  001 A NOHOLD NOKEEP
DTCNET501E Configuration error. Details are in PROFILE TCPERROR.
Ready(00004); T=0.06/0.07 11:13:27
type profile tcperror

DTCPRS008E Error encountered parsing OBEY or generated command line:  KILLTCP TC
PIP
DTCPAR123I Unknown command: 'KILLTCP'
DTCPRS332E Remainder of command line skipped due to error
DTCPRS096I Options enabled after parsing OBEY command line:

Ready; T=0.01/0.01 11:13:42

  I'm obviously doing something wrong, but I don't know what!

This is all new to me, and I don't know what I'm doing!   :(

gjcullen commented 3 years ago

Fish,

I have attached the Hercules trace (t+E200 and t+E201) for the CTCs, along with the Hercules configuration file (zvm54cpa.txt).

The trace was initiated just prior to having an authorized user machine entering the command: NETSTAT OBEY STOP CTCDEV1.   (Issuing the STOP CTCDEV1 command directly instead of via an OBEYFILE is much simpler.)

The IFI004 ensued immediately:

Geoff

gjcullen commented 3 years ago

Fish,

Modify the OBEY statement in the TCPIP profile that you are running.
Add your userid or any userid from which you will issue the stop.

Recycle TCPIP to enable the new OBEY statement.

On the authorized userid type: NETSTAT OBEY STOP CTCDEV1 , assuming CTCDEV1 is the name of the connection.

This method is easier to reproduce the problem (if it does happen on your system).

Geoff

On 03/27/2021 02:17 PM, Fish-Git wrote:

Okay, z/VM 6.3 works fine using CTCI.

I logged onto user TCPMAINT and created a "KILLTCP TCPIP F" file (vaddr 592 was my F drive) consisting of a single line:

|STOP CTCDEV1 |

But when I try using it (in this case from userid MAINT630), I get and error:

|msg xxx 11:12:51 MSG FROM MAINT630: XXX Ready; T=0.01/0.01 11:12:51 q disk * LABEL VDEV M STAT CYL TYPE BLKSZ FILES BLKS USED-(%) BLKS LEFT BLK TOTAL MNT191 191 A R/W 175 3390 4096 47 411-01 31089 31500 MNT5E6 5E6 B R/W 9 3390 4096 134 1304-80 316 1620 MNT2CC 2CC C R/W 10 3390 4096 5 184-10 1616 1800 MNT51D 51D D R/W 26 3390 4096 297 1821-39 2859 4680 PMT551 551 E R/W 40 3390 4096 10 119-02 7081 7200 MNT190 190 S R/O 207 3390 4096 701 18090-49 19170 37260 TCM592 592 T R/O 140 3390 4096 858 11816-47 13384 25200 MNT193 193 W R/O 500 3390 4096 1280 42715-47 47285 90000 MNT19D 19D X/X R/O 292 3390 1024 404 61228-42 83312 144540 MNT19E 19E Y/S R/O 500 3390 4096 1123 30403-34 59597 90000 PMTCF0 CF0 Z R/W 120 3390 4096 3 23-01 21577 21600 Ready; T=0.01/0.01 11:12:56 type killtcp tcpip t STOP CTCDEV1 Ready; T=0.01/0.01 11:13:07 netstat killtcp tcpip t DTCNET103E Invalid parameter: KILLTCP Ready(00001); T=0.03/0.04 11:13:19 netstat obey killtcp tcpip t VM TCP/IP Netstat Level 630 TCP/IP Server Name: TCPIP RDR FILE 0032 SENT FROM TCPIP PUN WAS 0013 RECS 0006 CPY 001 A NOHOLD NOKEEP DTCNET501E Configuration error. Details are in PROFILE TCPERROR. Ready(00004); T=0.06/0.07 11:13:27 type profile tcperror DTCPRS008E Error encountered parsing OBEY or generated command line: KILLTCP TC PIP DTCPAR123I Unknown command: 'KILLTCP' DTCPRS332E Remainder of command line skipped due to error DTCPRS096I Options enabled after parsing OBEY command line: Ready; T=0.01/0.01 11:13:42 |

I'm obviously doing something wrong, but I don't know what! This is all new to me, and I don't know what I'm doing! |:(|

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/SDL-Hercules-390/hyperion/issues/375#issuecomment-808780440, or unsubscribe https://github.com/notifications/unsubscribe-auth/AR2CVIJ3AT4JZICZ3AZFH5LTFYOMNANCNFSM4YV6VIBA.

Fish-Git commented 3 years ago

Geoff,

I just tried your new test (using netstat obey stop ctcdev1 command) just now, and my z/VM 6.3 system remained up and running just fine!   <shrug>

There must be something unusual about your particular z/VM system (a misconfiguration perhaps?) that is causing your IFI004, but I don't think it is Hercules. Whenever I try the exact same thing with my z/VM system (admittedly different from yours; mine is 6.3 whereas yours is 5.4), it works fine. There is no IFIF004 and my system remains up and running just fine.

I can stop and start my CTCI device at will (multiple times, whenever I want), and my system keeps chugging along just fine.

Of course, it could still be a bug in Hercules I suppose, but if it is, it's a bug specific to Linux tuntap handling, because on Windows. it works just fine. _(I'm not skilled enough to try the same thing on my KDE Neon 5.20.5 VMware virtual machine; I only use it for testing building Hercules, not for any serious *Nix Hercules debugging, as I don't have the Linux skills for that.)_

If you wish to pursue this problem/issue further, I'm afraid you are going to have to solicit the help of some other Hercules developer who is more experienced with Linux tuntap networking, not me. I don't have the necessary skills to debug this issue any further.

With that in mind, since I honestly do feel the problem lies with your z/VM guest system itself and not with Hercules, I am going to close this issue as "Unable to Reproduce".

If this does not satisfy you, please feel free to re-open this issue and ask a different Hercules developer to help you.

Sorry!   :(

jmaynard commented 3 years ago

I'm seeing this on VM/ESA 2.4 and OS/390 2.10, as well. In my case, I get the IFI004 whenever I stop TCP/IP on either VM or OS/390 running inside it. This is on my desktop Linux system. I use CTCI instead of LCS or OSA or whatever because using a tap interface on that box disrupts other things I do. (It screws up the Ethernet MAC address that programs see, and one in particular gets really confused when the MAC address changes.)

How would you suggest I approach it, not being a VM guy? I'm setting this system up to make sure I remember all the details of VTAM networking between the two systems, and also figuring out how to set up PVM and RSCS-JES2/NJE connections.

ivan-w commented 3 years ago

I'm still thinking this is due to a spurious interrupt (or lack thereof - although that would only cause the MIH procedure to be invoked) during HDV/HIO/HSCH - anyhow, any one of those functions that signals I/O termination. the new "CTCE" device type should be appropriate for PVM, RSCS/NJE, VTAM PU5 to PU5 (don't forget to enter the Enterprise key for CDRM to work) - also works for more modern z/VM Single System Image (the z/VM cluster thingy)

mcisho commented 3 years ago

These abends became regular occurrences for me following commit 5eb60c6 made by Fish on 3 Nov 2017, titled "Fix LCS, CTCI, CTCE and PTP Halt/Clear Subchannel handling". I have tried gathering evidence several times, but turning on traces anywhere naturally ensured the abend did not occur. I experience various symptoms, sometimes VM abends when I shutdown VM, sometimes VTAM abends when I shutdown z/OS, and sometimes VM hangs at the writing the checkpoint stage of shutdown.