SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

Multiple OSA IP support Ping to other-subnet VIPA expires in transit #204

Closed rgschmi closed 5 years ago

rgschmi commented 5 years ago

Environment is Windows 7, Hercules with multiple IP support and CTCI-WIN 3.7

My TCPIP config has an OSA with address 192.168.20.12 and a VIPA with address 10.0.0.2. I'm running OSPF, which shows my router as its neighbor and that it is advertising the VIPA address. However, when I ping the VIPA, the ping expires in transit:

EZZ7833I INTERFACE CONFIGURATION
IP ADDRESS      AREA             COST RTRNS TRDLY PRI HELLO  DEAD DB_E*
192.168.20.13   0.0.0.20            1     5     1   1    10    40    40
192.168.20.12   0.0.0.20            1     5     1   1    10    40    40
10.0.0.2        0.0.0.20            1   N/A   N/A N/A   N/A   N/A   N/A

ADVERTISED VIPA ROUTES
10.0.0.2       /255.255.255.255       <== advertising the VIPA
D TCPIP,TCPIP,OMPR,OSPF,NEIGHBOR
EZZ7851I NEIGHBOR SUMMARY 317
NEIGHBOR ADDR   NEIGHBOR ID     STATE  LSRXL DBSUM LSREQ HSUP IFC
192.168.20.100  10.0.0.100        128      0     0     0  OFF LNK3000   <== neighbor router OK
C:\Users\HP>ping 10.0.0.2

Pinging 10.0.0.2 with 32 bytes of data:
Reply from 192.168.20.1: TTL expired in transit.
Reply from 192.168.20.1: TTL expired in transit.
Reply from 192.168.20.1: TTL expired in transit.
Reply from 192.168.20.1: TTL expired in transit.

Ping statistics for 10.0.0.2:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

NOTE: This issue is closely related to Issue #203.

Fish-Git commented 5 years ago

My TCPIP config has an OSA with address 192.168.20.12 and a VIPA with address 10.0.0.2.

(Urk!) I didn't expect IP addresses that weren't in the same subnet to ever be assigned. My CTCI-WIN code isn't currently written to handle that situation. Providing such support will require a complete redesign. :(
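To make the subnet mismatch concrete, here is a minimal sketch using Python's standard `ipaddress` module. The addresses come from the report above; the /24 mask on the OSA is an assumption at this point in the thread, and `on_link` is a hypothetical helper name.

```python
import ipaddress

# OSA address from the report; the /24 prefix length is assumed here.
osa = ipaddress.ip_interface("192.168.20.12/24")
second_osa = ipaddress.ip_address("192.168.20.13")
vipa = ipaddress.ip_address("10.0.0.2")

def on_link(addr, interface):
    """Return True when addr lies inside the interface's directly attached network."""
    return addr in interface.network

print("second OSA on-link:", on_link(second_osa, osa))
print("VIPA on-link:", on_link(vipa, osa))
```

Any sender on 192.168.20.0/24 can reach an on-link address directly, whereas an off-link address such as the VIPA needs a route that names a next hop.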

Fish-Git commented 5 years ago

(I've added the "Ongoing" label since fixing this in CTCI-WIN is going to take me a while.)

Fish-Git commented 5 years ago

Reply from 192.168.20.1: TTL expired in transit.

Who is "192.168.20.1"? Is that your Windows host?

What does Windows ipconfig -all report?

What do your Hercules config file device statements look like for your OSA devices?

What does your z/OS TCPIP PROFILE look like?

Can we see your Hercules log file? Are there any unusual messages there?

Do pings to the other IP addresses work okay?

So many questions! So few answers!

Fish-Git commented 5 years ago

EZZ7833I INTERFACE CONFIGURATION

ADVERTISED VIPA ROUTES

What commands are used to display this information?

Fish-Git commented 5 years ago
D TCPIP,TCPIP,OMPR,OSPF,NEIGHBOR
EZZ0059I DISPLAY COMMAND FAILED: OMPROUTE NOT ACTIVE

How do I enable OMPROUTE?

Fish-Git commented 5 years ago

What does Windows ipconfig -all report?

Also, what does Windows route print report?

Fish-Git commented 5 years ago

After pinging 192.168.20.12 and 192.168.20.13 and then pinging 10.0.0.2, what does the Windows arp -a command report?

rgschmi commented 5 years ago

Reply from 192.168.20.1: TTL expired in transit.

Who is "192.168.20.1"? Is that your Windows host?

Yes, that is the address of my Windows host Ethernet adapter.

What does Windows ipconfig -all report?

C:\Users\HP>ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : HP-HP
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : Yes
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : nuinet.com

Ethernet adapter Local Area Connection 2:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Realtek PCIe GBE Family Controller
   Physical Address. . . . . . . . . : 68-1C-A2-12-B5-DC
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::ad30:62d8:1e6b:b588%19(Preferred)
   IPv4 Address. . . . . . . . . . . : 192.168.20.1(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.20.100
   DHCPv6 IAID . . . . . . . . . . . : 493362338
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-1F-EB-21-1E-2C-27-D7-2F-D1-D5
   DNS Servers . . . . . . . . . . . : 9.9.9.9
                                       8.8.8.8
   NetBIOS over Tcpip. . . . . . . . : Enabled

Tunnel adapter isatap.{25CBFE3D-0DA6-4E4A-BCBB-56ACC974794F}:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft ISATAP Adapter
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes

Tunnel adapter Local Area Connection* 9:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft 6to4 Adapter
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes

Tunnel adapter Local Area Connection* 12:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft Teredo Tunneling Adapter
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes

What do your Hercules config file device statements look like for your OSA devices?

INTERFACE        VLINK1                                                
     DEFINE      VIRTUAL                                               
     IPADDR      10.0.0.&WHO.                                          

INTERFACE        VLINK10                                               
     DEFINE      VIRTUAL                                               
     IPADDR      192.168.&ip..10                                       

INTERFACE        LNK3000                                               
   DEFINE        IPAQENET                                              
   PORTNAME      OSA3000                ; MUST MATCH TRLE PORT NAME    
   IPADDR        192.168.&IP..12/24     ; INTERFACE IP ADDRESS         
   SOURCEVIPAINT VLINK10                                               

INTERFACE        LNK3004                                               
   DEFINE        IPAQENET                                              
   PORTNAME      OSA3004                ; MUST MATCH TRLE PORT NAME    
   IPADDR        192.168.&IP..13/24     ; INTERFACE IP ADDRESS         
   SOURCEVIPAINT VLINK10                                               

What does your z/OS TCPIP PROFILE look like?

Can we see your Hercules log file? Are there any unusual messages there?

I'll have to reproduce the problem for the last two questions. Stand by...

Do pings to the other IP addresses work okay?

So many questions! So few answers!

rgschmi commented 5 years ago

EZZ7833I INTERFACE CONFIGURATION

ADVERTISED VIPA ROUTES

What commands are used to display this information?

For this and the next question, you have to have OMPROUTE running and configured for OSPF. I think I included that output just to show that OSPF routing was working properly and that I did not have a routing issue. I can give you the JCL and configuration for OMPROUTE if you wish but you will have to be connected to a router that is running OSPF for anything to work. The neighbor listed in the output of the command is in fact the OSPF router and shows routing is working as intended.

The reason I run OSPF is because of dynamic VIPAs wherein an IP address can move from a host in one subnet to a host in another, which requires dynamic routing to change the next hop to the destination. I think the problem can be reproduced more simply by using static routing in the pinging workstation with the OSA address being the next hop to the VIPA, if that makes sense.

rgschmi commented 5 years ago

What does Windows ipconfig -all report?

C:\Users\HP>ipconfig -all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : HP-HP
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : Yes
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : nuinet.com

Ethernet adapter Local Area Connection 2:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Realtek PCIe GBE Family Controller
   Physical Address. . . . . . . . . : 68-1C-A2-12-B5-DC
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::ad30:62d8:1e6b:b588%19(Preferred)
   IPv4 Address. . . . . . . . . . . : 192.168.20.1(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.20.100
   DHCPv6 IAID . . . . . . . . . . . : 493362338
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-1F-EB-21-1E-2C-27-D7-2F-D1-D5
   DNS Servers . . . . . . . . . . . : 9.9.9.9
                                       8.8.8.8
   NetBIOS over Tcpip. . . . . . . . : Enabled

Tunnel adapter isatap.{25CBFE3D-0DA6-4E4A-BCBB-56ACC974794F}:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft ISATAP Adapter
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes

Tunnel adapter Local Area Connection* 9:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft 6to4 Adapter
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes

Tunnel adapter Local Area Connection* 12:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft Teredo Tunneling Adapter
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes

Also, what does Windows route print report?

C:\Users\HP>ping 10.0.0.2

Pinging 10.0.0.2 with 32 bytes of data:
Reply from 192.168.20.1: TTL expired in transit.
Reply from 192.168.20.1: TTL expired in transit.
Reply from 192.168.20.1: TTL expired in transit.
Reply from 192.168.20.1: TTL expired in transit.

Ping statistics for 10.0.0.2:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss)

(Not sure why the large font. Appears small in edit mode)

C:\Users\HP>route print
Interface List
 19...68 1c a2 12 b5 dc ......Realtek PCIe GBE Family Controller
  1...........................Software Loopback Interface 1
 21...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter
 11...00 00 00 00 00 00 00 e0 Microsoft 6to4 Adapter
 12...00 00 00 00 00 00 00 e0 Microsoft Teredo Tunneling Adapter
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0   192.168.20.100     192.168.20.1    266
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    306
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    306
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    306
     192.168.20.0    255.255.255.0         On-link      192.168.20.1    266
     192.168.20.1  255.255.255.255         On-link      192.168.20.1    266
   192.168.20.255  255.255.255.255         On-link      192.168.20.1    266
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    306
        224.0.0.0        240.0.0.0         On-link      192.168.20.1    266
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    306
  255.255.255.255  255.255.255.255         On-link      192.168.20.1    266
===========================================================================
Persistent Routes:
  Network Address          Netmask  Gateway Address  Metric
          0.0.0.0          0.0.0.0   192.168.20.100  Default
===========================================================================

IPv6 Route Table
===========================================================================
Active Routes:
 If Metric Network Destination      Gateway
  1    306 ::1/128                  On-link
 19    266 fe80::/64                On-link
 19    266 fe80::ad30:62d8:1e6b:b588/128
                                    On-link
  1    306 ff00::/8                 On-link
 19    266 ff00::/8                 On-link
===========================================================================
Persistent Routes:
  None

rgschmi commented 5 years ago

A tracert from 192.168.20.1 is interesting:

C:\Users\HP>tracert 10.0.0.2

Tracing route to 10.0.0.2 over a maximum of 30 hops

  1    <1 ms    <1 ms    <1 ms  192.168.20.100
  2    <1 ms    <1 ms    <1 ms  HP-HP [192.168.20.1]
  3    <1 ms    <1 ms    <1 ms  192.168.20.100
  4    <1 ms    <1 ms    <1 ms  HP-HP [192.168.20.1]
  5     1 ms 

192.168.20.100 is the default gateway for Windows.

However when I ping the linux VIPA, 10.0.0.4, from linux, 192.168.40.1, the ping works and traceroute shows the first hop is 10.0.0.4. Linux is using LCS adapters instead of OSA. 192.168.40.100 is the default gateway for linux.
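The alternating hops in the tracert are what a two-node forwarding loop would produce. The following is only a sketch of that idea, assuming the default gateway and the Windows host keep handing the packet to each other; it does not explain why they do so. `expiry_reporter` is a hypothetical name.

```python
def expiry_reporter(initial_ttl, loop_hops):
    """Walk a packet around a forwarding loop and return the hop that sees TTL reach zero.

    Each forwarder decrements the TTL; the one that decrements it to zero
    drops the packet and reports "TTL expired in transit".
    """
    ttl = initial_ttl
    index = 0
    while True:
        forwarder = loop_hops[index % len(loop_hops)]
        ttl -= 1
        if ttl == 0:
            return forwarder
        index += 1

# Assumed loop: the default gateway first, then the Windows host, repeating.
loop = ["192.168.20.100", "192.168.20.1"]

# tracert sends probes with TTL 1, 2, 3, ... and lists whoever reports expiry.
for probe_ttl in range(1, 5):
    print(probe_ttl, expiry_reporter(probe_ttl, loop))
```

Under that assumption the probes with TTL 1 to 4 are answered by the gateway and the host in turn, which is the pattern in the tracert output above.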

Fish-Git commented 5 years ago

(OFF TOPIC)

(Not sure why the large font. Appears small in edit mode)

That's because GitHub uses what's known as "markdown" to format your comments:

The "=" equal signs in the text you pasted were interpreted as markdown. To prevent that, simply place 3 consecutive back-ticks before and after the text being pasted:

```
(pasted text goes here...)
(pasted text goes here...)
```

Or, simply indent (precede) each line being pasted with at least 4 spaces:

    (pasted text goes here...)
    (pasted text goes here...)

Either technique will prevent what's being pasted from being erroneously interpreted as markdown.

A single underline before/after a word makes it italic. Two asterisks make it bold. Using both makes it bold italic, etc.

Download the above "Mastering Markdown" PDF and keep it handy. Once you start using it, it becomes much easier, since 99% of the time you only use the basic simple stuff (italic, bold, and 3 backticks for pasted text). The other markdown stuff you rarely use.

Hope that helps!

Fish-Git commented 5 years ago
C:\Users\HP>ipconfig -all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : HP-HP
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : Yes

This explains a lot: you have IP Routing enabled on your Windows host!

This would explain why you're getting multiple responses to your pings as well as the unusual tracert response you're getting.

Unless you're certain you need it (which I doubt you do), I would strongly suggest you disable it.

Things might work much better once you do.

Fish-Git commented 5 years ago

Okay, I've managed to get pinging to 10.0.0.2 to work for me by simply adding a route in my network router's routing table telling it to forward all packets destined to 10.0.0.2 to 192.168.0.4 (z/OS) instead.

For you, I guess that would be forward 10.0.0.2 to 192.168.20.12 (or 192.168.20.13, whichever you prefer).

Hopefully that should resolve this issue in totality.
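Why a single added route changes the outcome can be sketched with a longest-prefix-match lookup. This is an illustrative model only: the table below is a simplified stand-in with a default route and one on-link network, not the actual table of any router in this thread, and `next_hop` is a hypothetical helper.

```python
import ipaddress

def next_hop(routes, destination):
    """Pick the gateway of the most specific route that contains the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in routes if dest in net]
    net, gw = max(matches, key=lambda match: match[0].prefixlen)
    return gw

# Simplified table: default route plus the directly attached subnet.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.20.100"),
    (ipaddress.ip_network("192.168.20.0/24"), "on-link"),
]
before = next_hop(routes, "10.0.0.2")

# Host route for the VIPA with the OSA as next hop, as suggested above.
routes.append((ipaddress.ip_network("10.0.0.2/32"), "192.168.20.12"))
after = next_hop(routes, "10.0.0.2")

print("before:", before)
print("after:", after)
```

Without the host route only the default route matches 10.0.0.2; with it, the /32 entry is the most specific match and the OSA address becomes the next hop.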

rgschmi commented 5 years ago

I retried the cross-subnet ping to 10.0.0.2 after ensuring I had the latest version of Hercules, and still no joy. Hope to have time to trace with Wireshark and internal TCPIP packet trace tonight. Following is the output of a netstat route command showing the routing for 10.0.0.2. It looks the same as the routing for 10.0.0.1 and 10.0.0.4, which are both linux LCS interfaces which respond to ping.

rgschmi commented 5 years ago

I ran a Wireshark trace of the ping failure. The ping source was the Hercules host machine, 192.168.20.1, trying to ping 10.0.0.2. I also ran a TCPIP packet trace. I'm attaching the Wireshark trace here, and can attach the TCPIP trace if requested, but while the TCPIP trace showed lots of OSPF packets sent and received, NO ICMP packets were being received by TCPIP. A sample packet trace from 192.168.20.1:

   5 NUI2     PACKET   00000004 00:16:18.676188 Packet Trace                        
From Interface    : LNK3000          Device: QDIO Ethernet    Full=61               
 Tod Clock        : 2019/05/08 00:16:18.676175                Intfx: 13             
 Segment #        : 0                Flags: Adj In                                  
 Source           : 192.168.20.1                                                    
 Destination      : 224.0.0.252                                                     
 Source Port      : 50892            Dest Port: 5355  Asid: 0025 TCB: 00000000      
 QID              : 1                                                               
IpHeader: Version : 4                Header Length: 20                              
 Tos              : 00               QOS: Routine Normal Service                    
 Packet Length    : 61               ID Number: 1C24                                
 Fragment         :                  Offset: 0                                      
 TTL              : 1                Protocol: UDP            CheckSum: E7E6 FFFF   
 Source           : 192.168.20.1                                                    
 Destination      : 224.0.0.252             

My router OSPF entries:

Selected  Destination   Next Hop       Interface  Route Type  In FIB
Yes       10.0.0.0/24   192.168.10.12  eth1       ospf        Yes
Yes       10.0.0.1/32   192.168.10.12  eth1       ospf        Yes
Yes       10.0.0.2/32   192.168.20.12  eth2       ospf        Yes
Yes       10.0.0.23/32  192.168.10.12  eth1       ospf        Yes
Yes       10.0.1.1/32   192.168.10.12  eth1       ospf        Yes

I also did a ping in Linux from 192.168.10.1 to 10.0.0.1 and it made more sense than the Windows ping that fails. I saw a redirect from the default router with the Hercules LCS adapter as the new next hop. That's how it should work. I don't know why I don't see that in Windows. More research needed here.

PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
From 192.168.10.100: icmp_seq=1 Redirect Host(New nexthop: 192.168.10.12)
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=1.31 ms
From 192.168.10.100: icmp_seq=2 Redirect Host(New nexthop: 192.168.10.12)
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.949 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.623 ms

I think that explains why a static route works and a dynamic route doesn't. The static route already knows the next hop is the Hercules adapter, the dynamic route needs a redirect.
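The usual conditions under which a router sends an ICMP redirect can be written down as a small predicate. This is a sketch based on the general rule in RFC 1812 (the packet leaves on the interface it arrived on, and the sender and the next hop share that link), not on the configuration of the router in this thread. The mapping of eth1 to 192.168.10.0/24 and eth2 to 192.168.20.0/24 is inferred from the route table above and is an assumption.

```python
import ipaddress

def redirect_expected(source, next_hop, ingress_if, egress_if, link_network):
    """Return True when a router would normally send an ICMP redirect to the source."""
    link = ipaddress.ip_network(link_network)
    return (
        ingress_if == egress_if
        and ipaddress.ip_address(source) in link
        and ipaddress.ip_address(next_hop) in link
    )

# Linux case from the ping output above: redirect observed.
linux_case = redirect_expected(
    "192.168.10.1", "192.168.10.12", "eth1", "eth1", "192.168.10.0/24")

# Windows case: same shape, so by this rule a redirect would be expected too.
windows_case = redirect_expected(
    "192.168.20.1", "192.168.20.12", "eth2", "eth2", "192.168.20.0/24")

print("linux:", linux_case, "windows:", windows_case)
```

Both cases satisfy the rule, so the rule by itself does not account for the missing redirect on the Windows side.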

Fish-Git commented 5 years ago

More research needed here.

Definitely.

As I said, it works for me. Why it doesn't work for you is unknown. There's something about your network's complex setup that's causing it to fail, but for a simple setup like mine it seems to work fine.

I'll continue to keep this issue open to give you a little time to try and figure out what's going on. If nothing definitive can be determined after a period of time I'll go ahead and close it then. In the meantime, however, I wish you luck. Please keep us informed as to your progress (i.e. continue to post your test results, etc).

rgschmi commented 5 years ago

I've simplified the environment by shutting down OSPF and adding a static route in my router to my cross-subnet VIPA, 10.0.0.2, with a next hop of 192.168.20.12 (my OSA)

I've got two different failure scenarios.

First, if I ping from the Hercules workstation (source IP 192.168.20.1) the destination mac for the ping is my router. I understand that, but what I don't understand is why I don't see a redirect from the router like I do in linux. So, I know WHAT happened, but I don't know WHY. None of this has to do with Hercules as far as I can see.

The second failure is when I ping from 192.168.30.1. In that case the destination mac for the ping is the OSA, so my static route is working, but the ping still fails. I can ping a same-subnet VIPA (192.168.20.10) and that works. I'll try a TCPIP packet trace of this scenario to see if the ping makes it to the TCPIP stack.

mcisho commented 5 years ago

Can you please provide a diagram of your network? You keep mentioning IP addresses that have never been mentioned before, with no explanation of what or where they are. I'm completely confused!

rgschmi commented 5 years ago

Attached is a diagram of my network. Not shown is a Raspberry Pi running Hercules at address 192.168.1.95 that serves my shared disk drive: 8007 3390 192.168.1.95::8007 cu=3990-3

All VLINKs and XCF have a mask of /32. All others have a mask of /24

NUINET diagram.pdf

rgschmi commented 5 years ago

I did some more testing trying to ping 10.0.0.2 from 192.168.30.1. I ran Wireshark and TCPIP packet traces during the test. As mentioned above, I shut down OSPF and configured a static route to 10.0.0.2. I added a same-subnet VLINK, 192.168.20.10 to the Mike2 configuration for testing. I pinged VLINK 192.168.20.10 successfully, but pinging 10.0.0.2 times out.

The TCPIP packet trace shows the ping of 192.168.20.10, but the trace shows no packets arriving from the ping of 10.0.0.2. Wireshark shows the next hop MAC address of the ping of 10.0.0.2 as being the Hercules OSA address, as it should.

Is there any QETH or CTCI-WIN tracing that I can do that would help resolve this issue?

NUI RGS IPCS DUMP IPCS JOB00037 09 MAY 2019 16.09.32.pdf

mcisho commented 5 years ago

I shut down OSPF

Where? Globally? Or on specific host(s) and/or guest(s)?

and configured a static route to 10.0.0.2.

Where? On specific host(s) and/or guest(s)? Please be specific and detailed, routing is all in the detail, airy-fairy descriptions aren't helpful.

Presumably all of your host and guest systems are using OSPF for their routing? If they aren't can we please see the routing tables for the hosts and the guests.

Are you attempting to emulate a real-world mainframe environment?

rgschmi commented 5 years ago

OSPF was shut down in the MIKE2 Hercules guest only and the static route was added to the NUIrouter only for MIKE2. SUSE1, MIKE3, SUSE4 and NUIrouter are still running OSPF. None of my hosts (Windows and linux) are running OSPF. They only have default routes to NUIrouter. As I mentioned a Wireshark trace at MIKE2 shows that the ping requests from MIKE3 for both the same-subnet and cross-subnet packets have my OSA mac address as the destination, but only the same-subnet packets show up in an internal TCPIP packet trace. This was true for both the static route and OSPF environments. I only shut down OSPF to verify it wasn't the cause of the problem. SUSE1 and SUSE4 respond to pings of 10.0.0.1 and 10.0.0.4 from MIKE3 respectively. SUSE1 and SUSE4 only have default routes to NUIrouter, just like Windows does. SUSE1 and SUSE4 are configured with LCS interfaces, MIKE2 and MIKE3 are configured with OSA interfaces.

I am trying to replicate a 'generic' client configuration. Most of my clients have either a 'flat' single subnet host environment (two OSAs and a VIPA), or a configuration similar to the one in my diagram. My goal is to be able to replicate customer environments and 'pre-test' configuration changes and new functions, like Policy Agent or LUNS.

mcisho commented 5 years ago

Enter a Hercules 'qeth debug on all' command on MIKE2, then try pinging 10.0.0.2 from MIKE2 itself and one of the other machines. If you see the ping packets on the Hercules panel we'll know the problem lies in Hercules qeth code; if you don't, we'll know the problem lies in Windows and/or CTCI-WIN.

rgschmi commented 5 years ago

We know the ping fails from MIKE2 because we are sending the ping to the default route (NUIrouter) and are not getting a redirect from the router to make the next hop the OSA. I don't believe Hercules is involved in that scenario.

I did ping 10.0.0.2 from MIKE3 with qeth debug on all and did not see the ping request. I did see the ping request and response when pinging 192.168.20.10.

I am having issues with Windows not writing to the log using >logfile. I'm not sure what is happening as other Windows commands write to the log and Hercules under linux works as well. The bottom line is that I don't have proof to send that shows the debug output. I have attached the Wireshark output that shows both pings have the destination mac address of my OSA.

What is puzzling is that I am running the same CTCI-win version as Fish, and he can ping 10.0.0.2. I believe he added a VIPA with that address to his config to try to recreate my problem.

I'm not sure where in the data path Wireshark captures packets. Is it possible something in Windows is blocking the 10.0.0.2 packet before it gets to CTCI-win?

pingtrace.txt

Fish-Git commented 5 years ago

I did ping 10.0.0.2 from MIKE3 with qeth debug on all

He said to do it on MIKE2, not MIKE3. Doing it on MIKE3 doesn't help as much as doing it on MIKE2 would.

I am having issues with Windows not writing to the log using >logfile.

On MIKE2, your Win7 box? Or on MIKE3, your Win10 box?

Regardless, you need to resolve this issue first before proceeding. It's clearly indicative of a fairly serious problem somewhere. A problem that might be causing your current ping issue. We won't of course know until we discover what the actual problem is and fix it.

Resolving this issue should be your top priority right now. Don't waste any more time with your ping problem until you resolve this Windows logfile redirection problem first.

Out of curiosity, when was the last time you rebooted any of your systems? (especially your Windows systems?)

Another question: are either of your Windows systems running under VMware by any chance? (e.g. ESX or Workstation?) There are special VMware considerations to take into account if you are. They are documented in the CTCI-WIN Help file.

rgschmi commented 5 years ago

I figured out my logging problem. I had marked Hercules to run in Administrator mode on both Windows workstations, but started it in a non-administrator window. I opened an Administrator window and ran Hercules, and logging now works! Just in case that affected ping, I will ping one more time. Should know the results shortly.

Same result, no ping response

I rebooted two days ago when I added a static route to my router. Windows and the router were both rebooted to make a clean start.

I am not running any VMware.

I am attaching the Hercules log where I just tried to ping 10.0.0.2 and then 192.168.20.10. I have to ping from MIKE3 (or one of the SUSIEs), because pinging from MIKE2 goes to the default router and dies. I haven't yet figured out why redirect doesn't work.

I just added a static route to MIKE2 to route all 10.0.0.0/24 requests to the OSA IP address. Wireshark now shows the OSA as the ping destination address and still no ping response

herclog.txt

One difference I noticed is that in the 'real world' all IP addresses in the stack are downloaded to the OSA's OAT. If the OSA is not defined as a 'router' interface, packets not in the OAT are discarded. 10.0.0.2 is not registered according to the Hercules log. Does qeth have the same logic, and is it discarding 10.0.0.2?
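The filtering rule asked about here can be stated as a tiny model. It only restates the hypothesis in the comment above about how a real OSA treats its address table; it is not taken from Hercules' qeth code and says nothing about what Hercules actually does. `osa_delivers` and `router_mode` are hypothetical names, and the registered set is the pair of addresses reported as registered in the Hercules log.

```python
def osa_delivers(dest_ip, registered_ips, router_mode=False):
    """Model of the described rule: deliver if the address is registered,
    or if the interface is defined as a router; otherwise discard."""
    return router_mode or dest_ip in registered_ips

# Addresses the Hercules log reports as registered for this device.
registered = {"192.168.20.12", "192.168.20.10"}

for dest in ("192.168.20.10", "10.0.0.2"):
    verdict = "delivered" if osa_delivers(dest, registered) else "discarded"
    print(dest, verdict)
```

If such a rule were in effect, it would match the symptom that pings to 192.168.20.10 arrive while pings to 10.0.0.2 do not.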

mcisho commented 5 years ago

I just added a static route to MIKE2 to route all 10.0.0.0/24 requests to the OSA IP address. Wireshark now shows the OSA as the ping destination address and still no ping response

Where was this Wireshark run? On MIKE2 or one of the other machines? If it was run on MIKE2 does it show the ping packets for 10.0.0.2 arriving from the sending machine? We now know the packets aren't passed to MIKE2's Hercules, so are they even arriving at MIKE2, or are they being lost within MIKE2?

Please don't bother telling us what happens on the source machines, only tell us what happens on the destination machine, e.g, MIKE2, telling us what happens on the source machine is confusing and clouds the issue at the moment.

One difference I noticed is that in the 'real world' ...

Please stop thinking of Hercules qeth support in terms of what the 'real world' can do. Hercules' qeth support passes packets/frames between the network and the guest, and that's it. It does not support, except by accident, anything else that the 'real world' can do.

Fish-Git commented 5 years ago

... no ping response

You know, at this point I would really like to see your TCPIP PROFILE as I suspect it's wrong. In fact, I rather suspect 10.0.0.2 is nowhere to be found in your TCPIP PROFILE. After all, if it were, we should see a corresponding "HHC03805I 0:3001 QETH: tun0: Register guest IP address 10.0.0.2" message, but according to the Hercules log you attached, we're not seeing that:

21:29:55 HHC00901I 0:3001 OSA: Interface tun0, type TUN opened
21:29:55 HHC03997I 0:3001 OSA: tun0: using MAC address 02:00:5e:a3:be:84
21:29:55 HHC03997I 0:3001 OSA: tun0: using IP address 192.168.20.12
21:29:55 HHC03997I 0:3001 OSA: tun0: using MTU 1500
21:29:55 HHC03997I 0:3001 OSA: tun0: using drive MAC address 96:7a:59:e5:d2:bf
21:29:55 HHC03997I 0:3001 OSA: tun0: using drive IP address fe80::967a:59ff:fee5:d2bf
21:29:55 HHC03805I 0:3001 OSA: tun0: Register guest IP address 192.168.20.12
21:30:02 HHC03805I 0:3001 OSA: tun0: Register guest IP address 192.168.20.10

I rather suspect we've been pissing in the wind this whole time, with you suspecting the reason you are not receiving a response to your ping is due to a bug in CTCI-WIN and/or Hercules, whereas in reality the "bug" is actually in _you! (i.e. PEBKAC)_   ;-)

Please show us your TCPIP PROFILE. I suspect you're missing your 10.0.0.2 VIPA definition altogether.

And for completeness, I'd also like to see your Hercules configuration file too. According to your Hercules logfile it needs adjusting. (*)


(*) Hint: you're using SDL Hyperion, not some other Hyperion. Some of the control file statements have changed and are now different from other Hyperions' statements.
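The registration check described in this comment can be automated against a saved log. A minimal sketch, assuming the log keeps the message layout shown in the excerpt above; `registered_addresses` is a hypothetical helper, and the sample text is copied from that excerpt.

```python
import re

# Sample lines copied from the log excerpt quoted above.
LOG = """\
21:29:55 HHC03997I 0:3001 OSA: tun0: using IP address 192.168.20.12
21:29:55 HHC03805I 0:3001 OSA: tun0: Register guest IP address 192.168.20.12
21:30:02 HHC03805I 0:3001 OSA: tun0: Register guest IP address 192.168.20.10
"""

# Match only the HHC03805I registration messages and capture the address.
REGISTER = re.compile(r"HHC03805I .*Register guest IP address (\S+)")

def registered_addresses(log_text):
    """Return every guest IP address the log reports as registered, in order."""
    return [match.group(1) for match in REGISTER.finditer(log_text)]

addresses = registered_addresses(LOG)
print(addresses)
print("10.0.0.2 registered:", "10.0.0.2" in addresses)
```

Running the same function over the full herclog.txt would show whether a registration message for the VIPA ever appears.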

rgschmi commented 5 years ago

If only it were that easy. Attached is my profile. Also the output of D TCPIP,,N,HOME from MIKE2, and a ping of 10.0.0.2 from z/OS running on MIKE2:

CS V2R1: Pinging host 10.0.0.2      
Ping #1 response took 0.003 seconds.
 ===> D TCPIP,TCPIP,N,HOME             
  EZD0101I NETSTAT CS V2R1 TCPIP 994   
  HOME ADDRESS LIST:                   
  LINKNAME:   EZASAMEMVS               
    ADDRESS:  172.16.0.20              
      FLAGS:                           
 LINKNAME:   LOOPBACK             
   ADDRESS:  127.0.0.1            
     FLAGS:                       
INTFNAME:   VLINK1             <<<<
   ADDRESS:  10.0.0.2          <<<<
     FLAGS:  PRIMARY           <<<<
 INTFNAME:   VLINK10              
   ADDRESS:  192.168.20.10        
     FLAGS:                       
 INTFNAME:   LNK3000              
   ADDRESS:  192.168.20.12        
     FLAGS:                       
 INTFNAME:   LNK3004              
   ADDRESS:  192.168.20.13        
     FLAGS:                       
 INTFNAME:   LOOPBACK6            
   ADDRESS:  ::1                  
     TYPE:   LOOPBACK             
     FLAGS:                       
 7 OF 7 RECORDS DISPLAYED         
 END OF THE REPORT

rgschmi commented 5 years ago

I just added a static route to MIKE2 to route all 10.0.0.0/24 requests to the OSA IP address. Wireshark now shows the OSA as the ping destination address and still no ping response

Where was this Wireshark run? On MIKE2 or one of the other machines? If it was run on MIKE2 does it show the ping packets for 10.0.0.2 arriving from the sending machine? We now know the packets aren't passed to MIKE2's Hercules, so are they even arriving at MIKE2, or are they being lost within MIKE2?

The pinging machine was MIKE2, trying to ping the VIPA in z/OS running on MIKE2, so the Wireshark trace was on MIKE2.

Please don't bother telling us what happens on the source machines. Only tell us what happens on the destination machine, e.g. MIKE2. Telling us what happens on the source machine is confusing and clouds the issue at the moment.

One difference I noticed is that in the 'real world' ...

Please stop thinking of Hercules qeth support in terms of what the 'real world' can do. Hercules' qeth support passes packets/frames between the network and the guest, and that's it. It does not support, except by accident, anything else that the 'real world' can do.

As an end user/tester, I don't know what you have or are trying to implement. I'm not a 'C' programmer, but in perusing qeth.c, I see routines for VMAC and VLANID, so I don't know if I should test them or not (I did). BTW, coding VLANID seems harmless, but coding VMAC causes an error and the OSA will not activate. I just wanted to make sure there wasn't an untested/unsupported routine that was discarding the 10.0.0.2 packets because they weren't in the OAT. Maybe I should code PRIROUTER on my OSA definition? I'll give it a try...

No joy on PRIROUTER

rgschmi commented 5 years ago

I will check all my config statements. You are correct that I switched over to SDL Hyperion without doing a thorough check of the statements.

rgschmi commented 5 years ago

Here are my Hercules configuration files, prior to updating for SDL.

Fish-Git commented 5 years ago

Attached is my profile.

Which contains 10.0.0.2 nowhere within it!

Fish-Git commented 5 years ago

Which contains 10.0.0.2 nowhere within it!

Okay, I'm going to presume that IPADDR 10.0.0.&WHO. resolves to 10.0.0.2, yes? If so, my bad. Sorry.

However, I still see a potential problem: you're defining two different VIPAs (i.e. DEFINE VIRTUAL), one called VLINK1 and the other called VLINK10, each with a different IPADDR assigned to it:

INTERFACE        VLINK1
     DEFINE      VIRTUAL
     IPADDR      10.0.0.&WHO.

INTERFACE        VLINK10
     DEFINE      VIRTUAL
     IPADDR      192.168.&ip..10

Note that it is the VLINK1 VIPA that has IPADDR 10.0.0.2 assigned to it.

Now take a look at how you have your OSA interfaces defined:

INTERFACE        LNK3000
   DEFINE        IPAQENET
   PORTNAME      OSA3000                ; MUST MATCH TRLE PORT NAME
   IPADDR        192.168.&IP..12/24     ; INTERFACE IP ADDRESS
   SOURCEVIPAINT VLINK10

INTERFACE        LNK3004
   DEFINE        IPAQENET
   PORTNAME      OSA3004                ; MUST MATCH TRLE PORT NAME
   IPADDR        192.168.&IP..13/24     ; INTERFACE IP ADDRESS
   SOURCEVIPAINT VLINK10

Now I don't know a lot about OSAs, but I'm guessing it is the SOURCEVIPAINT statement that determines which defined VIPA should be "bound"(?) to that particular OSA. Presuming that's correct, it appears the VLINK1 VIPA -- which is the one with IPADDR 10.0.0.2 assigned to it -- is not being used!

Both of your OSAs appear to be using VLINK10 as their VIPA.

Neither of them specify VLINK1 on their SOURCEVIPAINT statement, which is the VIPA with 10.0.0.2 assigned to it.

I'm guessing that means VIPA link VLINK1 is thus an "internal" VIPA link. It's a virtual interface that's defined, yes, but none of the defined OSA devices (interfaces) are using it. That might explain why we're not seeing any Hercules "HHC03805I ... Register guest IP address" for 10.0.0.2 anywhere in your Hercules log and why you can ping it just fine from z/OS but not from anywhere else.

(p.s. It also looks like you're only STARTing your LNK3000 OSA but not your LNK3004 OSA too. I don't know if that's significant or what you actually intended or not. I just thought I'd make note of it.)

rgschmi commented 5 years ago

Now it looks like this (see below). Same results as before: I can ping the OSA and the VIPA in the OSA subnet, but not 10.0.0.2. You are correct: &WHO. is 2 for MIKE2, and &IP. is 20.

What SOURCEVIPA does is replace the interface IP address with the VIPA address as the source IP address on outbound packets. That's important for some applications that would get confused if we are sending packets to it from two different interfaces. (MULTIPATH PERPACKET on the IPCONFIG statement). I don't really want to replace a 192. source IP address with a 10. source IP address.

10.0.0.2 is still not registered according to the Hercules panel. I'm not starting LNK3004 to keep it simple. The configuration I had before should work. I have many customers that have reachable VIPAs that are not the target of SOURCEVIPAINT. I'll admit they are all using OSPF, which I turned off in the interest of trying to duplicate your environment, which works. I can easily turn it back on if you wish.

;                                                                     
INTERFACE        VLINK1                                               
     DEFINE      VIRTUAL                                              
     IPADDR      10.0.0.&WHO.                                         

INTERFACE        VLINK10                                              
     DEFINE      VIRTUAL                                              
     IPADDR      192.168.&ip..10                                      

INTERFACE        LNK3000                                              
   DEFINE        IPAQENET                                             
   PORTNAME      OSA3000                ; MUST MATCH TRLE PORT NAME   
   IPADDR        192.168.&IP..12/24     ; INTERFACE IP ADDRESS        
   PRIROUTER                                                          
   SOURCEVIPAINT VLINK1                                               

INTERFACE        LNK3004                                              
   DEFINE        IPAQENET                                             
   PORTNAME      OSA3004                ; MUST MATCH TRLE PORT NAME   
   IPADDR        192.168.&IP..13/24     ; INTERFACE IP ADDRESS        
   SOURCEVIPAINT VLINK10      
 TunTap64.dll version 3.7.0.5194 initiated
 0:3001 OSA: Interface tun0, type TUN opened
 0:3001 OSA: tun0: using MAC address 02:00:5e:a3:be:84
 0:3001 OSA: tun0: using IP address 192.168.20.12
 0:3001 OSA: tun0: using MTU 1500
 0:3001 OSA: tun0: using drive MAC address 96:7a:59:e5:d2:bf
 0:3001 OSA: tun0: using drive IP address fe80::967a:59ff:fee5:d2bf
 0:3001 OSA: tun0: Register guest IP address 192.168.20.12
 0:3001 OSA: tun0: Register guest IP address 192.168.20.10                                 
rgschmi commented 5 years ago

Not to belabor a point (maybe I am), I just defined three dynamic VIPAs (192.168.20.14-192.168.20.16), obviously not targets of SOURCEVIPAINT statements, and they all respond to ping, though I don't see registration messages for them in the Hercules log.

^C
C:\Users\HP>ping 192.168.20.14

Pinging 192.168.20.14 with 32 bytes of data:
Reply from 192.168.20.14: bytes=32 time=1ms TTL=64
Reply from 192.168.20.14: bytes=32 time=1ms TTL=64

Ping statistics for 192.168.20.14:
    Packets: Sent = 2, Received = 2, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 1ms, Average = 1ms
Control-C
^C
C:\Users\HP>ping 192.168.20.15

Pinging 192.168.20.15 with 32 bytes of data:
Reply from 192.168.20.15: bytes=32 time=1ms TTL=64
Reply from 192.168.20.15: bytes=32 time<1ms TTL=64

Ping statistics for 192.168.20.15:
    Packets: Sent = 2, Received = 2, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 1ms, Average = 0ms
Control-C
^C
C:\Users\HP>ping 192.168.20.16

Pinging 192.168.20.16 with 32 bytes of data:
Reply from 192.168.20.16: bytes=32 time=1ms TTL=64
Reply from 192.168.20.16: bytes=32 time=1ms TTL=64

Ping statistics for 192.168.20.16:
    Packets: Sent = 2, Received = 2, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 1ms, Average = 1ms
Control-C
^C

Also SUSE1 and SUSE4 (LCS interfaces) have a dynamic VIPA, 10.0.0.23 that moves between them with VIPA takeover and give back. That VIPA is also not the target of a SOURCEVIPA.

mcisho commented 5 years ago

Can we please forget this total irrelevance that is TCP/IP profiles and other machines, and stick to simple matters.

You provided a qeth trace that showed successful pings from 192.168.30.1 to 192.168.20.10, which I understand are MIKE3 Windows and a MIKE2 guest VIPA. The qeth trace did not show any pings for 10.0.0.2, the MIKE2 guest VIPA we're trying to fix.

The pinging machine was MIKE2, trying to ping the VIPA in z/OS running on MIKE2, so the Wireshark trace was on MIKE2.

Can we see this MIKE2 Wireshark trace, please? If the ping for 10.0.0.2 was issued on MIKE2 then the problem lies entirely within the MIKE2 machine. Either:-

  1. Windows has no idea what to do with packets destined for 10.0.0.2, so discards them.
  2. WINPCAP has no idea what to do with packets destined for 10.0.0.2, so discards them.
  3. CTCI-WIN has no idea what to do with packets destined for 10.0.0.2, so discards them.

Whichever, this is NOT a Hercules or guest problem, the packets destined for 10.0.0.2 are not seen by Hercules or guest.

mcisho commented 5 years ago

As an end user/tester, I don't know what you have or are trying to implement. ...

If you think of any feature supported by a 'real world' OSA, then Hercules' qeth does NOT support it. The only feature supported by Hercules' qeth is passing IP packets or Ethernet frames between the Hercules guest and the Hercules host's network. There may well be code that mentions VLAN et al, but that code only exists to convince the guest that it's dealing with a real enough thing that it can send or receive data.

Fish is currently attempting to get VIPAs to work, quite why is beyond me bearing in mind the current implementation of qeth, but that's just my humble opinion.

rgschmi commented 5 years ago

Can we please forget this total irrelevance that is TCP/IP profiles and other machines, and stick to simple matters.

You provided a qeth trace that showed successful pings from 192.168.30.1 to 192.168.20.10, which I understand are MIKE3 Windows and a MIKE2 guest VIPA. The qeth trace did not show any pings for 10.0.0.2, the MIKE2 guest VIPA we're trying to fix.

The pinging machine was MIKE2, trying to ping the VIPA in z/OS running on MIKE2, so the Wireshark trace was on MIKE2.

Can we see this MIKE2 Wireshark trace, please? If the ping for 10.0.0.2 was issued on MIKE2 then the problem lies entirely within the MIKE2 machine. Either:-

1. Windows has no idea what to do with packets destined for 10.0.0.2, so discards them.

2. WINPCAP has no idea what to do with packets destined for 10.0.0.2, so discards them.

3. CTCI-WIN has no idea what to do with packets destined for 10.0.0.2, so discards them.

Whichever, this is NOT a Hercules or guest problem, the packets destined for 10.0.0.2 are not seen by Hercules or guest.

I have to agree. If this is working for Fish and not me, I'm wondering if CTCI-WIN is even seeing the packets. This is not a huge problem for me as long as it is working on Linux. I don't want to disrupt more important Hercules development.

I've attached the ping from Mike2 to the VIPA and here is my route table. Sorry about the formatting.

C:\Users\HP>route print

Interface List
 19...68 1c a2 12 b5 dc ......Realtek PCIe GBE Family Controller
  1...........................Software Loopback Interface 1
 21...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter
 11...00 00 00 00 00 00 00 e0 Microsoft 6to4 Adapter
 12...00 00 00 00 00 00 00 e0 Microsoft Teredo Tunneling Adapter

IPv4 Route Table

Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0   192.168.20.100    192.168.20.1     266
         10.0.0.0    255.255.255.0    192.168.20.12    192.168.20.1      11
        127.0.0.0        255.0.0.0          On-link       127.0.0.1     306
        127.0.0.1  255.255.255.255          On-link       127.0.0.1     306
  127.255.255.255  255.255.255.255          On-link       127.0.0.1     306
     192.168.20.0    255.255.255.0          On-link    192.168.20.1     266
     192.168.20.1  255.255.255.255          On-link    192.168.20.1     266
   192.168.20.255  255.255.255.255          On-link    192.168.20.1     266
        224.0.0.0        240.0.0.0          On-link       127.0.0.1     306
        224.0.0.0        240.0.0.0          On-link    192.168.20.1     266
  255.255.255.255  255.255.255.255          On-link       127.0.0.1     306
  255.255.255.255  255.255.255.255          On-link    192.168.20.1     266

Persistent Routes:
  Network Address          Netmask  Gateway Address  Metric
          0.0.0.0          0.0.0.0   192.168.20.100  Default

mike2ping.txt

Fish-Git commented 5 years ago

... I'm wondering if CTCI-WIN is even seeing the packets.

Good question. I'll have to think about how to go about proving whether that's true or not. (I'm not sure if my tracing is that detailed. If it's not though, I can always enhance it so it is.)

But let's not get sidetracked....

Let's move on to more important things...

  According to your attached Wireshark of your ping attempt:  

No.     Time         Source                Destination           Protocol Length Info
      2 07:40:49.526 192.168.20.1          10.0.0.2              ICMP     74     Echo (ping) request  id=0x0001, seq=192/49152, ttl=128 (no response found!)

Frame 2: 74 bytes on wire (592 bits), 74 bytes captured (592 bits)
Ethernet II, Src: Rosewill_12:b5:dc (68:1c:a2:12:b5:dc), Dst: 02:00:5e:a8:14:0c (02:00:5e:a8:14:0c)
    Destination: 02:00:5e:a8:14:0c (02:00:5e:a8:14:0c)
    Source: Rosewill_12:b5:dc (68:1c:a2:12:b5:dc)
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.20.1, Dst: 10.0.0.2
...etc...

  Notice the ping's destination MAC: 02:00:5e:a8:14:0c.

Now take a look at the Hercules logfile you attached earlier:

21:29:55 HHC04100I TunTap64.dll version 3.7.0.5194 initiated
21:29:55 HHC00901I 0:3001 OSA: Interface tun0, type TUN opened
21:29:55 HHC03997I 0:3001 OSA: tun0: using MAC address 02:00:5e:a3:be:84
21:29:55 HHC03997I 0:3001 OSA: tun0: using IP address 192.168.20.12
21:29:55 HHC03997I 0:3001 OSA: tun0: using MTU 1500
21:29:55 HHC03997I 0:3001 OSA: tun0: using drive MAC address 96:7a:59:e5:d2:bf
21:29:55 HHC03997I 0:3001 OSA: tun0: using drive IP address fe80::967a:59ff:fee5:d2bf
21:29:55 HHC03805I 0:3001 OSA: tun0: Register guest IP address 192.168.20.12
21:30:02 HHC03805I 0:3001 OSA: tun0: Register guest IP address 192.168.20.10

  Notice the MAC that Hercules (z/OS) is using for tun0: 02:00:5e:a3:be:84.

They're not the same:

        02:00:5e:a8:14:0c     (MAC the ping is being sent to = wrong?)
        02:00:5e:a3:be:84     (destination MAC the ping should? be using instead?)

I'm not sure what's going on but enabling tt32 (CTCI-WIN) tracing might help.

Do this:

Immediately after powering on Hercules on MIKE2 (but before IPLing your z/OS guest), enter the command:

tt32 debug

This should enable CTCI-WIN tracing on Hercules. I want to verify whether or not CTCI-WIN is using the same MAC address as what Hercules & z/OS are using, because based on the above evidence it looks like they're not, and if that's true, then that is the root of the problem.

After entering the tt32 debug command, then go ahead and IPL your z/OS guest and once it's up, go ahead and try your ping 10.0.0.2 from MIKE2 Windows, the same system where Herc/zOS is running that supposedly owns 10.0.0.2.

After the ping completes (fails), enter the Windows command:

arp -a

Then of course shut down Hercules and attach the logfile, etc.

Thanks.

rgschmi commented 5 years ago

Sure looks like the wrong MAC address! I'll verify my static route and run the tt32 debug procedure. I got nothing from the trace, so I will try again to make sure I did it right. However, the ping and arp addresses match each other but do not match the tun0 MAC address, just as you mentioned above. Stand by for another try.

rgschmi commented 5 years ago

I ran the test again. No trace entries that I can see:

I did another Wireshark too, and pinged both 10.0.0.2 (which didn't work) as well as 192.168.20.10 (which did) and it shows the same destination MAC for both (which does not match tun0 but which does match Windows's arp table):

C:\Windows\system32>arp -a

Interface: 192.168.20.1 --- 0x13
  Internet Address      Physical Address      Type
  192.168.20.10         02-00-5e-a8-14-0c     dynamic
  192.168.20.12         02-00-5e-a8-14-0c     dynamic
  192.168.20.100        80-2a-a8-de-84-bc     dynamic
  192.168.20.255        ff-ff-ff-ff-ff-ff     static
  224.0.0.252           01-00-5e-00-00-fc     static
  239.255.255.250       01-00-5e-7f-ff-fa     static
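
One way to read the "same destination MAC for both" observation: with the static 10.0.0.0/24 route in place, Windows would resolve the MAC of the gateway (192.168.20.12) rather than of 10.0.0.2 itself. The sketch below is only a simplified model of that lookup (longest prefix first, then lowest metric), using a subset of the `route print` entries posted earlier and the `arp -a` entries above. The names `ROUTES`, `ARP`, `next_hop` and `dest_mac` are made up for illustration; this is not Windows or CTCI-WIN code.

```python
import ipaddress

# Subset of the active routes from the earlier `route print` output:
# (destination network, gateway or None for on-link, metric)
ROUTES = [
    ("0.0.0.0/0", "192.168.20.100", 266),
    ("10.0.0.0/24", "192.168.20.12", 11),
    ("192.168.20.0/24", None, 266),
]

# Entries copied from the `arp -a` output above.
ARP = {
    "192.168.20.10": "02-00-5e-a8-14-0c",
    "192.168.20.12": "02-00-5e-a8-14-0c",
    "192.168.20.100": "80-2a-a8-de-84-bc",
}


def next_hop(dest):
    """Return the IP address whose MAC would be used to reach dest."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for net, gateway, metric in ROUTES:
        network = ipaddress.ip_network(net)
        if addr in network:
            matches.append((network.prefixlen, -metric, gateway))
    # Longest prefix wins; a lower metric breaks ties.
    _, _, gateway = max(matches, key=lambda m: (m[0], m[1]))
    # On-link routes have no gateway: the destination is its own next hop.
    return gateway if gateway is not None else dest


def dest_mac(dest):
    """Look up the next hop in the (static, illustrative) ARP table."""
    return ARP.get(next_hop(dest))


print(next_hop("10.0.0.2"), dest_mac("10.0.0.2"))
print(next_hop("192.168.20.10"), dest_mac("192.168.20.10"))
```

Under these assumptions both pings end up addressed to 02-00-5e-a8-14-0c, which is consistent with what the Wireshark capture shows.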
mcisho commented 5 years ago

The z/OS qeth interface is layer 3, hence CTCI-WIN is not told that the MAC that Hercules (z/OS) is using for tun0 is 02-00-5e-a3-be-84. A TUNTAP_SetMACAddr call is only made by qeth for layer 2 interfaces. Presumably, CTCI-WIN itself is calculating the MAC address of 02-00-5e-a8-14-0c, based on the interface's IP address of 192.168.20.12. Does CTCI-WIN route layer 3 packets based on MAC addresses?

Fish-Git commented 5 years ago

The z/OS qeth interface is layer 3, hence CTCI-WIN is not told that the MAC that Hercules (z/OS) is using for tun0 is 02-00-5e-a3-be-84. A TUNTAP_SetMACAddr call is only made by qeth for layer 2 interfaces.

(Doh!) Yes, of course. I had completely forgotten about that. Thank you for pointing that out, Ian.

So the "wrong MAC address" is not really the problem after all. My apologies for the wild goose chase.

Fish-Git commented 5 years ago

Presumably, CTCI-WIN itself is calculating the MAC address of 02-00-5e-a8-14-0c, based on the interface's IP address of 192.168.20.12.

Correct. That's the default that it assigns whenever an interface is created. But it of course can be changed (overridden) by simply setting it to some other value (which, as you pointed out, Hercules does do for layer 2 interfaces).
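
For readers wondering how 02-00-5e-a8-14-0c relates to 192.168.20.12: the last three bytes of that MAC are the last three octets of the IP address in hex (168 = a8, 20 = 14, 12 = 0c). The sketch below reproduces that pattern. It is inferred only from the single address pair shown in this thread, so treat the `02-00-5e` prefix rule and the function name `guess_default_mac` as assumptions, not as CTCI-WIN's documented algorithm.

```python
import ipaddress


def guess_default_mac(ip, prefix=(0x02, 0x00, 0x5E)):
    """Build a MAC from a fixed 3-byte prefix plus the last 3 octets of ip.

    Assumption: this mirrors the pattern visible in this thread only.
    """
    packed = ipaddress.IPv4Address(ip).packed  # 4 raw bytes of the address
    octets = bytes(prefix) + packed[1:]        # drop the first IP octet
    return "-".join(f"{b:02x}" for b in octets)


print(guess_default_mac("192.168.20.12"))  # 02-00-5e-a8-14-0c
```

Note that this does not produce the 02:00:5e:a3:be:84 address that Hercules reports for tun0, which fits the explanation that the two addresses are chosen independently for layer 3 interfaces.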

Does CTCI-WIN route layer 3 packets based on MAC addresses?

Yes.

Regardless of whether the interface is layer 2 or layer 3 (tun or tap), when a packet is received (on the host's adapter), CTCI-WIN only looks at the destination MAC address to determine whether the packet is for one of its virtual interfaces or not.

On the sending side of things however, because CTCI-WIN 'tun' (layer 3) interfaces only receive IP packets from Hercules, it needs to perform some type of rudimentary "routing" (and it relies heavily on Windows for that) to determine what MAC to place into the Ethernet header it has to add to the packet before it can send it. That's the only type of "routing" that CTCI-WIN does. Everything else is done via MAC address.

Fish-Git commented 5 years ago

I did another Wireshark too, and pinged both 10.0.0.2 (which didn't work) as well as 192.168.20.10 (which did) and it shows the same destination MAC for both (which does not match tun0 but which does match Windows's arp table)

Yes, that would be expected. As Ian pointed out, Hercules does not issue a set-mac-addr to CTCI-WIN for layer 3 (tun) interfaces. Thus, if Hercules and/or z/OS chooses to use a different MAC than what CTCI-WIN initialized its virtual interface with, it won't know about it. CTCI-WIN will continue to use its own MAC for tun (layer 3) interfaces.

But this is okay and things should work just fine regardless of that since tun interfaces are layer 3, not layer 2 interfaces. Hercules and z/OS will only ever "see" layer 3 packets (IP packets) anyway. That is to say, Hercules will never "see" the Ethernet header (with the MAC address) for tun interfaces. Because it's a tun interface, CTCI-WIN strips off the Ethernet header and delivers only IP packets to Hercules. (And vice-versa too: Hercules only sends IP packets -- without any Ethernet headers -- to CTCI-WIN for tun interfaces.)
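
To make the tun (layer 3) behaviour described above concrete, here is a minimal sketch of the two directions: accept an inbound frame only if its destination MAC is ours and hand the bare IP packet onward, and prepend a 14-byte Ethernet II header on the way out. The function names and the all-zero dummy payload are hypothetical; this models the idea only and is not CTCI-WIN's actual code.

```python
import struct

ETH_HEADER_LEN = 14      # 6 bytes dst MAC + 6 bytes src MAC + 2 bytes EtherType
ETHERTYPE_IPV4 = 0x0800


def mac_to_bytes(mac):
    """Accept either 02:00:5e:... or 02-00-5e-... notation."""
    return bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))


def wrap_ip_packet(ip_packet, dst_mac, src_mac):
    """Outbound direction: add an Ethernet II header to a bare IP packet."""
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac)
    header += struct.pack("!H", ETHERTYPE_IPV4)
    return header + ip_packet


def unwrap_frame(frame, my_mac):
    """Inbound direction: return the IP packet, or None if not addressed to us."""
    if frame[:6] != mac_to_bytes(my_mac):
        return None
    return frame[ETH_HEADER_LEN:]


# Dummy 20-byte "IP packet" (contents are irrelevant to the illustration).
packet = b"\x45" + b"\x00" * 19
frame = wrap_ip_packet(packet, "02:00:5e:a8:14:0c", "68:1c:a2:12:b5:dc")

print(unwrap_frame(frame, "02-00-5e-a8-14-0c") == packet)  # accepted
print(unwrap_frame(frame, "02:00:5e:a3:be:84"))            # not ours
```

In this model the guest never sees either MAC, which is why the mismatch between the two addresses is harmless for a tun interface.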

My apologies for the wild goose chase. The apparent difference in the MAC addresses being used is NOT the problem. The problem lies elsewhere. (Where, I have no fricking clue.)

Fish-Git commented 5 years ago

I hate to keep coming back to this, but...

The problem is not in CTCI-WIN (nor technically with Hercules either). The problem is in z/OS. The problem is that, for whatever reason, z/OS is not sending a:

case IPA_CMD_SETIP:  /* 0xB1 : Set Layer-3 IP unicast address */

command packet to Hercules for the 10.0.0.2 VIPA address. We know this to be true because we don't see any "Register guest IP address" message from Hercules for 10.0.0.2. We only see one for the primary IP address (192.168.20.12) as well as the second VIPA (192.168.20.10), but there's never any "register" message for 10.0.0.2 (which is always issued by Hercules whenever it receives an IPA_CMD_SETIP command packet from z/OS).

Thus, the problem is with z/OS. Q.E.D.
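
The check described above (is there an HHC03805I "Register guest IP address" message for a given address?) can be automated against a Hercules logfile. The sketch below is one way to do it, assuming the message layout matches the log lines quoted earlier in this thread; the regular expression and the function name `registered_addresses` are illustrative, not part of Hercules.

```python
import re

# Pattern based on the HHC03805I lines quoted in this thread.
REGISTER_RE = re.compile(
    r"HHC03805I\s+(\S+)\s+OSA:\s+(\S+): Register guest IP address (\S+)"
)


def registered_addresses(log_text):
    """Map each interface name to the guest IP addresses registered on it."""
    found = {}
    for line in log_text.splitlines():
        match = REGISTER_RE.search(line)
        if match:
            _device, ifname, ip = match.groups()
            found.setdefault(ifname, []).append(ip)
    return found


# Log lines copied from the logfile excerpt earlier in the thread.
LOG = """\
21:29:55 HHC03997I 0:3001 OSA: tun0: using IP address 192.168.20.12
21:29:55 HHC03805I 0:3001 OSA: tun0: Register guest IP address 192.168.20.12
21:30:02 HHC03805I 0:3001 OSA: tun0: Register guest IP address 192.168.20.10
"""

regs = registered_addresses(LOG)
print(regs)
print("10.0.0.2" in regs.get("tun0", []))
```

Run against the excerpt above, this reports 192.168.20.12 and 192.168.20.10 for tun0 and no entry for 10.0.0.2.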

Now the question becomes WHY isn't z/OS sending the SETIP command? My own guess is because your TCPIP PROFILE contains defines for two VIPAs, and the second one is overriding the first one (i.e. the first one is being ignored and only the second one is being processed (honored) by z/OS).

Try removing the second VIPA definition so that only ONE VIPA is defined: the one for 10.0.0.2.

And don't forget to also make sure that's the one specified on your SOURCEVIPAINT statement too:

(i.e. this is the way I believe your TCPIP PROFILE should look):

; Hardware definitions:
;
INTERFACE        VIPA10
     DEFINE      VIRTUAL
     IPADDR      10.0.0.&WHO.

;INTERFACE        VIPA192
;     DEFINE      VIRTUAL
;     IPADDR      192.168.&ip..10

INTERFACE        LNK3000
   DEFINE        IPAQENET
   PORTNAME      OSA3000                ; MUST MATCH TRLE PORT NAME
   IPADDR        192.168.&IP..12/24     ; INTERFACE IP ADDRESS
   SOURCEVIPAINT VIPA10                 ; 10.0.0.2

;INTERFACE        LNK3004
;   DEFINE        IPAQENET
;   PORTNAME      OSA3004                ; MUST MATCH TRLE PORT NAME
;   IPADDR        192.168.&IP..13/24     ; INTERFACE IP ADDRESS
;   SOURCEVIPAINT VIPA192                ; 192.168.20.10

Try that.

Fish-Git commented 5 years ago

I ran the test again. No trace entries that I can see:

FYI: my apologies for the incorrect instructions. I forgot that you also need to enable debug on the QETH device too:

qeth debug on all
tt32 debug

(i.e. both commands are needed)

My bad. :(

But don't worry about it. As I've already said (thanks to Ian), the apparent difference in the MAC addresses is not the problem. The problem lies somewhere within z/OS. We need to somehow get z/OS to issue the SETIP for the 10.0.0.2 VIPA (which it currently is not doing). If we can figure out how to get z/OS to do that, then I'm 100% certain pings to 10.0.0.2 will start working.

Fish-Git commented 5 years ago

Try removing the second VIPA definition so that only ONE VIPA is defined: the one for 10.0.0.2.

For what it's worth, I have personally confirmed that, apparently, only the last VIPA defined is "honored" by z/OS. That is to say, when more than one VIPA interface (DEFINE VIRTUAL) is defined, only the last one seems to work.

I did my test by defining two VIPAs, each with a different IP address, and only the second one (i.e. the last one defined) ever causes the "Register guest IP address" message to be issued by Hercules, indicating that the second (last) defined VIPA is the only one that z/OS issues the SETIP command for.

And it is the "Register guest IP address" that controls which IP address(es) that CTCI-WIN will respond to. (It sends out a Gratuitous ARP for each IP address assigned to a given tun interface, and it is only those IP addresses that it responds to ARP requests for too.)
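
As a rough mental model of that behaviour: registration adds an address to a per-interface set and triggers one gratuitous ARP, and later ARP requests are answered only for addresses in that set. The class below is a hypothetical toy model written for this explanation (the name `TunInterfaceModel` and its methods do not exist in CTCI-WIN or Hercules).

```python
class TunInterfaceModel:
    """Toy model: which ARP requests would a tun interface answer?"""

    def __init__(self, name, mac):
        self.name = name
        self.mac = mac
        self.registered = set()   # guest IP addresses registered so far
        self.announcements = []   # gratuitous ARPs "sent", as (ip, mac)

    def register(self, ip):
        """Called once per 'Register guest IP address' event."""
        if ip not in self.registered:
            self.registered.add(ip)
            self.announcements.append((ip, self.mac))

    def answer_arp(self, target_ip):
        """Return our MAC for a registered address, otherwise stay silent."""
        return self.mac if target_ip in self.registered else None


tun0 = TunInterfaceModel("tun0", "02-00-5e-a8-14-0c")
for ip in ("192.168.20.12", "192.168.20.10"):  # the two addresses in the log
    tun0.register(ip)

print(tun0.answer_arp("192.168.20.10"))  # answered
print(tun0.answer_arp("10.0.0.2"))       # never registered, so no reply
```

In this model 10.0.0.2 stays unanswered simply because no registration event for it ever arrives.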

So the problem is apparently exactly what I said: your TCPIP PROFILE is wrong. This is a user error, not a Hercules bug!

Now I fully understand that perhaps the way you are doing things is the way "real OSAs" are supposed to work, but as Ian pointed out several comments earlier, Hercules's QETH (OSA) emulation is known to not behave the same way that "real" OSAs behave. There are a lot of things that I'm sure real OSAs can do that Hercules's emulation currently does not support and may possibly never support.

You need to keep in mind, Bob, that all of this OSA code in Hercules was cobbled together over a long period of time via lots of trial and error effort (examining packet traces, etc) and by examining Linux's OSA support. How real OSAs behave is not documented anywhere. IBM does not document how OSAs behave internally, so what we have is only what we could figure out. You should consider yourself damn lucky that Hercules's OSA support works as well as it currently does!

In conclusion, it appears your only choice (at the present time) is to define one and only one virtual interface (i.e. VIPA) in your TCPIP PROFILE, since apparently that is the only one that z/OS "registers".

(And it is the "registering" of the IP addresses that controls how both Hercules and CTCI-WIN behave.)

I am going to close this issue at this time as I believe we have taken it as far as it can be taken. The problem is not with Hercules and the problem is not with CTCI-WIN. The problem is a "user error" insomuch as your TCPIP PROFILE isn't set up properly to be compatible with Hercules.

You can re-open this issue again at some point in the future if new information comes to light, but for now, I believe this issue is CLOSED.