PDP-10 / klh10

Community maintained version of Kenneth L. Harrenstien's PDP-10 emulator.

Networking no longer works #14

Closed: albiorix closed this issue 7 years ago

albiorix commented 7 years ago

With the latest git changes to klh10, my previously working TCP/IP setup no longer works.

System: Fedora 24 Linux, 64-bit.

Software: latest klh10 from git, running the TOPS-20 Panda distribution.

Startup script (klt20):

[neilt@wol panda]$ cat klt20
#!/bin/sh
export KLH10_NET_BRIDGE=enp3s5homenet
exec ./kn10-kl klt20.ini

INI file (klt20.ini):

[neilt@wol panda]$ cat klt20.ini
; Sample KLH10.INI for initial installation

; Define basic device config - one DTE, one disk, one tape.
; Use two RH20s because TOPS-10 doesn't like mixing disk and tape on
; the same controller (TOPS-20 is fine).

devdef dte0 200   dte   master
devdef rh0  540   rh20
devdef rh1  544   rh20
devdef dsk0 rh0.0 rp    type=rp07 format=dbd9
devdef mta0 rh1.0 tm03  type=tu45

; Need KLNI to avoid LAPRBF BUGCHKs - use valid address if known
;
; devdef ni0 564 ni20 ipaddr=10.0.0.51
; The (NetBSD/FreeBSD/Linux) version with tap(4) and bridge(4) creates the
; named tap device dynamically and bridges it to the default interface.
; If you want it differently (for instance routed instead of bridged),
; you can create the tap yourself and it will be used as it is.
devdef ni0 564 ni20 ipaddr=192.168.0.51 ifmeth=tap+bridge ifc=tap0 dedic=true
; Use ifmeth=tap if you handle the bridging or routing yourself.

; Use this version if you want to use libpcap for ethernet access.
;devdef ni0 564 ni20 ipaddr=10.0.0.51 ifmeth=pcap ifc=re0 dedic=false

; Load disk bootstrap directly
load boot.sav

; Now ready to GO

System startup messages:

[BOOT: Loading] [OK]

[TOPS20 mounted]
Neil's Panda Distribution, PANDA TOPS-20 Monitor 7.1(21733)-4 Internet: Loading host names [OK]

System restarting, wait...
Date and time is: Sunday, 22-January-2017 7:24AM
Why reload? pm
Run CHECKD? n
 DDMP: Started
[KNILDR: Loading microcode version 1(172) into Ethernet channel 0]

[dpni20: Cannot find default IP interface for host]
[dpni20: linux bridge_create: ioctl res=0]
[dpni20: Attached "tap0" to bridge "enp3s5homenet"]
[dpni20: ifc "tap0" => ether 0:0:0:0:0:0]
[dpni20:   VHOST 192.168.0.51]
[dpni20: Enabled net.ipv4.conf.tap0.arp_accept]

SYSJOB 7A(88)-4 started at 22-Jan-2017 0724
SJ  0: @LOGIN OPERATOR
SJ  0: @ENABLE
SJ  0: $SYSTEM:STSJ1
22-Jan-2017 07:24:37 SYSJB1: SYSJB1 started.
SJ  0: $^ESET LOGIN ANY
SJ  0: $OPR

 [NCP]:              Waiting for ORION to start
22-Jan-2017 07:24:38 SYSJB1: Job 0: 
22-Jan-2017 07:24:38 SYSJB1: Job 0:  Neil's Panda Distribution, PANDA TOPS-20 Monitor 7.1(21733)-4
22-Jan-2017 07:24:38 SYSJB1: Job 1: 
22-Jan-2017 07:24:38 SYSJB1: Job 1:  Neil's Panda Distribution, PANDA TOPS-20 Monitor 7.1(21733)-4
22-Jan-2017 07:24:38 SYSJB1: Job 2: 
22-Jan-2017 07:24:38 SYSJB1: Job 2:  Neil's Panda Distribution, PANDA TOPS-20 Monitor 7.1(21733)-4
22-Jan-2017 07:24:38 SYSJB1: Job 3: 
22-Jan-2017 07:24:38 SYSJB1: Job 3:  Neil's Panda Distribution, PANDA TOPS-20 Monitor 7.1(21733)-4
22-Jan-2017 07:24:38 SYSJB1: Job 4: 
22-Jan-2017 07:24:38 SYSJB1: Job 4:  Neil's Panda Distribution, PANDA TOPS-20 Monitor 7.1(21733)-4
22-Jan-2017 07:24:38 SYSJB1: Job 5: 
22-Jan-2017 07:24:38 SYSJB1: Job 5:  Neil's Panda Distribution, PANDA TOPS-20 Monitor 7.1(21733)-4
22-Jan-2017 07:24:38 SYSJB1: Job 1: @LOGIN OPERATOR
22-Jan-2017 07:24:38 SYSJB1: Job 3: @LOGIN OPERATOR
22-Jan-2017 07:24:38 SYSJB1: Job 0: @LOGIN OPERATOR
22-Jan-2017 07:24:38 SYSJB1: Job 2: @LOGIN OPERATOR
22-Jan-2017 07:24:38 SYSJB1: Job 2: @ENABLE
22-Jan-2017 07:24:38 SYSJB1: Job 3: @ENABLE
22-Jan-2017 07:24:38 SYSJB1: Job 4: @LOGIN OPERATOR
22-Jan-2017 07:24:38 SYSJB1: Job 5: @LOGIN OPERATOR
22-Jan-2017 07:24:38 SYSJB1: Job 0: @ENABLE
22-Jan-2017 07:24:38 SYSJB1: Job 0: $RESOLV
22-Jan-2017 07:24:38 SYSJB1: Job 1: @ENABLE
22-Jan-2017 07:24:38 SYSJB1: Job 2: $SMTJFN
22-Jan-2017 07:24:38 SYSJB1: Job 3: $MMAILR
22-Jan-2017 07:24:38 SYSJB1: Job 4: @ENABLE
22-Jan-2017 07:24:38 SYSJB1: Job 4: $IMAPSV
22-Jan-2017 07:24:38 SYSJB1: Job 5: @ENABLE
22-Jan-2017 07:24:38 SYSJB1: Job 1: $NETSRV
22-Jan-2017 07:24:38 SYSJB1: Job 5: $FTS
22-Jan-2017 07:24:38 SYSJB1: Job 5: FTS>TAKE FTS.CMD
22-Jan-2017 07:24:38 SYSJB1: Job 5: [FTS20: FTS event 38: spooler started]
22-Jan-2017 07:24:36 Internet: Network My-Network on, Output on
SJ  0: OPR>TAKE SYSTEM:SYSTEM.CMD
SJ  0: 
SJ  0: 07:24:40        --ORION logging disabled by job 1 OPERATOR at terminal 13--
SJ  0: 
SJ  0: 07:24:40        --Output display for OPR modified--
SJ  0: 
SJ  0: 07:24:40        --Output display for OPR modified--
SJ  0: 
SJ  0: 07:24:40        --Output display for OPR modified--
SJ  0: 
SJ  0: 07:24:40        --Output display for OPR modified--
SJ  0: 
SJ  0: 07:24:40        Batch-Stream 0  -- Set Accepted --
SJ  0: 
SJ  0: 07:24:40        Batch-Stream 1  -- Set Accepted --
SJ  0: 
SJ  0: 07:24:40        Batch-Stream 2  -- Set Accepted --
SJ  0: 
SJ  0: 07:24:40        Batch-Stream 3  -- Set Accepted --
SJ  0: 
SJ  0: 07:24:40        Batch-Stream 0  -- Set Accepted --
SJ  0: 
SJ  0: 07:24:40        Batch-Stream 1  -- Set Accepted --
SJ  0: 
SJ  0: 07:24:40        Batch-Stream 2  -- Set Accepted --
SJ  0: 
SJ  0: 07:24:40        Batch-Stream 3  -- Set Accepted --
SJ  0: 
SJ  0: 07:24:40        Batch-Stream 0  -- Startup Scheduled --
SJ  0: 
SJ  0
07:24:40 From operator terminal 13 on node TOPS20::
    =>System in operation
: 07:24:40        Batch-Stream 1  -- Startup Scheduled --
SJ  0: 
SJ  0: 07:24:40        Batch-Stream 2  -- Startup Scheduled --
SJ  0: 
SJ  0: 07:24:40        Batch-Stream 3  -- Startup Scheduled --
SJ  0: OPR>
SJ  0: 07:24:40        --SEND command completed--

% [Logger 22-Jan-2017 07:24:41 ]: Started at 22-Jan-2017 07:24:37
SJ  0: OPR>
SJ  0: 07:24:46          -- Structure Status Change Detected --
SJ  0:                 Previously mounted structure TOPS20: detected
SJ  0: 
SJ  0: 07:24:46          -- Structure Status Change Detected --
SJ  0:                 Structure state for structure TOPS20 is incorrect
SJ  0:                   EXCLUSIVE/SHARED attribute set incorrectly
SJ  0:                 Status of structure TOPS20: is set:
SJ  0:                 Domestic, Unregulated, Shared, Available, Dumpable
SJ  0: 

The bridge interface is up:

[root@wol ~]# ip addr show dev enp3s5homenet
7: enp3s5homenet: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:15:17:a6:0e:73 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.5/24 brd 192.168.0.255 scope global enp3s5homenet
       valid_lft forever preferred_lft forever
    inet6 fde3:e5a0:6e8e:0:215:17ff:fea6:e73/64 scope global mngtmpaddr dynamic 
       valid_lft forever preferred_lft forever
    inet6 fe80::215:17ff:fea6:e73/64 scope link 
       valid_lft forever preferred_lft forever

tap0 is up and attached to the bridge:

[root@wol ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.0242d6e8cae6       no
enp3s5homenet   8000.001517a60e73       yes             enp2s0f1
                                                        tap0
virbr0          8000.000000000000       yes

But tap0 has no IP address:

[root@wol ~]# ip addr show dev tap0
16: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master enp3s5homenet state UNKNOWN group default qlen 1000
    link/ether 9a:40:a2:c5:fb:af brd ff:ff:ff:ff:ff:ff
    inet6 fe80::9840:a2ff:fec5:fbaf/64 scope link 
       valid_lft forever preferred_lft forever

If I manually add an IP address:

[root@wol ~]# ip addr add 192.168.0.51/24 dev tap0
[root@wol ~]# ip addr show dev tap0
16: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master enp3s5homenet state UNKNOWN group default qlen 1000
    link/ether 9a:40:a2:c5:fb:af brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.51/24 scope global tap0
       valid_lft forever preferred_lft forever
    inet6 fe80::9840:a2ff:fec5:fbaf/64 scope link 
       valid_lft forever preferred_lft forever

@Rhialto I can ping it, but I cannot telnet to my Panda:

[root@wol ~]# ping -c 5 192.168.0.51
PING 192.168.0.51 (192.168.0.51) 56(84) bytes of data.
64 bytes from 192.168.0.51: icmp_seq=1 ttl=64 time=0.056 ms
64 bytes from 192.168.0.51: icmp_seq=2 ttl=64 time=0.048 ms
64 bytes from 192.168.0.51: icmp_seq=3 ttl=64 time=0.049 ms
64 bytes from 192.168.0.51: icmp_seq=4 ttl=64 time=0.060 ms
64 bytes from 192.168.0.51: icmp_seq=5 ttl=64 time=0.049 ms

--- 192.168.0.51 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4103ms
rtt min/avg/max/mdev = 0.048/0.052/0.060/0.008 ms
[root@wol ~]# telnet 192.168.0.51
Trying 192.168.0.51...
telnet: connect to address 192.168.0.51: Connection refused

Entire network broken: at this stage, networking on my machine is broken. I can ping my own interface (192.168.0.5) and the tap0 interface (192.168.0.51), but nothing goes into or out of my machine. Once I shut down klh10, everything is back to normal.

Closedown works: on the plus side, dpni20 no longer hangs around, and everything shuts down properly.

If you need any more info/debugging, please shout.

Rhialto commented 7 years ago

Thanks for all the detail in your report.

When I first read it, I thought a new change I made in my local copy might be the issue, but I hadn't checked it in yet, so that was a red herring.

But it did make me wonder: you mention that you don't get an address on the tap interface, which suggests to me that this is a change. There is long-standing code that actually removes an address from taps (osdnet.c line 1913, though only for non-Linux systems), so can you confirm that you did get one before? (I am considering leaving addresses on existing taps alone, since setting up a routed tap involves pre-creating a tap with an address that matches the virtual system's default route.)
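Leaving addresses on pre-existing taps alone would require a way to tell whether a tap already carries an address. The following is a minimal sketch, assuming a Linux or BSD host with getifaddrs(3). The helper name iface_has_ipv4 is hypothetical and does not come from osdnet.c, which may handle this differently.

```c
#include <ifaddrs.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of klh10): returns 1 if interface `ifname`
 * currently carries at least one IPv4 address, 0 if it has none or does
 * not exist, and -1 if the interface list cannot be read. */
int iface_has_ipv4(const char *ifname)
{
    struct ifaddrs *list, *ifa;
    int found = 0;

    if (getifaddrs(&list) == -1)
        return -1;
    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* ifa_addr can be NULL for entries with no address at all */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (strcmp(ifa->ifa_name, ifname) == 0) {
            found = 1;
            break;
        }
    }
    freeifaddrs(list);
    return found;
}
```

For the bridged taps in this thread, iface_has_ipv4("tap0") would be expected to return 0, and 1 after a manual `ip addr add` like the one above. A bridged tap needs no address of its own, so only a pre-configured routed tap would normally return 1.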

Of the two recent commits, can you check which one causes the problem? I would argue that neither should affect networking, at least not before termination. The first checkin (29452634326f4d75d6dac14e28ba250dfbaf7ed2) ought to only add some front-end commands that might help with debugging, but I could have broken command parsing somehow. The second one (09ee2e73c8cc30a9c4e86455fbde49ea0064faf6) should fix the termination of dpni20 and have no effect before that.

hrlzm commented 7 years ago

The following is kind of a giveaway:

[root@wol ~]# telnet 192.168.0.51
Trying 192.168.0.51...
telnet: connect to address 192.168.0.51: Connection refused

This ("connection refused") means that something (your host machine) is receiving the connection attempt and refusing it. You can verify this by turning on the telnet server on your host, or by telnetting to another port if you have something else enabled on the host system. Ping works, naturally, since the host answers it no matter what. Setting the TOPS-20 system's IP address on the tap interface is plain wrong; something else is causing the problem.

--Johnny
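Johnny's reading of the telnet error can be checked mechanically: the errno from a failed connect(2) shows whether any host answered at all. The following is a minimal sketch assuming a POSIX sockets API. probe_tcp and explain_connect_errno are hypothetical helpers, not part of klh10, and the errno interpretations in the comments describe typical Linux behavior rather than a guarantee.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: returns 0 if a TCP connection to ip:port succeeds,
 * otherwise the errno of the failing step (EINVAL for an unparsable address). */
int probe_tcp(const char *ip, unsigned short port)
{
    struct sockaddr_in sa;
    int fd, err = 0;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return EINVAL;
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return errno;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == -1)
        err = errno;            /* save before close() can overwrite errno */
    close(fd);
    return err;
}

/* Hypothetical helper: what a connect() errno suggests about the SYN. */
const char *explain_connect_errno(int err)
{
    switch (err) {
    case ECONNREFUSED:
        /* Some host (e.g. the Linux host owning the address) sent a RST:
         * the address is claimed, but nothing listens on that port. */
        return "reached a host, no listener";
    case EHOSTUNREACH:
        /* On a local subnet this usually means no ARP reply: nobody
         * claimed the address. telnet prints "No route to host". */
        return "no host answered";
    case ETIMEDOUT:
        return "SYN unanswered (dropped or filtered)";
    default:
        return "other error";
    }
}
```

With 192.168.0.51 assigned to tap0, probe_tcp("192.168.0.51", 23) would be expected to return ECONNREFUSED, because the Linux host itself refuses port 23. An address that nothing on the LAN claims typically yields EHOSTUNREACH instead.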

albiorix commented 7 years ago

OK, it gets a little more interesting. I have now configured two systems:

panda        current git           192.168.0.61 tap0    networking broken
pandatest    previous release      192.168.0.62 tap1    networking works

I have them both up and running at the same time.

panda - 192.168.0.61 tap0

[root@wol ~]# ip addr show dev tap0
24: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master enp3s5homenet state UNKNOWN group default qlen 1000
    link/ether 46:ae:d0:c8:dd:de brd ff:ff:ff:ff:ff:ff
    inet6 fe80::44ae:d0ff:fec8:ddde/64 scope link 
       valid_lft forever preferred_lft forever

pandatest - 192.168.0.62 tap1

[root@wol ~]# ip addr show dev tap1
25: tap1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master enp3s5homenet state UNKNOWN group default qlen 1000
    link/ether 4a:6f:0c:96:25:76 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::486f:cff:fe96:2576/64 scope link 
       valid_lft forever preferred_lft forever

Neither has an IP address, right? But:

[root@wol ~]# ping -c 5 192.168.0.61
PING 192.168.0.61 (192.168.0.61) 56(84) bytes of data.
From 192.168.0.5 icmp_seq=1 Destination Host Unreachable
From 192.168.0.5 icmp_seq=2 Destination Host Unreachable
From 192.168.0.5 icmp_seq=3 Destination Host Unreachable
From 192.168.0.5 icmp_seq=4 Destination Host Unreachable
From 192.168.0.5 icmp_seq=5 Destination Host Unreachable

--- 192.168.0.61 ping statistics ---
5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 4104ms
pipe 4
[root@wol ~]# ping -c 5 192.168.0.62
PING 192.168.0.62 (192.168.0.62) 56(84) bytes of data.
64 bytes from 192.168.0.62: icmp_seq=1 ttl=255 time=0.278 ms
64 bytes from 192.168.0.62: icmp_seq=2 ttl=255 time=0.250 ms
64 bytes from 192.168.0.62: icmp_seq=3 ttl=255 time=0.277 ms
64 bytes from 192.168.0.62: icmp_seq=4 ttl=255 time=0.223 ms
64 bytes from 192.168.0.62: icmp_seq=5 ttl=255 time=0.247 ms

--- 192.168.0.62 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4128ms
rtt min/avg/max/mdev = 0.223/0.255/0.278/0.020 ms

And

[neilt@wol ~]$ telnet 192.168.0.61
Trying 192.168.0.61...
telnet: connect to address 192.168.0.61: No route to host
[neilt@wol ~]$ telnet 192.168.0.62
Trying 192.168.0.62...
Connected to 192.168.0.62.
Escape character is '^]'.

 Neil's Panda Distribution, PANDA TOPS-20 Monitor 7.1(21733)-4

This system is for the use of authorized users only.  Usage of
this system may be monitored and recorded by system personnel.

Anyone using this system expressly consents to such monitoring
and is advised that if such monitoring reveals possible
evidence of criminal activity, system personnel may provide the
evidence from such monitoring to law enforcement officials.

@

I'm going to back the two changes out and see which one does it (it might take me a bit of time, as my git-fu is not strong), but it's quite weird that neither of the tap interfaces shows an IP address.

Also, the startup messages differ. Working:

[dpni20: ifc "tap1" => ether f2:b:a4:84:ab:a8]

Not working:

[dpni20: ifc "tap0" => ether 0:0:0:0:0:0]

i.e. no MAC address.

Back soon

Rhialto commented 7 years ago

Interesting. I think the hint is probably the [dpni20: ifc "tap0" => ether 0:0:0:0:0:0] but I'll have to ponder and look closer to see how that could have happened. That the taps have no address is not surprising; if they are bridged they don't need one.
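The all-zero address is the one concrete difference between the broken and the working startup. The following is a minimal sketch of a check for that symptom. It assumes dpni20 prints the address as colon-separated hex octets without leading zeros (as in "f2:b:a4:84:ab:a8"). check_logged_ether is a hypothetical helper, not part of klh10.

```c
#include <stdio.h>

/* Hypothetical helper: check the MAC address that dpni20 logs after
 * "ether ". Returns 1 if it parses and is non-zero, 0 if it is all zeros
 * (the broken case), and -1 if the string is not six hex octets. */
int check_logged_ether(const char *s)
{
    unsigned int b[6];
    char extra;
    int i, nonzero = 0;

    /* The trailing %c rejects input with junk after the sixth octet. */
    if (sscanf(s, "%x:%x:%x:%x:%x:%x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
        return -1;
    for (i = 0; i < 6; i++) {
        if (b[i] > 0xff)
            return -1;          /* not a valid octet */
        if (b[i] != 0)
            nonzero = 1;
    }
    return nonzero;
}
```

Applied to the two startup lines in this thread, "f2:b:a4:84:ab:a8" gives 1 and "0:0:0:0:0:0" gives 0.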

Rhialto commented 7 years ago

The code that sets the MAC address is in osdnet.c. The initial value is:

static struct eth_addr emhost_ea =      /* Emulated host ether addr for tap */
    { 0xf2, 0x0b, 0xa4, 0xff, 0xff, 0xff };

In the function osn_pfeaget just below it, the rightmost 3 bytes are randomized. If the Ethernet address ends up as all zeros, that function probably isn't being called for some reason.
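The initialization described above can be illustrated with a few stand-alone functions. This is a sketch under stated assumptions, not the osdnet.c code. struct ea_sketch stands in for klh10's struct eth_addr, whose real layout may differ. rand() stands in for whatever source of randomness osn_pfeaget actually uses. make_emhost_ea and format_ea are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed stand-in for klh10's struct eth_addr: six octets. */
struct ea_sketch { unsigned char b[6]; };

/* Same template value as emhost_ea in osdnet.c. 0xf2 has the
 * locally-administered bit (0x02) set and the multicast bit (0x01) clear. */
static const struct ea_sketch ea_template =
    { { 0xf2, 0x0b, 0xa4, 0xff, 0xff, 0xff } };

/* Hypothetical equivalent of the randomization step: keep the first three
 * octets and replace the rightmost three with random bytes. rand() is for
 * illustration only; seed it with srand() before real use. */
void make_emhost_ea(struct ea_sketch *out)
{
    int i;

    memcpy(out->b, ea_template.b, sizeof out->b);
    for (i = 3; i < 6; i++)
        out->b[i] = (unsigned char)(rand() & 0xff);
}

/* Format like dpni20's log line: lowercase hex octets, no zero padding. */
int format_ea(const struct ea_sketch *ea, char *buf, size_t len)
{
    return snprintf(buf, len, "%x:%x:%x:%x:%x:%x",
                    (unsigned)ea->b[0], (unsigned)ea->b[1], (unsigned)ea->b[2],
                    (unsigned)ea->b[3], (unsigned)ea->b[4], (unsigned)ea->b[5]);
}
```

Because the fixed prefix f2:0b:a4 is never zero, an address logged as 0:0:0:0:0:0 cannot come out of this path. That fits the guess that the randomization step is not being reached at all.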

albiorix commented 7 years ago

@Rhialto OK... there was a screw-up on this side somewhere. My extreme apologies.

I reverted your two changes. No effect.

I checked the code and did a git blame; nothing apparent.

So I blew away everything I had, re-cloned from GitHub, and suddenly everything was working. I have absolutely no idea what caused it, because the only code I've been playing with is read20.c.

I now have the following -

[neilt@wol ~]$ telnet panda
Trying 192.168.0.61...
Connected to panda.
Escape character is '^]'.

 Neil's Panda Distribution, PANDA TOPS-20 Monitor 7.1(21733)-4

This system is for the use of authorized users only.  Usage of
this system may be monitored and recorded by system personnel.

Anyone using this system expressly consents to such monitoring
and is advised that if such monitoring reveals possible
evidence of criminal activity, system personnel may provide the
evidence from such monitoring to law enforcement officials.

@log nrt 
 Job 9 on TTY44  (TCP) 22-Jan-2017 12:28:12
  Last interactive login 22-Jan-2017 12:15:13
  Last non-interactive login Never
 End of LOGIN.CMD.4

[neilt@wol ~]$ telnet pandatest
Trying 192.168.0.62...
Connected to pandatest.
Escape character is '^]'.

 Neil's Panda Distribution, PANDA TOPS-20 Monitor 7.1(21733)-4

This system is for the use of authorized users only.  Usage of
this system may be monitored and recorded by system personnel.

Anyone using this system expressly consents to such monitoring
and is advised that if such monitoring reveals possible
evidence of criminal activity, system personnel may provide the
evidence from such monitoring to law enforcement officials.

@log nrt 
 Job 9 on TTY44  (TCP) 22-Jan-2017 16:28:21
  Last interactive login 22-Jan-2017 16:09:10
  Last non-interactive login 22-Jan-2017 00:09:05
 End of LOGIN.CMD.6

Once again, apologies.

Rhialto commented 7 years ago

No problem! Glad it is solved.

albiorix commented 7 years ago

@Rhialto @hrlzm Thanks for the assistance.