PDP-10 / klh10

Community maintained version of Kenneth L. Harrenstien's PDP-10 emulator.

Chaosnet fails if any chip fails #18

Closed: larsbrinkhoff closed this 5 years ago

larsbrinkhoff commented 7 years ago

If you configure ch11 with a number of Chaos-IP mappings and one of them fails, e.g. because a hostname lookup doesn't work, the whole ch11 stops working.

KLH10# devdef chaos ub3  ch11  addr=764140 br=6 vec=270 myaddr=3150 chip=3143/up.foo.se chip=7100/sj.bar.net
CH11 param "chip": bad value syntax: "7100/sj.bar.net"
Device init failed

As a result, ITS hits a BUGHLT when it boots:

CHAOSNET INTERFACE NOT RESPONDING (CHECK THE BREAKER ON THE UNIBUS)  
BUGHALT.  FIND A WIZARD OR CONSIDER TAKING A CRASH DUMP.
THE SYSTEM HAS CRASHED AND CANNOT BE REVIVED WITHOUT EXPERT ATTENTION.
IF YOU CAN'T FIND HELP, RELOAD THE SYSTEM.
YOU ARE NOW IN DDT.
BUGPC/   CAI QSETUP+14   $Q-2/   CAIA 0   

I propose that it would be better to have ch11 complain loudly and then just ignore the failed mapping.

CC @bictorv
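A minimal sketch of the proposed "complain loudly and ignore" behavior, assuming each chip value has the form `octal-Chaos-address/hostname` as in the devdef line above. The names `chip_entry`, `parse_chip`, and `parse_chip_list` are hypothetical, not KLH10's actual code. A malformed or unresolvable entry prints a warning and is skipped, and device init continues with the remaining mappings.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical chip table entry: Chaosnet address -> IPv4 address. */
struct chip_entry {
    unsigned chaddr;      /* Chaosnet address (written in octal in the config) */
    struct in_addr ip;    /* resolved IPv4 address */
};

/* Parse and resolve one "chip" value such as "3143/up.foo.se".
   Returns 0 on success; on failure prints a warning and returns -1. */
static int parse_chip(const char *val, struct chip_entry *out)
{
    char *end;
    unsigned long chaddr = strtoul(val, &end, 8);
    if (end == val || *end != '/' || chaddr == 0 || chaddr > 0177777) {
        fprintf(stderr, "CH11 chip \"%s\": bad syntax, ignoring\n", val);
        return -1;
    }

    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* ch11 only handles IPv4 here */
    hints.ai_socktype = SOCK_DGRAM;
    int err = getaddrinfo(end + 1, NULL, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "CH11 chip \"%s\": %s, ignoring\n", val, gai_strerror(err));
        return -1;
    }
    out->chaddr = (unsigned)chaddr;
    out->ip = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
    freeaddrinfo(res);
    return 0;
}

/* Keep every entry that parses and resolves; return how many were kept. */
static int parse_chip_list(const char *vals[], int n, struct chip_entry table[])
{
    int kept = 0;
    for (int i = 0; i < n; i++)
        if (parse_chip(vals[i], &table[kept]) == 0)
            kept++;
    return kept;
}
```

With this approach a typo still shows up on the console at startup, but ITS boots with whatever mappings did work.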

bictorv commented 7 years ago

I think it's better the way it is, since it makes sure you notice the error in the configuration. Otherwise you probably wouldn't, would you?

larsbrinkhoff commented 7 years ago

The reason I'm mulling this over is that I added all ITS hosts to the chip table checked into the ITS git repository. As it happens, one of them doesn't resolve to an IP number yet. No problem, I can just remove it.

However, while the other hosts work fine now, they may not work in the future.

Or there may just be a temporary lookup failure for one of the hosts.

larsbrinkhoff commented 7 years ago

A slightly different solution could be to NOT resolve hostnames at all during KLH10 startup. Instead, they could be looked up lazily and dynamically during runtime, with some caching of course.

This way, hostnames with changing IP numbers would be handled more gracefully, and the BUGHLT would be avoided too.
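A rough sketch of lazy lookup with caching, assuming a fixed re-resolution interval (`CHIP_TTL`, a made-up value), since getaddrinfo() does not report the DNS record's TTL. The struct and `chip_lookup` are hypothetical. Nothing is resolved at startup: the first use of an entry triggers the lookup, and an expired entry is looked up again, which is what lets a changed IP number be picked up. Note that each lookup still blocks the caller.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

#define CHIP_TTL 300   /* hypothetical: re-resolve after five minutes */

struct chip_cache {
    const char *host;   /* hostname from the config, stored unresolved */
    struct in_addr ip;  /* last successfully resolved address */
    int valid;          /* nonzero once ip holds a usable address */
    time_t expires;     /* when to look the hostname up again */
};

/* Return 0 and fill *ip if an address is known, -1 otherwise.
   Resolves on first use and again whenever the entry has expired. */
static int chip_lookup(struct chip_cache *c, struct in_addr *ip)
{
    time_t now = time(NULL);
    if (!c->valid || now >= c->expires) {
        struct addrinfo hints, *res;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_DGRAM;
        if (getaddrinfo(c->host, NULL, &hints, &res) == 0) {
            c->ip = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
            c->valid = 1;
            freeaddrinfo(res);
        }
        /* On failure, keep any stale address and wait a full TTL
           before retrying. */
        c->expires = now + CHIP_TTL;
    }
    if (!c->valid)
        return -1;
    *ip = c->ip;
    return 0;
}
```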

bictorv commented 7 years ago

That would be an improvement! To begin, avoid the caching in klh10 and let the resolver library do it; if that turns out too slow (even for ITS ;-)) do some manual thing.

bictorv commented 7 years ago

But you still want some warning that a chip definition doesn't have an IP address at boot, to help find spelling errors.

b4 commented 7 years ago

Erm, oops. I had forgotten to add the record; that's fixed now, too.

b4 commented 7 years ago

BTW, what port(s) are used for this? I need to fully configure the firewall.

EDIT: Ah, there's a document for it. Cool.

larsbrinkhoff commented 7 years ago

It was a good thing, it uncovered a problem.

The default port is 42042, but anything works, really. What's SIXBIT /ITS/ or /CHAOS/? I know how to output SIXBIT values in DDT, but not how to input them.

b4 commented 7 years ago

It's early in the morning and I don't have DDT handy, so per the table at http://rabbit.eng.miami.edu/info/decchars.html it should be 233635.

csmelosky@kale ~> grep -i "233635" /etc/services
csmelosky@kale ~>

So we should be good (and yes, I must be careful of case-sensitive numbers when grepping! ;) )

Well, aside from the fact that we don't have enough bits in a TCP/IP port number for that... :)

b4 commented 7 years ago

Oops, I did RADIX-50, didn't I?

We'd need to convert it to 16-bit RADIX-50 to fit.

larsbrinkhoff commented 7 years ago

$0' its$=516463
$0' chaos$=4350415763

SIXBIT /CH/ would work. But never mind, that was just a detour. We have a good default, let's use it!
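For reference, the octal values DDT printed above can be reproduced in C. This sketch (the `sixbit` helper is hypothetical) packs each character as its ASCII code minus octal 40, six bits per character, right-justified as in the DDT output, which also shows why /ITS/ is too large for a 16-bit port while /CH/ fits.

```c
#include <ctype.h>
#include <stdint.h>

/* Pack up to six characters as SIXBIT (ASCII code minus 040),
   right-justified, six bits per character. Lowercase is folded to
   uppercase since SIXBIT has no lowercase letters. Returns UINT64_MAX
   if a character has no SIXBIT code. */
static uint64_t sixbit(const char *s)
{
    uint64_t v = 0;
    for (int i = 0; i < 6 && s[i] != '\0'; i++) {
        int c = toupper((unsigned char)s[i]);
        if (c < 040 || c > 0137)
            return UINT64_MAX;
        v = (v << 6) | (uint64_t)(c - 040);
    }
    return v;
}
```

`sixbit("ITS")` gives 0516463 (171315 decimal, too large for a port), `sixbit("CHAOS")` gives 04350415763, and `sixbit("CH")` gives 04350, i.e. 2280, which would be a valid port number.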

larsbrinkhoff commented 7 years ago

This moves all resolution logic to dpchudp.c. The hostname is just passed as-is from dvch11.c.

Dynamically added IP addresses are handled by setting the hostname to a dotted quad.
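A rough illustration of the dotted-quad trick, under the assumption that chip entries store only a hostname string which dpchudp.c resolves later. `chip_name` and `chip_add_dynamic` are hypothetical names. Since getaddrinfo() accepts a numeric address as the node name, an entry learned at runtime can go through the same lookup path as a configured hostname.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical chip entry that keeps only a hostname string;
   resolution happens later, in the dpchudp process. */
struct chip_name {
    unsigned chaddr;
    char host[256];
};

/* Record a dynamically learned peer by writing its IPv4 source address
   as a dotted quad, e.g. "10.12.2.5". Returns 0 on success, -1 on error. */
static int chip_add_dynamic(struct chip_name *e, unsigned chaddr,
                            const struct sockaddr_in *from)
{
    e->chaddr = chaddr;
    if (inet_ntop(AF_INET, &from->sin_addr, e->host, sizeof e->host) == NULL)
        return -1;
    return 0;
}
```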

larsbrinkhoff commented 7 years ago

I did some light testing. The check at start works.

Runtime dynamic lookup seems to work. But it's hard to say for sure when the only other Chaosnet node is down!

b4 commented 7 years ago

If I have a free moment at work tomorrow, I can bring SJ up. ;)

(by saying I'll do it tomorrow it means I will probably do it in half an hour)

b4 commented 7 years ago

madison-sj(config)#ip nat inside source static udp 10.12.2.5 42042 interface gi0/1 42042

Try now. I should be operational.

larsbrinkhoff commented 7 years ago

Cool, thanks!

*:up sj
$$4^K EMACS$J
*
Message from LARS UP
SJ.GEWT.NET is up.

bictorv commented 7 years ago

Did you do any performance tests with and without the "dynamic lookup" patch? For example, transfer a large file between two hosts with the patch, and between two hosts without it?

larsbrinkhoff commented 7 years ago

Good idea. No, I haven't yet.

The Chaos FTP server doesn't work in the GitHub build, so I'll have to fix that first.

bictorv commented 7 years ago

You don't need CFTP, do you? Just enable MLDEV?

larsbrinkhoff commented 7 years ago

CHAOS MLDEV isn't so stellar either. It's probably a general problem with all Chaos services.

However, now that UP is back up, I can try NO MLDEV -> UP MLSLV.

larsbrinkhoff commented 7 years ago

Testing CFTP with dynamic CHIP enabled.

Sometimes I get "69 KBaud" according to CFTP, and sometimes 300-340.

Every five seconds or so, I get a "DB ITS revived!" system message. I believe that happens when the system detects a jump in time, e.g. when it thinks the processor has been halted for a while.

There's a clear connection between the revived messages and the slower transfer rates.

larsbrinkhoff commented 7 years ago

Maybe relying on ITS' sense of real time isn't good, and I should use a stopwatch instead.

b4 commented 7 years ago

How about to SJ? I'm aware there's an undersea cable involved, but still...

(I wonder if it's taking a Pacific route, or a transatlantic and then transcontinental route...)

larsbrinkhoff commented 7 years ago

@b4 Making Chaos connections to a DB-derived ITS doesn't work yet.

larsbrinkhoff commented 7 years ago

Got plenty of "ITS revived!" messages over the night. I'm not sure this is a success.

larsbrinkhoff commented 7 years ago

I see a single "CHIP sj.gewt.net => error Host lookup failure" on the console. But it's ok now according to :UP. So that part seems to work as intended.

larsbrinkhoff commented 7 years ago

But there are periodic seizures. I just caught one in the act: typing in supdup was unresponsive until an "ITS revived" message popped up.

So something is indeed hogging the CPU for a while. More debugging needed.
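One way to confirm that name resolution is what stalls the emulator is to time each lookup with a monotonic clock and log the slow ones. This is only a diagnostic sketch; `timed_lookup` and the 100 ms threshold are arbitrary choices, not part of KLH10.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* Resolve host and measure how long getaddrinfo() blocked.
   Returns the getaddrinfo() error code (0 on success);
   *ms receives the elapsed time in milliseconds. */
static int timed_lookup(const char *host, double *ms)
{
    struct timespec t0, t1;
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int err = getaddrinfo(host, NULL, &hints, &res);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *ms = (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    if (*ms > 100.0)   /* arbitrary threshold for "slow" */
        fprintf(stderr, "lookup of %s took %.1f ms (%s)\n", host, *ms,
                err ? gai_strerror(err) : "ok");
    if (err == 0)
        freeaddrinfo(res);
    return err;
}
```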

b4 commented 7 years ago

Interesting.

I should be either totally functional or completely broken... I wonder how/when/where the lookup failed.

On Feb 23, 2017, at 22:10, Lars Brinkhoff wrote:

I see a single "CHIP sj.gewt.net => error Host lookup failure" on the console. But it's ok now according to :UP. So that part seems to work as intended.

larsbrinkhoff commented 7 years ago

Maybe not a fault at your end. I heard there are many nodes in this Arpa... I mean "internet" thing, and any one of them can break at any time. :-)

b4 commented 7 years ago

We need an IMP-to-IMP network for fun ;)

Does anything Butterfly survive?

On Feb 23, 2017, at 22:18, Lars Brinkhoff wrote:

Maybe not a fault at your end. I heard there are many nodes in this Arpa... I mean "internet" thing, and any one of them can break at any time. :-)

bictorv commented 7 years ago

Well, if you're doing the DNS lookup as part of an I/O instruction, that instruction may take a long time, which ITS will notice ("revived"). You probably need to do it asynchronously, and perhaps only the first time. That is: first parse the CHIP config and try to resolve the names; if that fails, mark the CHIP entry so that it gets resolved on demand; but once it has been resolved successfully, save the result rather than doing it again.
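bictorv's suggestion could look roughly like this sketch. It is an assumption, not KLH10 code, and it uses a POSIX thread, whereas KLH10's dp* helpers run as separate processes. Each entry starts unresolved; the packet path never calls the resolver itself but starts a background lookup and reports "not yet"; once a lookup succeeds, the address is saved and never looked up again. All names are hypothetical.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>

enum chip_state { CHIP_UNRESOLVED, CHIP_PENDING, CHIP_RESOLVED };

struct chip_async {
    const char *host;
    struct in_addr ip;
    enum chip_state state;
    pthread_mutex_t lock;
};

/* Runs in a helper thread so a slow DNS lookup cannot stall the emulator. */
static void *resolve_worker(void *arg)
{
    struct chip_async *c = arg;
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_DGRAM;
    int err = getaddrinfo(c->host, NULL, &hints, &res);

    pthread_mutex_lock(&c->lock);
    if (err == 0) {
        c->ip = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
        c->state = CHIP_RESOLVED;      /* saved; never looked up again */
        freeaddrinfo(res);
    } else {
        c->state = CHIP_UNRESOLVED;    /* retried on a later packet */
    }
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Called from the packet path; never blocks on DNS. Returns 0 with *ip
   filled if resolved; otherwise starts a lookup if none is pending and
   returns -1 (the caller would drop the packet). */
static int chip_get(struct chip_async *c, struct in_addr *ip)
{
    int ret = -1;
    pthread_mutex_lock(&c->lock);
    if (c->state == CHIP_RESOLVED) {
        *ip = c->ip;
        ret = 0;
    } else if (c->state == CHIP_UNRESOLVED) {
        pthread_t t;
        if (pthread_create(&t, NULL, resolve_worker, c) == 0) {
            c->state = CHIP_PENDING;   /* worker can't run past lock yet */
            pthread_detach(t);
        }
    }
    pthread_mutex_unlock(&c->lock);
    return ret;
}
```

Dropping packets while a lookup is pending assumes the Chaosnet protocol's own retransmission will cover them.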

larsbrinkhoff commented 7 years ago

It's all done in dpchudp.c. I thought that was a separate process?

If we're going to add this, I'd like to include the use case where a hostname's IP number changes. It's not that much more work.

Rhialto commented 7 years ago

I didn't look at the code, but does it use the modern lookup functions that transparently work for IPv6 as well?

bictorv commented 7 years ago

Lars, you're right; I was guessing. But I suppose doing DNS requests at nice=-20 could still have an impact?

larsbrinkhoff commented 7 years ago

I don't know which functions work with IPv6 and which don't. It seems to me that the gethostbyname() interface should work with any kind of address. However, the ch11 implementation currently rejects anything but IPv4 addresses.

I saw gethostbyname() is obsolete and getaddrinfo() is the recommended replacement.

larsbrinkhoff commented 7 years ago

The connection between UP and NO has been down for a while. :UP doesn't get a response from either side, even though SJ responds.

Now NO went down without any kind of BUGHLT. Maybe KLH10 crashed? It did print

dpimp-R: Too-large packet (1480) , can't fragment yet

a few times, but that doesn't look related to Chaosnet.

Rhialto commented 7 years ago

On Fri 24 Feb 2017 at 00:44:47 -0800, Lars Brinkhoff wrote:

I saw gethostbyname() is obsolete and getaddrinfo() is the recommended replacement.

Yes, the nice thing about getaddrinfo() is that it gives you the address in a generic format that does not depend on the address/protocol family. (It's a pity they didn't do that already with gethostbyname(); they really could have, since an address to give to socket() etc. has always been a size plus a buffer. They just packaged it a bit more conveniently now.)

You can hand it to socket() etc without thinking. Making it not work for IPv6, or any other address family that it happens to give you, takes almost more work than just supporting everything. The easy way to support tunneling CHAOS over DECnet! The manpage contains nice examples of client and server side use.
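A minimal client-side sketch of what Rhialto describes, in the style of the getaddrinfo(3) man page examples: ask for AF_UNSPEC, then try each returned address until socket() and connect() succeed, without ever inspecting the address family. `chudp_connect` is a hypothetical name, and ch11 itself would still have to accept non-IPv4 addresses for this to matter.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket connected to host:port over whichever address
   family the resolver returns (IPv4, IPv6, ...). Returns fd or -1. */
static int chudp_connect(const char *host, const char *port)
{
    struct addrinfo hints, *res, *rp;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* any address family */
    hints.ai_socktype = SOCK_DGRAM;   /* CHUDP runs over UDP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (rp = res; rp != NULL; rp = rp->ai_next) {
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0)
            break;                    /* no family-specific code needed */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

For a UDP socket, connect() only records the default peer; no packet is sent until the first write.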

larsbrinkhoff commented 7 years ago

I have restarted into a stock KLH10 to test CFTP performance without dynamic CHIP. It's obvious that continuously resolving hostnames degrades performance.

Now I get 300-500 KBaud, and no "ITS revived" time lapse messages.

Rhialto commented 7 years ago

Is this branch in a state for merging? Or maybe more to the point, does it currently fix the issue of $TITLE, "Chaosnet fails if any chip fails"? (Any IPv6 issues are less important than that I'd say.)

larsbrinkhoff commented 7 years ago

It's not in a state for merging.

It did fix the issue, in the sense that all hostnames are looked up dynamically. If any "chip" fails, the code will proceed anyway, and the correct IP can be picked up later. However, hostname resolution sometimes takes a long time. I ran this with ITS for a while, and it was obvious that functionality was degraded.

My intent is to put hostname resolution in a separate process.

larsbrinkhoff commented 5 years ago

Sorry, kind of dropped the ball on this one.

bictorv commented 5 years ago

Please don't do this. Use the Chaosnet bridge (cbridge) instead, which does periodic reparsing and a million more tricks. Also, the files "dpchudp.*" were renamed "dpchaos.*" a while ago (when Chaos-over-Ether was implemented).

bictorv commented 5 years ago

To be more helpful, I hope: as a small example, UP is configured using

devdef chaos ub3 ch11 addr=764140 br=5 vec=270 myaddr=3143 chudpport=42043 chip=3040/localhost:42042

and my cbridge uses

link chudp 127.0.0.1:42043 host 3143

Then all routing, reparsing, IPv6, TLS, Unix sockets etc can be handled by cbridge, and ITS can keep running even after changes to the network.

larsbrinkhoff commented 5 years ago

I agree using cbridge is much preferable. I do so myself.

There could be a use case for this if someone for some reason doesn't want to use cbridge. But I'm not pursuing this now.

bictorv commented 5 years ago

Also, without your own cbridge, a single chip entry for router.aosnet.ch suffices:

devdef chaos ub3 ch11 addr=764140 br=5 vec=270 myaddr=XXXX chip=3040/router.aosnet.ch