devhawala / dodo

Xerox Network Services (XNS) implemented in Java

6085 sends boot packet but is stuck at MP 0149 #1

Open techfury90 opened 3 years ago

techfury90 commented 3 years ago

Just tried to configure Boot Service for my Daybreak, extracted all the files from the disks, added the following to dodo.properties:

startBootService = true
bootService.baseDir = bootsvc
bootService.verbose = true

However, when pressing F3 at the boot soft keys, I see:

boot-server-packet IDP[ chksum: 0x9F58 len: 38 transportControl: 0 packetType: 9 [BootServerPacket] dst: 0000.FFFFFFFFFFFF.000A src: 0000.AA4119B7.0040  ]
data: 0x 0001 0000 AA00 0800 0000 0000
## bootSvc: processing simple-request from 0x0000AA4119B7 for boot-file-number 0x0000AA000800

It appears EtherInitialDove.db is sent over ethernet, but yet the 6085 never seems to "see" it and just stays at MP 0149. Do you have any idea what might be the culprit here?

devhawala commented 3 years ago

Hello techfury90,

sorry for the delay, i'm currently on holiday and mostly offline.

Wow, you are probably the first person who tries to use the boot service with a real machine. It was however already used in networks with real Xerox machines...

As the Dodo boot service receives the request, your network connection from the 6085 hardware to the bootserver through the NetHub seems to work OK (congratulations!). If there are no "## error ..." like messages in the log file, the boot service will have found the file to send, so this also should not be an issue.

So far i tested the boot service only with my own 6085 emulation (Draco), which is not a real test of the protocol implementations, as two emulations are involved which were not checked against real hardware (i do not have an 8000 server or 6085 machine), so some aspects were implemented "from memory" or "by guessing". Furthermore Draco does not use microcode, so network bootstrapping starts with loading the germ from the net and then continues with bootfile loading. So i thought that both boot protocol variants (simple and spp) were sufficiently exercised, as loading anything before the bootfile (microcodes and germ) uses the simple boot protocol and bootfile loading uses the spp boot protocol. Starting the bootstrap sequence with the initial microcode was never tested...

The disadvantage of the simple protocol is that there is no handshake: the boot service sends the packets blindly, without getting acknowledgement packets. In the more advanced spp protocol variant, the workstation sends back an acknowledgement for each packet, so there is an automatic throttling (the option for resending packets is currently not supported by my boot service). Implementing spp was probably too complex for the initial bootstrap modules of real 6085's (IOP ROM code, initial microcode), so the simple protocol is used there.

I suppose that the problem is in the speed factor: the Dodo boot service sends 50 packets per second with the simple boot protocol. My Draco emulator has no problem with this, as the requesting part for the germ is in the emulator (so Java and not yet the Mesa machine emulation). But a real 6085 may be too slow for 50 packets per second...

When i'm back on my development machine next week, i will add a config option for the packet delivery speed for the simple protocol to replace the hard-coded value.

Meanwhile: did you check with the NetSpy program that the boot service sends the packets for the initial microcode? Can the real 6085 access the other Dodo services?

Greetings, Hans

techfury90 commented 3 years ago

No worries, Hans. Enjoy your vacation! 👍

I had a similar hypothesis that perhaps adjusting the delay would work, but trying all sorts of values, from the original 20 ms pause at line 374 of BootResponder.java up to 5000 ms, had no effect. However, I noticed that Interlisp-D has a Boot Service, with the source code included! I don't quite know how to make sense of this, but it says this:

             (with SPPSTREAM (SETQ OUTPUTSTREAM (SPPOUTPUTSTREAM INPUTSTREAM))
                    (with SPPCON SPP.CONNECTION          (* Switch to negotiated connection id)
                           (SETQ SPPDESTID (with ETHERBOOTPACKET PACKET ETHERBOOTSPPDESTID)))
                                                         (* Send SYS packet to establish (Dove fix))
                    (\SPP.SENDPKT SPP.CONNECTION (\SPP.SYSPKT SPP.CONNECTION 0)))

I'll see if I can figure out what it's talking about and see if I can implement that same fix. I think that's the issue here, having just looked at the Interlisp code this afternoon.

By the way, all other Dodo services work with ViewPoint 2.0.5 on my 6085. It does seem to have to retransmit packets extremely frequently, probably due to sending them too fast. May be wise to add some sort of general send delay option as well...

Edit: Lisp source code below etherboot.txt

Edit 2: seems that the 6085 actually sends a simple request, but then some sort of handshake is expected to switch it over to SPP? Interesting...

devhawala commented 3 years ago

Glad to hear that your 6085 hardware machine can use the Dodo services. For the resends, there is already an undocumented (low-level) configuration parameter "spp.sendingTimeGap", which specifies the minimum time interval in milliseconds between 2 packets of the same SPP connection; the default is 20 ms, allowing for 50 packets per second. Maybe a value of 50 is a good starting point for experimenting (20 packets => ~ 10 kbyte/s)?
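
The arithmetic behind those numbers can be sketched as follows. This is only an illustration: the 512-byte payload per packet is an assumption chosen because it matches the "~ 10 kbyte/s" estimate, and the class and method names are hypothetical, not part of Dodo.

```java
/** Rough throughput estimate for a fixed minimum gap between packets (illustrative only). */
final class SendGapEstimate {

    /** Packets per second allowed by a minimum gap in milliseconds. */
    static double packetsPerSecond(int gapMs) {
        return 1000.0 / gapMs;
    }

    /** Payload bytes per second, assuming every packet carries payloadBytes of data. */
    static double bytesPerSecond(int gapMs, int payloadBytes) {
        return packetsPerSecond(gapMs) * payloadBytes;
    }

    public static void main(String[] args) {
        // 512 payload bytes per packet is an assumption for this estimate
        for (int gapMs : new int[] {20, 50, 75}) {
            System.out.printf("gap=%d ms -> %.1f packets/s, %.0f bytes/s%n",
                    gapMs, packetsPerSecond(gapMs), bytesPerSecond(gapMs, 512));
        }
    }
}
```

With a gap of 50 ms this gives 20 packets per second, so roughly 10 kbyte/s under the 512-byte assumption.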

For the boot protocols: microcode and germ are loaded with the simple protocol, then the IOP starts the germ which uses the spp protocol for loading the Pilot boot file. So your 0149 problem is still the simple boot protocol, as the initial microcode is being loaded (which will load the mesa microcode and then the germ). I did read somewhere that there was a protocol change for SPP when the 8090 servers appeared, involving an additional system packet to initiate an spp connection. As all tested Pilot versions from XDE 5.0 (Pilot 12.3) up to GVWin (Pilot 15.3) work with Dodo, i must somehow have handled this extra packet gracefully without noticing...

But as the simple boot protocol seems to fail somehow, i will have a closer look at it next week. After all, if it is buggy or wrong, i may have implemented a bug-compatible net bootstrapper in Draco, so the problem may be deeper or well hidden.

(i did look at your attached Lisp file, but i don't understand Lisp, this is still on my wish list :-) )

BTW: if you have some old Lisp source files, do you also have (by pure coincidence) some Mesa sources from the Dandelion era? I'm currently looking at extending my Mesa emulator to also support Dandelion virtual hardware (just as Draco allows running some 6085 software and disk images), but many disk related things are still unclear, even after extending the Darkstar emulator with special traces to see how the Pilot Mesa code interacts with the disk microcode. So far my DLion boot ends in MP 0921 (boot device error). Some Dandelion sources (DLionInputOuput, DiskHeadDLion or the like) could be helpful there...

However: greetings from Berlin/Germany, Hans

techfury90 commented 3 years ago

Experimenting with spp.sendingTimeGap while loading an Interlisp-D sysout found that while 50 still resulted in retransmitted packets, 75 does not appear to. I will mess with this further and see if there's an optimum value.

Ahhh, I see, so both protocols are used, however we are not in the SPP phase yet. This makes sense.

That makes two of us who don't really grok Lisp. That said, I have the documentation that goes with that code, and I understand that enough to configure it. I only have one 6085 here, and I could never get Darkstar's ethernet to work right... so I think what I'll try later today or tomorrow is configuring the boot service on Interlisp-D and sending a forged simple boot request to it, capture the data, and compare to what Dodo is sending. Perhaps it will be readily apparent once that's done...

No "special" source over here- I grabbed that off of some archive of Interlisp-D code I grabbed off of CMU's lisp archive. That said, I do know the author of Darkstar fairly well, so I will forward your question along to him. I recall him saying the disk was one of the most aggravatingly troublesome parts of the entire emulator...

Edit: Actually, wait, did you say Draco has a net bootstrapper? I did not realize that. That would make sending a boot request to Interlisp-D considerably easier...

techfury90 commented 3 years ago

Got the Interlisp-D boot service set up and started, but it does not respond at all to Draco's boot request, which looks like this:

[10] => packet length: 52 -- at 2594368482.438861 ms

          => raw packet content:
              0x000 : FFFF FFFF FFFF 0000 AA00 AB21 0600 FFFF 0026 0009 0000 0000 FFFF FFFF FFFF 000A
              0x010 : 0000 0000 0000 AA00 AB21 04D2 0001 0000 AA00 081F

          => ethernet packet header
              dst-addr : FF-FF-FF-FF-FF-FF
              src-addr : 00-00-AA-00-AB-21
              ethType  : 0x0600 (xns)

          => xns packet header
              ckSum   : 0xFFFF
              length  : 38 bytes => 19 words
              transCtl: 0
              pktType : 9 = BootServerPacket

          => xns destination
              network : 0000-0000
              host    : FF-FF-FF-FF-FF-FF
              socket  : 000A - boot

          => xns source
              network : 0000-0000
              host    : 00-00-AA-00-AB-21
              socket  : 04D2

Compare to my real 6085:

[11] => packet length: 60 -- at 2594401347.603955 ms

          => raw packet content:
              0x000 : FFFF FFFF FFFF 0000 AA41 19B7 0600 9F58 0026 0009 0000 0000 FFFF FFFF FFFF 000A
              0x010 : 0000 0000 0000 AA41 19B7 0040 0001 0000 AA00 0800 0000 0000 0000 0000

          => ethernet packet header
              dst-addr : FF-FF-FF-FF-FF-FF
              src-addr : 00-00-AA-41-19-B7
              ethType  : 0x0600 (xns)

          => xns packet header
              ckSum   : 0x9F58
              length  : 38 bytes => 19 words
              transCtl: 0
              pktType : 9 = BootServerPacket

          => xns destination
              network : 0000-0000
              host    : FF-FF-FF-FF-FF-FF
              socket  : 000A - boot

          => xns source
              network : 0000-0000
              host    : 00-00-AA-41-19-B7
              socket  : 0040

Next plan: craft some sort of simple program to generate a boot request of the same format as my 6085 to see how the Interlisp-D boot service responds...

devhawala commented 3 years ago

Argh, forgot this one...

There is a minimum (but undocumented) length for ethernet packets to be accepted by Xerox machines: 60 bytes. Dodo ensures that all outgoing packets have at least this length, filling with zeros and setting the length for the raw ethernet packet. BUT: my boot file requestor (and probably the acknowledge packets for spp, must check) in Draco does not handle this minimal length, using the real payload length... This is ok for Dodo which also accepts smaller packets, but the low-level ethernet implementation in Lisp most probably checks the minimal length and the Lisp boot service will simply not see the packets sent by Draco when requesting the germ... This explains the first difference consisting of the longer packet (60 and not 52 bytes) and the 4 additional 0-words sent by the real 6085.
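
The padding rule described above can be sketched like this. It is a minimal illustration, not the actual Dodo or Dwarf code; the class name, the method name and the constant are hypothetical, and only the 60-byte minimum and the zero fill come from the description.

```java
/** Pads a raw ethernet packet with zeros up to the 60-byte minimum (illustrative sketch). */
final class EthernetPadding {

    /** Minimum raw packet length mentioned in this thread. */
    static final int MIN_PACKET_LENGTH = 60;

    /** Returns a copy of the packet, zero-filled at the end if it is shorter than 60 bytes. */
    static byte[] padToMinimum(byte[] packet) {
        int length = Math.max(packet.length, MIN_PACKET_LENGTH);
        byte[] padded = new byte[length];            // new arrays are zero-initialized
        System.arraycopy(packet, 0, padded, 0, packet.length);
        return padded;
    }

    public static void main(String[] args) {
        // 14 bytes ethernet header + 38 bytes IDP packet = 52 bytes, as in the Draco request
        byte[] request = new byte[52];
        System.out.println(padToMinimum(request).length);
    }
}
```

A 52-byte request grows by 8 zero bytes, which corresponds to the 4 additional 0-words seen in the packet from the real 6085.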

The further differences in the packet seem ok: of course different machine IDs for the sender, and Draco does not need microcode, so directly requests the germ (0000 AA00 081F) instead of the initial microcode (0000 AA00 0800).

The list of things to check and fix gets longer and longer... So thanks for testing!

Regards, Hans

techfury90 commented 3 years ago

Aha! Armed with that information, I have patched Dwarf to send the proper 60 byte packet, as we see below:

[1138] => packet length: 60 -- at 2659998545.322396 ms

          => raw packet content:
              0x000 : FFFF FFFF FFFF 0000 AA00 AB21 0600 FFFF 0026 0009 0000 0000 FFFF FFFF FFFF 000A
              0x010 : 0000 0000 0000 AA00 AB21 04D2 0001 0000 AA00 081F 0000 0000 0000 0000

          => ethernet packet header
              dst-addr : FF-FF-FF-FF-FF-FF
              src-addr : 00-00-AA-00-AB-21
              ethType  : 0x0600 (xns)

          => xns packet header
              ckSum   : 0xFFFF
              length  : 38 bytes => 19 words
              transCtl: 0
              pktType : 9 = BootServerPacket

          => xns destination
              network : 0000-0000
              host    : FF-FF-FF-FF-FF-FF
              socket  : 000A - boot

          => xns source
              network : 0000-0000
              host    : 00-00-AA-00-AB-21
              socket  : 04D2

Now to get Interlisp-D started up and its boot service running to see what happens...

And no problem! I'm just glad someone actually reimplemented all this stuff so someone like myself who was born after the 6085 came out can actually see the marvel of XNS on ViewPoint.

Edit: Oops, pasted the wrong packet...

techfury90 commented 3 years ago

Still does not elicit a response from Interlisp-D. I noticed two differences in the packets from my modified Dwarf and my 6085: the 6085 appears to be computing a checksum for boot packets, whereas Dwarf just fills the field with all ones; and the socket number is 0x04D2, compared to 0x0040 on the real 6085. I have since modified Dwarf to use the 0x0040 socket number, but it would appear that Interlisp-D is probably rejecting the packets because of an incorrect checksum. Let me get this modified to compute the checksum and see what happens...

devhawala commented 3 years ago

The port 0x04D2 should not matter, as it is the sender port, which can be freely chosen by the sending machine.

The checksum 0xFFFF explicitly means "no checksum", which should make the receiver accept the packet without comparing checksums. So i think the boot request packet is ok now. But maybe Interlisp simply requires checksums?
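
For reference, a minimal sketch of the checksum check, assuming the usual XNS "add and left cycle" scheme: a 16-bit ones-complement sum over all words after the checksum field, rotated left by one bit after each addition, with 0xFFFF reserved for "no checksum". The class and method names are hypothetical. Applied to the request from the real 6085 shown earlier in this thread, it reproduces the 0x9F58 in that packet.

```java
/** XNS IDP checksum sketch: ones-complement add and left cycle (illustrative only). */
final class IdpChecksum {

    /** Reserved value meaning "packet is not checksummed". */
    static final int NO_CHECKSUM = 0xFFFF;

    /** Computes the checksum over the 16-bit words following the checksum field. */
    static int compute(int[] words) {
        int sum = 0;
        for (int word : words) {
            sum += word & 0xFFFF;
            if (sum > 0xFFFF) {
                sum = (sum & 0xFFFF) + 1;                    // end-around carry
            }
            sum = ((sum << 1) | (sum >>> 15)) & 0xFFFF;      // rotate left by one bit
        }
        return sum == 0xFFFF ? 0 : sum;                      // 0xFFFF is reserved
    }

    /** A receiver accepts the packet if it is unchecksummed or the checksum matches. */
    static boolean accepts(int receivedChecksum, int[] words) {
        return receivedChecksum == NO_CHECKSUM || receivedChecksum == compute(words);
    }

    public static void main(String[] args) {
        // IDP words of the real 6085 boot request, starting at the length field
        int[] words = {
            0x0026, 0x0009, 0x0000, 0x0000, 0xFFFF, 0xFFFF, 0xFFFF, 0x000A, 0x0000,
            0x0000, 0x0000, 0xAA41, 0x19B7, 0x0040, 0x0001, 0x0000, 0xAA00, 0x0800
        };
        System.out.printf("0x%04X%n", compute(words));
    }
}
```

Under this scheme a request carrying 0xFFFF passes `accepts` without any comparison, so whether Interlisp-D nevertheless insists on a computed checksum can only be settled by testing.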

techfury90 commented 3 years ago

Yeah, I also noticed that was a reserved value that indicates no checksum after writing my comment. Unfortunately, I can't test it yet, because my 6085's power supply just died. Will be taking my 6085 to a power supply expert on Tuesday. He thinks it won't be much trouble for him to repair, thankfully.

devhawala commented 3 years ago

Ouch, these machines are showing their age. At the Technical University of Berlin (one of the Xerox University Grant 3rd wave sites in Europe in ~1986), most hardware failures we had were on the screens and sometimes on the IOPs. I truly hope your machine is from a good batch...

Meanwhile i checked in new versions for Dodo and Dwarf to the GitHub repos. Dodo now has the option 'bootService.simpleDataSendInterval' for throttling the number of packets and should now transfer complete microcode files (instead of stopping at the last 512-byte boundary). Dwarf now sends the germ request with a 60-byte ethernet packet.
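
The "complete file" fix can be illustrated with a small chunking sketch. It is not the actual BootResponder code: the names are hypothetical, and the empty final chunk is an assumption modelled on the "sent 0 bootfile bytes - transfer done" line in the log further down this thread.

```java
/** Splits a boot file into 512-byte chunks without dropping the short tail (illustrative sketch). */
final class SimpleBootChunks {

    static final int CHUNK_SIZE = 512;

    /** Returns all full chunks, the remaining tail if any, and an empty end marker. */
    static byte[][] split(byte[] file) {
        int fullChunks = file.length / CHUNK_SIZE;
        int tailLength = file.length % CHUNK_SIZE;
        int count = fullChunks + (tailLength > 0 ? 1 : 0) + 1;   // +1 for the empty end marker
        byte[][] chunks = new byte[count][];
        for (int i = 0; i < count - 1; i++) {
            int from = i * CHUNK_SIZE;
            int length = Math.min(CHUNK_SIZE, file.length - from);
            chunks[i] = new byte[length];
            System.arraycopy(file, from, chunks[i], 0, length);
        }
        chunks[count - 1] = new byte[0];                         // signals "transfer done"
        return chunks;
    }

    public static void main(String[] args) {
        // 1980 bytes = 3 * 512 + 444
        for (byte[] chunk : split(new byte[1980])) {
            System.out.println(chunk.length);
        }
    }
}
```

Stopping at the last 512-byte boundary would send only the three full chunks of such a file and silently lose the final 444 bytes.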

techfury90 commented 3 years ago

Yeah, this one actually had IOP problems when the seller was getting it ready to ship, so he swapped out the IOP for a newer revision... fortunately the 19" screen hasn't been a problem, just has the geometry off in one corner. My friend is also going to fix that tomorrow...

Just got the car ready to transport my 6085, have to drive 250km each way tomorrow to visit my friend. Luckily, he's the kind of savant who can fix any switch mode PS just by looking at it. Bonus: he only asked that I give him a comprehensive demonstration of the 6085's capabilities instead of giving him money.

Excellent news on the new changes! I will give them a try as soon as we have my 6085 back to life. Here's hoping for the best! Thanks a lot, Hans!

techfury90 commented 3 years ago

Good news: PSU expert believes the problem is simply a faulty $0.47 Zener diode. I have a replacement diode arriving tomorrow, here's hoping it brings my 6085 back to life so I can test more!

techfury90 commented 3 years ago

Didn't quite work, but long story short I'm finally back up. OK, so I grabbed the latest changes and we get further! Instead of hanging at 0149, we now hang at 0201, with the following dodo messages:

## bootSvc: processing simple-request from 0x0000AA4119B7 for boot-file-number 0x0000AA000800
## bootSvc: sending file 'EtherInitialDove.db' via simpleFile for boot-file-number 0x0000AA000800
## bootSvc - simpleData: at +42 sent 512 bootfile bytes
## bootSvc - simpleData: at +85 sent 512 bootfile bytes
## bootSvc - simpleData: at +127 sent 512 bootfile bytes
## bootSvc - simpleData: at +169 sent 444 bootfile bytes
## bootSvc - simpleData: at +210 sent 0 bootfile bytes - transfer done
boot-server-packet IDP[ chksum: 0xA469 len: 40 transportControl: 0 packetType: 9 [BootServerPacket] dst: 0000.1000BB101101.000A src: 0000.AA4119B7.0040  ]
data: 0x 0003 0000 AA00 0810 0622 0000
## bootSvc: processing SPP-request from 0x0000AA4119B7 connID 0x0622 for boot-file-number 0x0000AA000810
## bootSvc: created session key 0x0622719B for request
## bootSvc - reply SPP: srcAdr=1000BB101101 dstAdr=0000AA4119B7 srcId=719B dstId=0622 system sendAck sst=0 seqNo=0 ackNo=0 allocNo=0 dataLen=0
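
How the "data:" words in these log lines map to the two request kinds can be sketched as below. The layout is inferred only from the logs in this thread (request type 1 = simple, 3 = SPP, three words of boot file number, then a connection id for SPP requests); the class and field names are hypothetical.

```java
/** Decodes the payload words of a boot request as they appear in the Dodo log (illustrative sketch). */
final class BootRequest {

    static final int SIMPLE_REQUEST = 1;   // value seen in the log for simple requests
    static final int SPP_REQUEST = 3;      // value seen in the log for SPP requests

    final int type;
    final long bootFileNumber;
    final int connectionId;                // -1 when the request carries none

    private BootRequest(int type, long bootFileNumber, int connectionId) {
        this.type = type;
        this.bootFileNumber = bootFileNumber;
        this.connectionId = connectionId;
    }

    /** Parses the 16-bit payload words following the IDP header. */
    static BootRequest parse(int[] words) {
        if (words.length < 4) {
            throw new IllegalArgumentException("boot request needs at least 4 words");
        }
        int type = words[0] & 0xFFFF;
        long bootFileNumber = ((long) (words[1] & 0xFFFF) << 32)
                | ((long) (words[2] & 0xFFFF) << 16)
                | (words[3] & 0xFFFF);
        int connectionId = (type == SPP_REQUEST && words.length > 4) ? (words[4] & 0xFFFF) : -1;
        return new BootRequest(type, bootFileNumber, connectionId);
    }

    @Override
    public String toString() {
        return String.format("type=%d bootFile=0x%012X connId=%s", type, bootFileNumber,
                connectionId < 0 ? "none" : String.format("0x%04X", connectionId));
    }

    public static void main(String[] args) {
        System.out.println(parse(new int[] {0x0001, 0x0000, 0xAA00, 0x0800, 0x0000, 0x0000}));
        System.out.println(parse(new int[] {0x0003, 0x0000, 0xAA00, 0x0810, 0x0622, 0x0000}));
    }
}
```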

NetSpy results (truncated to only show packets exchanged after it changes to 0199):

[8] => packet length: 60 -- at 5147399979.364924 ms

          => raw packet content:
              0x000 : 1000 BB10 1101 0000 AA41 19B7 0600 B067 0028 0009 0000 0000 1000 BB10 1101 000A
              0x010 : 0000 0000 0000 AA41 19B7 0040 0003 0000 AA00 0810 0C21 0000 0000 0000

          => ethernet packet header
              dst-addr : 10-00-BB-10-11-01
              src-addr : 00-00-AA-41-19-B7
              ethType  : 0x0600 (xns)

          => xns packet header
              ckSum   : 0xB067
              length  : 40 bytes => 20 words
              transCtl: 0
              pktType : 9 = BootServerPacket

          => xns destination
              network : 0000-0000
              host    : 10-00-BB-10-11-01
              socket  : 000A - boot

          => xns source
              network : 0000-0000
              host    : 00-00-AA-41-19-B7
              socket  : 0040
[9] => packet length: 60 -- at 5147399993.074038 ms

          => raw packet content:
              0x000 : 0000 AA41 19B7 1000 BB10 1101 0600 9489 002A 0005 0000 041A 0000 AA41 19B7 0040
              0x010 : 0000 041A 1000 BB10 1101 000A C000 719E 0C21 0000 0000 0000 0000 0000

          => ethernet packet header
              dst-addr : 00-00-AA-41-19-B7
              src-addr : 10-00-BB-10-11-01
              ethType  : 0x0600 (xns)

          => xns packet header
              ckSum   : 0x9489
              length  : 42 bytes => 21 words
              transCtl: 0
              pktType : 5 = SPP

          => xns destination
              network : 0000-041A
              host    : 00-00-AA-41-19-B7
              socket  : 0040

          => xns source
              network : 0000-041A
              host    : 10-00-BB-10-11-01
              socket  : 000A - boot

          => SPP header
              TransCtl  : C0 ( SystemPacket SendAck )
              Data SST  : 00
              Source Id : 719E
              Dest Id   : 0C21
              SequenceNo: 0000
              Ack No    : 0000
              Alloc No  : 0000

          => xns SPP payload ( bytes: 0 => words: 0 )
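
The SPP header fields in this dump can be decoded with a few lines of Java. This is a sketch under assumptions: the flag bits follow the usual SPP connection-control layout (0x80 system packet, 0x40 send acknowledgement, 0x20 attention, 0x10 end of message), which is consistent with the "C0 ( SystemPacket SendAck )" line above, and the class name is hypothetical.

```java
/** Decodes the 12-byte SPP header that follows the IDP header (illustrative sketch). */
final class SppHeader {

    static final int SYSTEM_PACKET = 0x80;
    static final int SEND_ACK = 0x40;
    static final int ATTENTION = 0x20;
    static final int END_OF_MESSAGE = 0x10;

    final int control, datastreamType, sourceId, destId, sequenceNo, ackNo, allocNo;

    private SppHeader(int[] w) {
        control = (w[0] >> 8) & 0xFF;      // high byte: connection control flags
        datastreamType = w[0] & 0xFF;      // low byte: datastream type (SST)
        sourceId = w[1] & 0xFFFF;
        destId = w[2] & 0xFFFF;
        sequenceNo = w[3] & 0xFFFF;
        ackNo = w[4] & 0xFFFF;
        allocNo = w[5] & 0xFFFF;
    }

    /** Parses the six 16-bit header words. */
    static SppHeader parse(int[] words) {
        if (words.length < 6) {
            throw new IllegalArgumentException("SPP header needs 6 words");
        }
        return new SppHeader(words);
    }

    boolean isSystemPacket() { return (control & SYSTEM_PACKET) != 0; }
    boolean requestsAck()    { return (control & SEND_ACK) != 0; }
    boolean isAttention()    { return (control & ATTENTION) != 0; }
    boolean isEndOfMessage() { return (control & END_OF_MESSAGE) != 0; }

    public static void main(String[] args) {
        // Header words of packet [9]: C000 719E 0C21 0000 0000 0000
        SppHeader h = parse(new int[] {0xC000, 0x719E, 0x0C21, 0x0000, 0x0000, 0x0000});
        System.out.printf("system=%b sendAck=%b srcId=%04X dstId=%04X seq=%d%n",
                h.isSystemPacket(), h.requestsAck(), h.sourceId, h.destId, h.sequenceNo);
    }
}
```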

Seems almost as if the SPP response is being missed by the 6085 end...

devhawala commented 3 years ago

Yes, it seems that the 6085 still waits for the SPP initiation packet sent by the boot service.

So i added the following: the SPP sender in the boot service now also waits 'bootService.simpleDataSendInterval' milliseconds before sending any reply packet, with the intention of giving the receiving machine enough time to accept the packet. This effectively slows down the boot process of a network boot in the Draco emulator.

Hope that helps, Hans

techfury90 commented 3 years ago

Great idea, but it sadly didn't seem to do the trick.

What I actually noticed after testing different delay values is that the 0201 MP code doesn't appear until after the SPP packet is sent, so I think we're looking at it being picky about the contents of some field. I need to look at that Lisp code again, maybe it has a hint for us...

devhawala commented 3 years ago

Well...

Next try: as the boot request packet for SPP already has the SPP connection-id for the sender side (the 6085 workstation in your case), it may be possible that the requestor assumes the connection to be already opened, so it does not expect a connect handshake system packet but directly the first data packet...?

To check this, i have an experimental version of Dodo where the 1st reply packet is a data packet instead of a handshake packet. The dist.zip attached contains the .jar file (jar files cannot be attached directly). If it works with your 6085, i will check in that version to GitHub, possibly after removing the throttling of SPP packets.

Good luck, Hans

techfury90 commented 3 years ago

Sadly, the same result, however, I have some observations from the Lisp code for you:

devhawala commented 3 years ago

ok... the attached dist.zip has a new version doing what the Lisp code does: for the one packet from the booting machine, it replies with 2 packets, the first one being the system packet for the connection open handshake and then, after 10 milliseconds, the second packet with the first data chunk.
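
The reply sequence just described can be sketched as follows. This is not the code in the attached dist.zip: the `PacketSink` interface, the class name and the string stand-ins for packets are hypothetical, and only the ordering and the 10 ms pause come from the description.

```java
/** Replies to one SPP boot request with two packets: handshake first, then data (illustrative sketch). */
final class TwoStepReply {

    /** Hypothetical stand-in for whatever actually transmits a packet. */
    interface PacketSink {
        void send(String packet);
    }

    /** Sends the system packet, waits gapMs milliseconds, then sends the first data packet. */
    static void reply(PacketSink sink, long gapMs) {
        sink.send("system packet (connection open handshake)");
        try {
            Thread.sleep(gapMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // keep the interrupt flag and stop here
            return;
        }
        sink.send("data packet (first bootfile chunk)");
    }

    public static void main(String[] args) {
        reply(packet -> System.out.println("sending " + packet), 10);
    }
}
```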

But i'm not sure how Lisp "created" the SPP stream instance, as the incoming boot request packet is not SPP, but something different (IDP with a specific packet type != SPP packet type), but that's a different story...

Hope that helps...