Netatalk / netatalk

Netatalk is a Free and Open Source AFP fileserver. A *NIX or BSD system running Netatalk is capable of serving many Macintosh clients simultaneously as an AppleShare file server.
https://netatalk.io
GNU General Public License v2.0

[2.3] GS/OS "Split Horizon" patch breaking atalkd broadcasting RTMP data to other routers #585

Closed. NJRoadfan closed this issue 10 months ago.

NJRoadfan commented 10 months ago

The following commit is causing problems when using other routers with netatalk's atalkd and should be reverted: https://github.com/Netatalk/netatalk/commit/27111c653b471eab31ac2a776e0e483f9194f805

With this patch applied, atalkd stops broadcasting RTMP routing tuples to other AppleTalk routers on the network. As a result, clients behind other routers on the network can't see or communicate with the other network segments atalkd is routing. It should be noted that the AsanteTalk and likely the Dayna bridges mentioned in the commit are not "by the book" AppleTalk routers and are known to suffer from bizarre behavior. This patch was meant to work around their buggy behavior, but it broke other AppleTalk routers in the process.
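For reference, the "routing tuples" are the entries carried in an RTMP Data packet. Below is a minimal C sketch of how one might encode them, assuming the tuple layouts described in "Inside AppleTalk": a non-extended tuple is a 2-byte network number plus a distance byte, and an extended tuple is the range start, a distance byte with bit 7 set, the range end, and the RTMP version byte 0x82. The function names are hypothetical and not taken from atalkd's source; the RTMP Data packet header is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit value in network (big-endian) byte order. */
static void put_u16(uint8_t *out, uint16_t v)
{
    out[0] = (uint8_t)(v >> 8);
    out[1] = (uint8_t)(v & 0xFF);
}

/* Hypothetical helper: non-extended tuple = network number (2 bytes)
 * + distance (1 byte, bit 7 clear). Returns the number of bytes written. */
size_t rtmp_put_nonext_tuple(uint8_t *out, uint16_t net, uint8_t distance)
{
    put_u16(out, net);
    out[2] = (uint8_t)(distance & 0x7F); /* bit 7 clear marks non-extended */
    return 3;
}

/* Hypothetical helper: extended tuple = range start (2 bytes),
 * distance with bit 7 set (1 byte), range end (2 bytes), version 0x82. */
size_t rtmp_put_ext_tuple(uint8_t *out, uint16_t first, uint16_t last,
                          uint8_t distance)
{
    put_u16(out, first);
    out[2] = (uint8_t)(0x80 | (distance & 0x7F)); /* bit 7 set = extended */
    put_u16(out + 3, last);
    out[5] = 0x82; /* RTMP version number */
    return 6;
}
```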

rdmark commented 10 months ago

@NJRoadfan Good catch. In hindsight that was an obviously risky code change.

Do you think it's worth maintaining the split horizon patch as a quirks mode code path for AsanteTalk users? It could be controlled either by a runtime option (atalkd.conf?) or by a configure flag and #ifdef macro (unless there's a better way to detect dynamically how it should behave).

NJRoadfan commented 10 months ago

I don't own an AsanteTalk or Dayna bridge to trace what they attempt to do on power-on.

I'd have to trace the code in atalkd, but I'm guessing this patch suppresses RTMP broadcasts of routing data on all interfaces, packets that are likely confusing the bridges and causing them to lock up. I might contact the author of this patch and see if they have any details. I suspect this patch was never tested on a multi-router network with atalkd also routing multiple interfaces itself. I only came across this problem because I was testing other AppleTalk routers alongside netatalk.

The split horizon method is detailed on page 5-11 of "Inside AppleTalk" if you're curious.
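As a rough illustration of split horizon as "Inside AppleTalk" describes it (not atalkd's actual code): when building the tuples to send out a port, a router leaves out routes it learned through that same port. This sketch also lists the port's own directly connected network first, which is my understanding of what Phase 2 expects on extended networks. The struct layouts and the function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One routing table entry (simplified, hypothetical layout). */
struct route {
    uint16_t first, last; /* cable range; first == last for non-extended */
    uint8_t  distance;    /* hops from this router; 0 = directly connected */
    int      port;        /* port the network is attached to or learned on */
};

/* One tuple to place in an outgoing RTMP Data packet. */
struct tuple {
    uint16_t first, last;
    uint8_t  distance;
};

/* Select the tuples to advertise out of `out_port` using split horizon.
 * Returns the number of tuples written (at most `max`). */
size_t split_horizon_tuples(const struct route *table, size_t n, int out_port,
                            struct tuple *out, size_t max)
{
    size_t count = 0;

    /* 1. The outgoing port's own network goes first, at distance 0. */
    for (size_t i = 0; i < n && count < max; i++) {
        if (table[i].port == out_port && table[i].distance == 0) {
            out[count++] = (struct tuple){table[i].first, table[i].last, 0};
            break;
        }
    }

    /* 2. Then every route that was NOT learned through the outgoing port. */
    for (size_t i = 0; i < n && count < max; i++) {
        if (table[i].port != out_port)
            out[count++] = (struct tuple){table[i].first, table[i].last,
                                          table[i].distance};
    }
    return count;
}
```

Fed the example network from NJRoadfan's diagram later in this thread (eth0 on 2-2, eth1 on 3-3, network 1 reachable via the GatorBox on eth0), this selects 2-2 and 3-3 for eth0 and 3-3, 2-2 and 1 for eth1, so the GatorBox would learn a route to network 3-3.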

NJRoadfan commented 10 months ago

Did some digging in Usenet archives. Apparently the Dayna EtherPrint bridges like to crash GS/OS when netatalk's atalkd is broadcasting certain RTMP packets. I suspect the problem was occurring for Cameron Kaiser (@classilla) here: http://oldvcr.blogspot.com/2020/11/not-refurb-weekend-apple-iigs.html Although back in 2003, he said it was working "no problem". Another report from @IvanExpert back during A2SERVER development states that the AppleShare control panel in GS/OS can't be opened without the patch. Ref: https://groups.google.com/g/comp.sys.apple2/c/PAeirVjI9UI/m/wwhKpIwVnjsJ

As for the AsanteTalk, it apparently has a very weird startup routine to determine whether to run the Ethernet port in Phase 1 or Phase 2. It also seems to crash GS/OS like the Dayna bridges when it's actually working. Without either of these bridges, I can't test or packet-sniff to see what is up with them. So far, the IIgs seems very well behaved with Shiva/Kinetics FastPath routers and with the TashTalk LocalTalk adapter being driven by a software bridge.

rdmark commented 10 months ago

@NJRoadfan May I ask you to submit a PR with the revert, and we can start with that. The default behavior should definitely not be quirks mode.

If you find out more from Steven and get some insight into whether it's feasible to support an AsanteTalk quirks mode in a more portable way, let's follow up in another PR at that point.

Does this sound like a good approach?

BTW, the only Asante product I have is Micro AsantePrint. I haven't tested it yet since the power brick is in bad shape...

IvanExpert commented 10 months ago

Nice research.

Yeah, my memory and documentation say that:

I probably have extras of some of the above if you'd like me to send you one and it would help. I definitely have a couple Daynas, possibly Asante as well, and even the things I have only one of I'm happy to loan you.

NJRoadfan commented 10 months ago

See #596.

@IvanExpert: The FastPath 4 and 5 both work without issue. Like the GatorBox, their behavior matches what is found in "Inside AppleTalk" regarding routers. These were enterprise class devices when new and "had to work".

The AsanteTalk seems like a very "special" device. Even 30 years ago its firmware should have only supported Phase 2 mode! EtherTalk Phase 1 had a very short lifespan on the market before Apple replaced it.

IvanExpert commented 10 months ago

> The AsanteTalk seems like a very "special" device. Even 30 years ago its firmware should have only supported Phase 2 mode! EtherTalk Phase 1 had a very short lifespan on the market before Apple replaced it.

For sure! At any rate, you'd think it should default to Phase 2, not Phase 1.

With that said, anecdotally speaking, it seems as though there are many more Asante bridges out there than the other brands. Just did an eBay search and while there are a couple of Dayna and Farallon, there are many more Asante currently listed. They were also still available for purchase from Asante's website a decade ago when I made A2SERVER.

@rdmark I think the Micro AsantePrint is more or less the same box as the AsanteTalk, but I'm not sure.

I would argue that if it has to be one or the other, "quirks mode" is more desirable, given the popularity of the AsanteTalk boxes and, to a lesser extent, the Dayna boxes. There are certainly many more of those than there are GatorBoxes and FastPaths -- these almost never appear on eBay (I have automatic searches for them and it's like 1 to 3 per year, at most). I realize that "quirks" mode is wrong, but I think accommodating what people actually have in 2023 is the greater good here.

NJRoadfan commented 10 months ago

@IvanExpert I came across this problem while researching modern alternatives to the Asante and Dayna boxes. I was experimenting with TashRouter, which allows one to seamlessly bridge LToUDP and TashTalk (a LocalTalk RPi hat interface) networks to netatalk. Once set up, one could plug an AirTalk into an Apple IIgs and instantly be able to netboot, print, and file share... wirelessly!

IvanExpert commented 10 months ago

@NJRoadfan Ah, got it. Yep, that's compelling. However, the TashTalk, at least for now, is limited to those who have the skills to solder it together; it can only be ordered when kits are available; and it's limited to Raspberry Pi users of A2SERVER, not any other kind of hardware. So I'd still argue it's easier to get an AsanteTalk, even though I like the idea of something more modern, and the TashTalk is super cool (I ordered two when I could, and paid him to assemble them for me, since I'm better with software than hardware). I'd personally vote for the TashRouter software to instead do whatever it does in such a way as to maintain compatibility with the patch that lets Asante and Dayna units work, so that A2SERVER can be used by the greatest number of people, though I can certainly see the argument against that, as it would be an ugly hack.

I suppose one possibility, though also ugly, would be to provide a separate binary for each use case, and prompt the user during installation which LocalTalk hardware they intend to use A2SERVER with (or, if being compiled from source during installation, apply or revert the patch, as my original A2SERVER scripts did, or have it build both binaries).

rdmark commented 10 months ago

@IvanExpert

> I'd personally vote for the TashRouter software to instead do whatever it does in such a way as to maintain compatibility with the patch that lets Asante and Dayna units work

While I get your argument about being compatible with popular hardware, I would lean towards having the default netatalk behavior follow the AppleTalk specification. If I understand @NJRoadfan's research correctly, the patch makes it so that no netatalk traffic is routed to network segments under different routers, regardless of whether Tash or some other AppleTalk router is used, so just modifying the Tash router's behavior wouldn't be enough.

I suggest we introduce either a compile time option (e.g. --enable-appletalk-quirks) or a flag in atalkd.conf (e.g. -asante-quirks).
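To make the compile-time variant concrete, here is a minimal sketch of how a configure switch like the suggested --enable-appletalk-quirks might surface in C. The macro name ATALKD_ROUTER_QUIRKS and the accessor function are hypothetical; netatalk's build system may name and wire things differently.

```c
#include <stdbool.h>

/* Hypothetical macro that a configure option such as
 * --enable-appletalk-quirks could define (e.g. via config.h). */
#ifdef ATALKD_ROUTER_QUIRKS
static const bool router_quirks = true;
#else
static const bool router_quirks = false; /* default: follow Inside AppleTalk */
#endif

/* Hypothetical accessor the RTMP send path could consult. */
bool atalkd_router_quirks_enabled(void)
{
    return router_quirks;
}
```

The trade-off with this approach is that switching behavior means rebuilding atalkd rather than changing a setting on an existing install.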

IvanExpert commented 10 months ago

@rdmark I can't argue with having the software follow the spec. (I also have to remind myself that we are talking about Netatalk here, not A2SERVER.)

And I think your suggestion is the most correct option. I think having a flag for atalkd.conf, rather than a compile-time flag, would be the way to go, as it would allow an existing installation to change when a user's hardware changes.

rdmark commented 10 months ago

@IvanExpert I agree that a runtime parameter would be ideal. I'm a bit wary of messing with the dark magic that is the atalkd.conf autoconfiguration logic.

Another option would be to add a command line parameter to the atalkd daemon, e.g. atalkd --router-quirks (I haven't landed on the best way to name this option yet)

rdmark commented 10 months ago

@IvanExpert @NJRoadfan Here's a proof-of-concept for a runtime quirks mode: https://github.com/Netatalk/netatalk/pull/597

Start atalkd with atalkd -q to enable quirks mode. Please test that it actually results in the wanted behavior when enabled and disabled. I don't have an environment to test in...
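For illustration only, a minimal sketch of how a -q switch like the one in the PR could be parsed with POSIX getopt(). This is not the PR's code: the quirks_mode variable and parse_quirks_option() are hypothetical, and a real atalkd accepts other options that are left out here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical global the RTMP send path would consult. */
static bool quirks_mode = false;

/* Parse atalkd-style options; only -q is handled in this sketch.
 * Returns 0 on success, -1 on an unrecognized option. */
int parse_quirks_option(int argc, char *argv[])
{
    int ch;

    optind = 1; /* restart scanning so the function can be called again */
    while ((ch = getopt(argc, argv, "q")) != -1) {
        switch (ch) {
        case 'q':
            quirks_mode = true; /* enable the non-standard router behavior */
            break;
        default:
            return -1; /* getopt() has already printed a diagnostic */
        }
    }
    return 0;
}
```

Keeping it a plain command-line switch avoids touching the atalkd.conf autoconfiguration logic while still letting an existing installation flip the behavior without a rebuild.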

IvanExpert commented 10 months ago

I'll test this weekend when I have my hardware. I even have a GatorBox somewhere, so I could probably test both modes, as well as a TashTalk, but I'm not sure what the failure looks like when quirks mode is enabled.

NJRoadfan commented 10 months ago

In order to test, you'll need to run two separate networks from atalkd. The issue I was having is atalkd would not rebroadcast router data from other routers on the network. Consider the following network:

         1                      2-2                eth0               eth1    3-3
     +-----------(GatorBox)-----------+--------------(netatalk atalkd)--------------+
     |[Localtalk]          [Ethernet] |                                             |
     |                                |                                             |
+---------+                     +---------+                                +---------+
|Machine 1|                     |Machine 2|                                |Machine 3|
+---------+                     +---------+                                +---------+

With "quirks mode" enabled, Machine 1 couldn't see Machine 3 or any other device on that network. This is because atalkd was not rebroadcasting routing tuples for network 1 over eth1. Machine 2 and Machine 3 could see each other because atalkd always broadcasts all its routing data to all the networks it's connected to.

Debugging tip: You can view atalkd's current routing table in /proc/net/atalk/route. You should be able to view the GatorBox's current routing table from its software, but I don't know the specifics of that device.
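When comparing behavior with quirks mode on and off, it may help to check for a given network programmatically. This is a sketch that assumes the Linux /proc/net/atalk/route layout, in which each route line starts with the target as hexadecimal NNNN:NN and the header and "Default" lines do not; verify the format with `cat /proc/net/atalk/route` on your own system first. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse the target net:node at the start of one routing table line.
 * Assumes route lines begin with hex "NNNN:NN"; the header ("Target ...")
 * and "Default ..." lines fail to match and return false. */
bool parse_route_target(const char *line, unsigned *net, unsigned *node)
{
    return sscanf(line, "%4x:%2x", net, node) == 2;
}

/* Return true if the routing table read from `f` has a route to `net`. */
bool route_table_has_net(FILE *f, unsigned net)
{
    char line[256];
    unsigned n, node;

    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_route_target(line, &n, &node) && n == net)
            return true;
    }
    return false;
}
```

For example, open /proc/net/atalk/route with fopen() and call route_table_has_net(f, 1) to check whether atalkd has learned network 1 from the GatorBox.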

rdmark commented 10 months ago

@NJRoadfan Did I capture the situation accurately in the man page in my PR? Anything else we should add to the documentation for posterity?

NJRoadfan commented 10 months ago

Not quite. Machine 1 along with everything else in Network 1 in the above diagram can talk to the netatalk box. The issue is atalkd isn't sending out RTMP packets to tell the GatorBox how to reach Network 3-3 and Machine 3, thus the GatorBox's routing tables are never updated.

rdmark commented 10 months ago

Pushed an update with tweaked wording of the man page text.

NJRoadfan commented 10 months ago

Testing completed. This patch appears to be working as expected. I'm not 100% comfortable with the changes as it could cause undesired behavior on networks and result in needless troubleshooting (it is a crufty hack at the end of the day). At least the default behavior for atalkd is now consistent with the reference implementation of AppleTalk.

I would like to eventually revisit this, figure out why these consumer bridges fail to work properly and implement a proper fix.

rdmark commented 10 months ago

Let me put a more strongly worded warning in the manual, and a TODO in the source code.

rdmark commented 10 months ago

Touched up the man page and code comments, and merged to the 2.3 branch.

Let's close this one out, and create a new ticket once we have a lead on a better solution.

Thanks for the great discussion!