freifunkh / ansible

Here we store all Ansible roles and configs used for Freifunk Hannover.
MIT License

Difference supernodes and routers regarding ff02::2:1001 #195

Closed AiyionPrime closed 3 years ago

AiyionPrime commented 3 years ago

In Gluon there's a package gluon-neighbour-info. When invoked correctly, it returns the nodeinfo of the router's neighbours as well as its own:

gluon-neighbour-info -i mesh0 -d ff02::2:1001 -p 1001 -r nodeinfo

On my Cudy this yields three responses.

The package is relatively small, consisting mostly of a single C file, which compiles fine under Debian. For testing purposes I put it on sn09 under /root/aiyion_test/. When invoked on a supernode, though, it does not yield anything (the program's timeout is set to 3 seconds):

/root/aiyion_test/gluon-neighbour-info -i vx-14 -d ff02::2:1001 -p 1001 -r nodeinfo

I'm not sure what's different there, but I find it puzzling that not even the supernode's own mesh-announce instance turns up. This is different when ff02::2:1001 is swapped for one of the supernode's unicast addresses. In that case the supernode answers.

If anyone (possibly @lemoer or @Manawyrm ?) has an idea what I'm missing, I'd be happy for a hint.

Thanks, Aiyion

AiyionPrime commented 3 years ago

The not working unicast example:

/root/aiyion_test/gluon-neighbour-info -i vx-14 -d 2a02:790:ff:914::9001 -p 1001 -r nodeinfo

1977er commented 3 years ago

I guess ebtables-filter-multicast is related as it might filter these multicasts from the supernode.

AiyionPrime commented 3 years ago

Though I liked the idea at first, after reading their docs this does not seem to be the problem. Furthermore, the supernode itself should still show up in the program's results, but it doesn't.

The multicast packets are filtered between the nodes’ client bridge (br-client) and mesh interface (bat0) on output.

Originally documented in https://gluon.readthedocs.io/en/v2020.2.2/package/gluon-ebtables-filter-multicast.html

AiyionPrime commented 3 years ago

I also found a passage in the Gluon docs on the matter, according to which ff02::2:1001 is not supposed to work anymore?

With Gluon v2019.1, nodes will not answer respondd queries on [ff02::2:1001]:1001 anymore. Respondd querier setups still using this address must be updated to the new address [ff05::2:1001]:1001 (supported since Gluon v2017.1). This change was required due to cross-domain leakage of respondd data.

Originally documented in https://gluon.readthedocs.io/en/v2019.1.x/releases/v2019.1.html
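For anyone who wants to try the newer address without gluon-neighbour-info, here is a minimal Python sketch of such a query. It is not the tool's implementation. The function names, the default request string and the choice to return raw, undecoded payloads are assumptions made for illustration; only the group address, port 1001 and the 3 second timeout come from this thread.

```python
import socket
import time

RESPONDD_PORT = 1001
SITE_SCOPE_GROUP = "ff05::2:1001"  # address named in the release note quoted above


def build_destination(group, port, ifindex):
    """Return an AF_INET6 sockaddr tuple: (host, port, flowinfo, scope_id)."""
    return (group, port, 0, ifindex)


def query_respondd(ifname, request="nodeinfo", group=SITE_SCOPE_GROUP,
                   port=RESPONDD_PORT, timeout=3.0):
    """Send one request and collect raw replies until the timeout expires."""
    ifindex = socket.if_nametoindex(ifname)
    replies = {}
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        # Send multicast out of the chosen interface, whatever the group's scope.
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_IF, ifindex)
        sock.sendto(request.encode("ascii"),
                    build_destination(group, port, ifindex))
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                data, sender = sock.recvfrom(65535)
            except socket.timeout:
                break
            replies[sender[0]] = data  # raw payload, left undecoded on purpose
    return replies
```

Calling `query_respondd("mesh0")` on a node would be the rough equivalent of the command line above with the ff05 address. Note that the replies come back to the socket's ephemeral source port, so a firewall has to let that return traffic in.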

AiyionPrime commented 3 years ago

When invoked on a router with the correct multicast address on a VPN interface, the router is again the only result, even though the supernode is running mesh-announce on port 1001 and the specified multicast address:

https://github.com/freifunkh/ansible/blob/66d14a8703681df1dc9f2b0c63e0e55dd88375e6/roles/ffh.mesh_announce/templates/respondd.conf.j2#L6-L14

It behaves as if the supernode were not subscribed to the multicast group, even though mesh-announce is configuring it.
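One way to test the subscription theory is to look at the groups the kernel reports as joined per interface. The sketch below parses text in the layout of Linux's /proc/net/igmp6 (index, interface, group as 32 hex digits, users, flags, timer). The helper names are made up for this example, and the sample data is invented, not taken from sn09.

```python
import ipaddress


def joined_groups(igmp6_text):
    """Map interface name -> set of joined IPv6 multicast groups."""
    groups = {}
    for line in igmp6_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ifname, hex_addr = fields[1], fields[2]
        addr = ipaddress.IPv6Address(int(hex_addr, 16))
        groups.setdefault(ifname, set()).add(addr)
    return groups


def is_member(igmp6_text, ifname, group):
    """True if the interface has joined the given multicast group."""
    wanted = ipaddress.IPv6Address(group)
    return wanted in joined_groups(igmp6_text).get(ifname, set())


# Invented sample in the /proc/net/igmp6 layout (not real supernode output).
SAMPLE = """\
1    lo              ff020000000000000000000000000001     1 0000000C 0
5    vx-14           ff020000000000000000000000000001     1 0000000C 0
6    bat0            ff020000000000000000000000021001     1 00000004 0
"""
```

On a real machine one would pass `open("/proc/net/igmp6").read()` instead of `SAMPLE`, for example `is_member(text, "vx-14", "ff02::2:1001")`.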

AiyionPrime commented 3 years ago

When invoked using the interface bat0 of an uplink router the supernodes return their info properly.

AiyionPrime commented 3 years ago

And when invoked the same way on a supernode it only returns its own response.

AiyionPrime commented 3 years ago

I think this is a bug we introduced with wireguard, as it works as expected with fastd:

root@FFH-Burg-LTE:~# gluon-neighbour-info -i mesh-vpn -p 1001 -d ff02::2:1001 -r nodeinfo

This produces two results: the router itself and the current supernode, sn01.

@lemoer, I think we should get this straight.

AiyionPrime commented 3 years ago

There are then two steps to resolve this:

AiyionPrime commented 3 years ago

ssh -p 1337 root@sn01.s.ffh.zone tcpdump -i mesh_fastd_18 -U -s0 -w - 'udp and not port 22' | wireshark -k -i -

With this capture running, invoking the gluon-neighbour-info line above on the corresponding router (FFH-Burg-LTE) yields two packets: one asking for nodeinfo, the other being the 808-byte response containing the requested data.

AiyionPrime commented 3 years ago

The supernodes are expected to behave the same way, so sn09 should produce the same result for my wireguard uplink. And indeed, the packet is received by the supernode when measured the same way (for wireguard):

ssh -p 1337 root@sn09.s.ffh.zone tcpdump -i vx-14 -U -s0 -w - 'udp and not port 22' | wireshark -k -i -

and on the router:

gluon-neighbour-info -i vx_vpn_wired -p 1001 -d ff02::2:1001 -r nodeinfo

So apparently the supernodes just decide to not answer it, though the packet is received.

AiyionPrime commented 3 years ago

The first problem is related to mesh-announce's DomainRegistry, as vx-nm and vlan-gt-nm are not among its devices. Hacking them in manually would likely work, but finding out what feeds the registry might be worthwhile.

AiyionPrime commented 3 years ago

The first issue is resolved: wireguard routers now get a response from supernodes. In turn, their MAC is properly resolved to a hostname on the router's status page.

AiyionPrime commented 3 years ago

This is what we currently allow regarding port 1001 (respondd):

root@aiyion-JT-OR750i:~# ip6tables-save | grep 1001

-A zone_loc_client_input -s fe80::/64 -p udp -m udp --dport 1001 -m comment --comment "!fw3: client_respondd" -j ACCEPT
-A zone_mesh_input -s fe80::/64 -p udp -m udp --sport 1001 --dport 32768:61000 -m comment --comment "!fw3: mesh_respondd_reply" -j ACCEPT
-A zone_mesh_input -s fe80::/64 -p udp -m udp --dport 1001 -m comment --comment "!fw3: mesh_respondd_ll" -j ACCEPT
-A zone_mesh_input -s fdca:ffee:8:14::/64 -p udp -m udp --dport 1001 -m comment --comment "!fw3: mesh_respondd_siteprefix" -j ACCEPT
-A zone_wan_input -s fe80::/64 -p udp -m udp --sport 1001 --dport 32768:61000 -m comment --comment "!fw3: wan_respondd_reply" -j ACCEPT
-A zone_wan_input -s fe80::/64 -p udp -m udp --dport 1001 -m comment --comment "!fw3: wan_respondd" -j ACCEPT

AiyionPrime commented 3 years ago

Tested on a fastd router: the node receives the request via mesh_vpn and answers accordingly. So what the supernode blocks is the UDP response, which is addressed to the same port the request went out from (something between 40K and 59K+).

AiyionPrime commented 3 years ago

The same applies to wireguard nodes and their vx interface. The problem seems to be on the supernode's side, where the answer is received...

AiyionPrime commented 3 years ago

The problem should be inspected on sn09, or on the other supernodes once #197 has been merged and applied to them...

AiyionPrime commented 3 years ago

I think what we're missing on the supernodes is the counterpart of this line:

-A zone_mesh_input -s fe80::/64 -p udp -m udp --sport 1001 --dport 32768:61000 -m comment --comment "!fw3: mesh_respondd_reply" -j ACCEPT

This should become part of the mesh-announce ferm config.
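To make explicit which packets that rule lets through, here is a small Python predicate mirroring its match conditions. It is only an illustration of the rule's semantics, not ferm syntax and not part of mesh-announce; the function name and argument layout are hypothetical.

```python
import ipaddress

LINK_LOCAL = ipaddress.ip_network("fe80::/64")  # -s fe80::/64
RESPONDD_PORT = 1001                            # --sport 1001
REPLY_PORTS = range(32768, 61001)               # --dport 32768:61000, inclusive


def is_respondd_reply(src, proto, sport, dport):
    """True if a packet would match the mesh_respondd_reply rule above."""
    return (
        proto == "udp"
        and ipaddress.ip_address(src) in LINK_LOCAL
        and sport == RESPONDD_PORT
        and dport in REPLY_PORTS
    )
```

A reply from a node's link-local address to the querier's ephemeral port, for example `is_respondd_reply("fe80::1", "udp", 1001, 45000)`, matches. A request to port 1001 itself does not, which is why the existing `--dport 1001` rules do not cover the answers.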