Open mweinelt opened 1 year ago
I flashed our current nightly based on Gluon's master; the problem persists. This mesh link should not be there.
On the ERX I performed the following actions:
root@erx-bridge-port-isolation-debugger:~# opkg update; opkg install ethtool
root@erx-bridge-port-isolation-debugger:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: dsa: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1504 qdisc fq_codel state UP qlen 1000
link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
inet6 fe80::b6fb:e4ff:fe53:aee8/64 scope link
valid_lft forever preferred_lft forever
3: eth0@dsa: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-wan state UP qlen 1000
link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
4: eth1@dsa: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br-mesh_other state LOWERLAYERDOWN qlen 1000
link/ether b4:fb:e4:53:ae:e9 brd ff:ff:ff:ff:ff:ff
5: eth2@dsa: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-mesh_other state UP qlen 1000
link/ether b4:fb:e4:53:ae:ea brd ff:ff:ff:ff:ff:ff
6: eth3@dsa: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-mesh_other state UP qlen 1000
link/ether b4:fb:e4:53:ae:eb brd ff:ff:ff:ff:ff:ff
7: eth4@dsa: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br-mesh_other state LOWERLAYERDOWN qlen 1000
link/ether b4:fb:e4:53:ae:ec brd ff:ff:ff:ff:ff:ff
8: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether fa:1d:95:4a:38:53 brd ff:ff:ff:ff:ff:ff
9: teql0: <NOARP> mtu 1500 qdisc noop state DOWN qlen 100
link/void
11: br-wan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
inet 192.168.178.63/24 brd 192.168.178.255 scope global br-wan
valid_lft forever preferred_lft forever
inet6 2a02:560:529b:8c00:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute
valid_lft 6915sec preferred_lft 3315sec
inet6 fe80::b6fb:e4ff:fe53:aee8/64 scope link
valid_lft forever preferred_lft forever
12: local-port@local-node: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-client state UP qlen 1000
link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
13: local-node@local-port: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether 16:41:95:40:f8:dc brd ff:ff:ff:ff:ff:ff
inet 10.14.0.1/16 brd 10.14.255.255 scope global local-node
valid_lft forever preferred_lft forever
inet6 fdca:ffee:8:14::1/128 scope global deprecated
valid_lft forever preferred_lft 0sec
inet6 fe80::1441:95ff:fe40:f8dc/64 scope link
valid_lft forever preferred_lft forever
14: br-client: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
inet6 2a02:790:ff:114:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute
valid_lft 276sec preferred_lft 126sec
inet6 2a02:790:ff:414:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute
valid_lft 291sec preferred_lft 141sec
inet6 2a02:790:ff:714:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute
valid_lft 273sec preferred_lft 123sec
inet6 2a02:790:ff:914:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute
valid_lft 270sec preferred_lft 120sec
inet6 2a02:790:ff:1014:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute
valid_lft 239sec preferred_lft 89sec
inet6 2a02:790:ff:514:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute
valid_lft 277sec preferred_lft 127sec
inet6 2001:678:978:214:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute
valid_lft 291sec preferred_lft 141sec
inet6 fdca:ffee:8:14:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute
valid_lft 7170sec preferred_lft 120sec
inet6 fe80::b6fb:e4ff:fe53:aee8/64 scope link
valid_lft forever preferred_lft forever
15: bat0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-client state UNKNOWN qlen 1000
link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
inet6 fe80::b6fb:e4ff:fe53:aee8/64 scope link
valid_lft forever preferred_lft forever
16: primary0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1532 qdisc noqueue master bat0 state UNKNOWN qlen 1000
link/ether 5e:d3:4e:c2:f0:03 brd ff:ff:ff:ff:ff:ff
inet6 fe80::5cd3:4eff:fec2:f003/64 scope link
valid_lft forever preferred_lft forever
17: mesh-vpn: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1394 qdisc fq_codel master bat0 state UNKNOWN qlen 1000
link/ether 5e:d3:4e:c2:f0:07 brd ff:ff:ff:ff:ff:ff
inet6 fe80::5cd3:4eff:fec2:f007/64 scope link
valid_lft forever preferred_lft forever
23: br-mesh_other: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether 5e:d3:4e:c2:f0:04 brd ff:ff:ff:ff:ff:ff
inet6 fe80::5cd3:4eff:fec2:f004/64 scope link
valid_lft forever preferred_lft forever
24: vx_mesh_other: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue master bat0 state UNKNOWN qlen 1000
link/ether 5e:d3:4e:c2:f0:04 brd ff:ff:ff:ff:ff:ff
inet6 fe80::5cd3:4eff:fec2:f004/64 scope link
valid_lft forever preferred_lft forever
Calling ethtool can be done on each of the eth devices and could be done on dsa as well, leading to different results.
root@erx-bridge-port-isolation-debugger:~# ethtool eth2
Settings for eth2:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Link partner advertised pause frame use: Symmetric Receive-only
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 2
Transceiver: external
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Link detected: yes
Ideally we'd need someone to confirm by going through the offloading capabilities (ethtool -k) and trying to disable them (ethtool -K) to wait and see if one of them affects the bridge port forwarding.
[...]
Originally posted by @mweinelt in https://github.com/freifunk-gluon/gluon/issues/2600#issuecomment-1245229432
root@erx-bridge-port-isolation-debugger:~# ethtool -k eth2
Features for eth2:
rx-checksumming: on [fixed]
tx-checksumming: on
tx-checksum-ipv4: on [fixed]
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on [fixed]
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on [fixed]
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: on [fixed]
tx-tcp6-segmentation: on [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
So the only features that are not already off and can be disabled are these:
generic-segmentation-offload: on
generic-receive-offload: on
hw-tc-offload: on
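For anyone repeating this on another device, the same shortlist can be pulled out of the `ethtool -k` output mechanically. The following is only a sketch: `changeable_on_features` is a made-up helper name, and it assumes that ethtool prints sub-features indented under their aggregate entry (the indentation is lost in the paste above), so that aggregates like tx-checksumming, whose sub-features are all `[fixed]`, can be skipped.

```shell
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the top-level features that are "on" and can still be switched off.
# Assumption: sub-features are indented under their aggregate entry.
changeable_on_features() {
    awk '
        function emit() {
            # Print the pending entry unless every one of its sub-features is fixed.
            if (name != "" && (kids == 0 || changeable > 0)) print name
            name = ""
        }
        /^Features for / { next }
        /^[ \t]/ {
            # Indented line: a sub-feature of the pending top-level entry.
            kids++
            if ($0 !~ /\[fixed\]/) changeable++
            next
        }
        {
            # Top-level line: settle the previous entry, then remember this
            # one only if it is "on" without a [fixed] marker.
            emit()
            kids = 0
            changeable = 0
            if ($0 ~ /: on$/) { name = $0; sub(/: on$/, "", name) }
        }
        END { emit() }
    '
}

# Usage on the device (requires ethtool to be installed):
#   ethtool -k eth2 | changeable_on_features
```

On output shaped like the listing above, this should leave only the three features named here.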
I disabled them for the two lan interfaces that currently mesh with the other two routers.
root@erx-bridge-port-isolation-debugger:~# ethtool -K eth2 generic-segmentation-offload off
root@erx-bridge-port-isolation-debugger:~# ethtool -K eth2 generic-receive-offload off
root@erx-bridge-port-isolation-debugger:~# ethtool -K eth2 hw-tc-offload off
root@erx-bridge-port-isolation-debugger:~# ethtool -K eth3 generic-segmentation-offload off
root@erx-bridge-port-isolation-debugger:~# ethtool -K eth3 generic-receive-offload off
root@erx-bridge-port-isolation-debugger:~# ethtool -K eth3 hw-tc-offload off
root@erx-bridge-port-isolation-debugger:~# /etc/init.d/network restart
Not sure about the error message, but restarting the network worked.
Command failed: Not found
Both connected devices still see each other perfectly fine.
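To make the test easier to repeat on other ports or devices, the six `ethtool -K` calls above can be condensed into a loop. This is a sketch rather than anything Gluon ships: `disable_offloads` and the `ETHTOOL` override are hypothetical names, the latter only there so the loop can be dry-run without touching the interfaces.

```shell
# Sketch: switch off a set of offload features on several interfaces.
# ETHTOOL is a hypothetical override; set it to "echo ethtool" for a dry run.
disable_offloads() {
    ifaces=$1      # space-separated interface names, e.g. "eth2 eth3"
    features=$2    # space-separated feature names as printed by `ethtool -k`
    for iface in $ifaces; do
        for feature in $features; do
            # Intentionally unquoted so "echo ethtool" splits into two words.
            ${ETHTOOL:-ethtool} -K "$iface" "$feature" off || return 1
        done
    done
}

# Dry run: print the commands instead of executing them.
ETHTOOL="echo ethtool" disable_offloads "eth2 eth3" \
    "generic-segmentation-offload generic-receive-offload hw-tc-offload"
```

Dropping the `ETHTOOL=...` prefix runs the real commands. Settings made with `ethtool -K` are runtime-only and do not persist across a reboot.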
Note: Bridge Port Isolation does not work on D-Link DGS-1210-10P either.
While this simply does not work in Hanover, it randomly breaks mesh connections in Darmstadt. We concluded that either we get this resolved before an upcoming release, or this (as well as its backport) will be reverted, as much as that'd suck.
@NeoRaider intends to experiment with the isolation feature on an FB4040 in the coming days/weeks.
The current thinking is that the DSA stack does not support bridge port isolation and fails to signal back its incapacity.
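One way to look at this from the node itself is to read back the isolation flag that the bridge layer holds for each port. The sketch below assumes the running kernel exposes the flag as `brport/isolated` in sysfs; `port_isolation` and the `SYSFS_ROOT` override are hypothetical names used only for illustration and testing. A value of 1 only shows that the software bridge has the flag set. It says nothing about whether the switch hardware enforces it, which is exactly the gap described above.

```shell
# Sketch: report the kernel's per-port isolation flag for bridge members.
# SYSFS_ROOT is a hypothetical override for testing; the default is /sys.
# Assumption: the kernel exposes the flag as brport/isolated.
port_isolation() {
    iface=$1
    flag_file="${SYSFS_ROOT:-/sys}/class/net/$iface/brport/isolated"
    if [ -r "$flag_file" ]; then
        echo "$iface isolated=$(cat "$flag_file")"
    else
        # Not a bridge port, or the attribute is not available.
        echo "$iface isolated=unknown"
    fi
}

# The four ports that are members of br-mesh_other in the `ip a` output above.
for iface in eth1 eth2 eth3 eth4; do
    port_isolation "$iface"
done
```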
The bridge core has supported passing these flags to DSA since Linux 5.19; however, very few DSA drivers implement port isolation so far. I've added a list of relevant drivers to the issue description.
Hm.. there is still some work to do. I added isolation support for KSZ switches: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/drivers/net/dsa/microchip?id=a7f08029e2e84ecafbfff50fcff976fafee72799
@AiyionPrime Missing bridge port isolation should not break any mesh connections, unless you build a ring or similar topology of multiple nodes and STP does not work to prevent a forwarding loop. It would be good to know if that is the case, or if something else is going wrong in your deployment.
I'm from Hanover, the finding was from FF Darmstadt, I think. I only reported it here, in order not to lose the intel.
The current working theory is that this may be caused by offloading of the bridge port forwarding.
We would need someone to test this theory by disabling the relevant offloading features.
See https://github.com/freifunk-gluon/gluon/pull/2600#issuecomment-1245229432
List of DSA drivers to test/implement: