freifunk-gluon / gluon

a modular framework for creating OpenWrt-based firmwares for wireless mesh nodes
https://gluon.readthedocs.io

Bridge Port Isolation not working on DSA #2679

Open mweinelt opened 1 year ago

mweinelt commented 1 year ago

The current working theory is that this may be caused by the bridge port forwarding being offloaded to the switch hardware.

We would need someone to test this theory by disabling the relevant offloading features.

See https://github.com/freifunk-gluon/gluon/pull/2600#issuecomment-1245229432

List of DSA drivers to test/implement:

AiyionPrime commented 1 year ago

I flashed our current nightly, which is based on Gluon's master branch; the problem persists. This mesh link should not be there.

On the ERX I performed the following actions:

root@erx-bridge-port-isolation-debugger:~#  opkg update; opkg install ethtool
root@erx-bridge-port-isolation-debugger:~#  ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: dsa: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1504 qdisc fq_codel state UP qlen 1000
    link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::b6fb:e4ff:fe53:aee8/64 scope link 
       valid_lft forever preferred_lft forever
3: eth0@dsa: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-wan state UP qlen 1000
    link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
4: eth1@dsa: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br-mesh_other state LOWERLAYERDOWN qlen 1000
    link/ether b4:fb:e4:53:ae:e9 brd ff:ff:ff:ff:ff:ff
5: eth2@dsa: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-mesh_other state UP qlen 1000
    link/ether b4:fb:e4:53:ae:ea brd ff:ff:ff:ff:ff:ff
6: eth3@dsa: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-mesh_other state UP qlen 1000
    link/ether b4:fb:e4:53:ae:eb brd ff:ff:ff:ff:ff:ff
7: eth4@dsa: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br-mesh_other state LOWERLAYERDOWN qlen 1000
    link/ether b4:fb:e4:53:ae:ec brd ff:ff:ff:ff:ff:ff
8: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether fa:1d:95:4a:38:53 brd ff:ff:ff:ff:ff:ff
9: teql0: <NOARP> mtu 1500 qdisc noop state DOWN qlen 100
    link/void 
11: br-wan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.178.63/24 brd 192.168.178.255 scope global br-wan
       valid_lft forever preferred_lft forever
    inet6 2a02:560:529b:8c00:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute 
       valid_lft 6915sec preferred_lft 3315sec
    inet6 fe80::b6fb:e4ff:fe53:aee8/64 scope link 
       valid_lft forever preferred_lft forever
12: local-port@local-node: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-client state UP qlen 1000
    link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
13: local-node@local-port: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 16:41:95:40:f8:dc brd ff:ff:ff:ff:ff:ff
    inet 10.14.0.1/16 brd 10.14.255.255 scope global local-node
       valid_lft forever preferred_lft forever
    inet6 fdca:ffee:8:14::1/128 scope global deprecated 
       valid_lft forever preferred_lft 0sec
    inet6 fe80::1441:95ff:fe40:f8dc/64 scope link 
       valid_lft forever preferred_lft forever
14: br-client: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
    inet6 2a02:790:ff:114:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute 
       valid_lft 276sec preferred_lft 126sec
    inet6 2a02:790:ff:414:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute 
       valid_lft 291sec preferred_lft 141sec
    inet6 2a02:790:ff:714:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute 
       valid_lft 273sec preferred_lft 123sec
    inet6 2a02:790:ff:914:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute 
       valid_lft 270sec preferred_lft 120sec
    inet6 2a02:790:ff:1014:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute 
       valid_lft 239sec preferred_lft 89sec
    inet6 2a02:790:ff:514:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute 
       valid_lft 277sec preferred_lft 127sec
    inet6 2001:678:978:214:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute 
       valid_lft 291sec preferred_lft 141sec
    inet6 fdca:ffee:8:14:b6fb:e4ff:fe53:aee8/64 scope global dynamic noprefixroute 
       valid_lft 7170sec preferred_lft 120sec
    inet6 fe80::b6fb:e4ff:fe53:aee8/64 scope link 
       valid_lft forever preferred_lft forever
15: bat0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-client state UNKNOWN qlen 1000
    link/ether b4:fb:e4:53:ae:e8 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::b6fb:e4ff:fe53:aee8/64 scope link 
       valid_lft forever preferred_lft forever
16: primary0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1532 qdisc noqueue master bat0 state UNKNOWN qlen 1000
    link/ether 5e:d3:4e:c2:f0:03 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5cd3:4eff:fec2:f003/64 scope link 
       valid_lft forever preferred_lft forever
17: mesh-vpn: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1394 qdisc fq_codel master bat0 state UNKNOWN qlen 1000
    link/ether 5e:d3:4e:c2:f0:07 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5cd3:4eff:fec2:f007/64 scope link 
       valid_lft forever preferred_lft forever
23: br-mesh_other: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 5e:d3:4e:c2:f0:04 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5cd3:4eff:fec2:f004/64 scope link 
       valid_lft forever preferred_lft forever
24: vx_mesh_other: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue master bat0 state UNKNOWN qlen 1000
    link/ether 5e:d3:4e:c2:f0:04 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5cd3:4eff:fec2:f004/64 scope link 
       valid_lft forever preferred_lft forever

ethtool can be run on each of the eth devices as well as on dsa, and the results differ.

root@erx-bridge-port-isolation-debugger:~# ethtool eth2
Settings for eth2:
    Supported ports: [ TP MII ]
    Supported link modes:   10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Advertised pause frame use: Symmetric Receive-only
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                         100baseT/Half 100baseT/Full 
                                         1000baseT/Full 
    Link partner advertised pause frame use: Symmetric Receive-only
    Link partner advertised auto-negotiation: Yes
    Link partner advertised FEC modes: Not reported
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 2
    Transceiver: external
    Auto-negotiation: on
    MDI-X: Unknown
    Supports Wake-on: d
    Wake-on: d
    Link detected: yes
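To see where the dsa conduit and the user ports actually differ, the two `ethtool -k` listings can be diffed. A minimal POSIX sh sketch, assuming the interface names from the `ip a` output in this thread and that `diff` and `mktemp` are available on the node; `compare_features` and the `ETHTOOL` override are hypothetical names, not part of Gluon or OpenWrt.

```shell
#!/bin/sh
# Diff the offload feature list of a DSA conduit against its user ports.
# ETHTOOL can be overridden (e.g. with a stub) for a dry run.
ETHTOOL=${ETHTOOL:-ethtool}

compare_features() {
    conduit=$1
    shift
    conduit_file=$(mktemp)
    port_file=$(mktemp)
    # Drop the "Features for <iface>:" header so only feature lines are compared
    "$ETHTOOL" -k "$conduit" | sed 1d > "$conduit_file"
    for port in "$@"; do
        "$ETHTOOL" -k "$port" | sed 1d > "$port_file"
        echo "== $conduit vs $port =="
        # diff exits non-zero when the listings differ; that is expected here
        diff "$conduit_file" "$port_file"
    done
    rm -f "$conduit_file" "$port_file"
}

# Only run when given a conduit and at least one port, e.g.
#   sh compare-features.sh dsa eth0 eth1 eth2 eth3 eth4
if [ "$#" -ge 2 ]; then
    compare_features "$@"
fi
```

The script name `compare-features.sh` in the comment is also just an example.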

Ideally, someone would confirm this by going through the offloading capabilities (ethtool -k), disabling them (ethtool -K), and checking whether any of them affects the bridge port forwarding.

[...]

Originally posted by @mweinelt in https://github.com/freifunk-gluon/gluon/issues/2600#issuecomment-1245229432
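The suggested experiment could be scripted roughly like this: switch off one feature at a time on a port, leave time to check whether the meshing devices still see each other, then switch the feature back on. This is a sketch under assumptions: `try_features_one_by_one`, `ETHTOOL` and `WAIT_SECONDS` are hypothetical names, and feature names are passed exactly as `ethtool -k` prints them, as done in this thread.

```shell
#!/bin/sh
# Disable one offload feature at a time on a bridge port, wait so the tester
# can check whether isolated ports still reach each other, then restore it.
# ETHTOOL and WAIT_SECONDS can be overridden for a dry run.
ETHTOOL=${ETHTOOL:-ethtool}
WAIT_SECONDS=${WAIT_SECONDS:-60}

try_features_one_by_one() {
    iface=$1
    shift
    for feature in "$@"; do
        if ! "$ETHTOOL" -K "$iface" "$feature" off; then
            echo "could not disable $feature on $iface, skipping" >&2
            continue
        fi
        echo "$feature off on $iface: check whether isolated ports still see each other"
        sleep "$WAIT_SECONDS"
        # Restore the feature so the next test only changes one thing
        "$ETHTOOL" -K "$iface" "$feature" on
    done
}

# Only run when given an interface and at least one feature, e.g.
#   sh try-offloads.sh eth2 generic-segmentation-offload hw-tc-offload
if [ "$#" -ge 2 ]; then
    try_features_one_by_one "$@"
fi
```

Restoring each feature before trying the next keeps any change in behaviour attributable to a single feature; disabling everything at once only shows whether the combination matters.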

root@erx-bridge-port-isolation-debugger:~# ethtool -k eth2
Features for eth2:
rx-checksumming: on [fixed]
tx-checksumming: on
    tx-checksum-ipv4: on [fixed]
    tx-checksum-ip-generic: off [fixed]
    tx-checksum-ipv6: on [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on [fixed]
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on [fixed]
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: on [fixed]
    tx-tcp6-segmentation: on [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]

So the only features that can be disabled and are not already off are these:

generic-segmentation-offload: on
generic-receive-offload: on
hw-tc-offload: on
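Picking these out by hand works for one port; for several ports a small filter over the `ethtool -k` output can help. A sketch, assuming the output format shown in this thread: it prints features that are `on` without `[fixed]`, and skips group headers such as `tx-checksumming` whose indented sub-features follow, which the list above also leaves out, presumably because those sub-features are all fixed. The function name is hypothetical.

```shell
#!/bin/sh
# Read `ethtool -k` output on stdin and print the leaf features that are
# "on" and not "[fixed]". A non-indented line followed by indented lines is
# a group header (e.g. tx-checksumming) and is not printed itself.
toggleable_features() {
    awk '
    {
        indented = ($0 ~ /^[ \t]/)
        # A pending top-level feature is a leaf unless indented lines follow
        if (pending != "" && !indented) print pending
        pending = ""
        line = $0
        sub(/^[ \t]+/, "", line)
        sub(/[ \t]+$/, "", line)
        if (line !~ /: on$/) next
        name = line
        sub(/: on$/, "", name)
        if (indented) print name
        else pending = name
    }
    END { if (pending != "") print pending }
    '
}
```

Usage: `ethtool -k eth2 | toggleable_features`. For the full eth2 listing in this thread it yields exactly generic-segmentation-offload, generic-receive-offload and hw-tc-offload.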

I disabled them on the two LAN ports that currently mesh with the other two routers.

root@erx-bridge-port-isolation-debugger:~# ethtool -K eth2 generic-segmentation-offload off
root@erx-bridge-port-isolation-debugger:~# ethtool -K eth2 generic-receive-offload off
root@erx-bridge-port-isolation-debugger:~# ethtool -K eth2 hw-tc-offload off
root@erx-bridge-port-isolation-debugger:~# ethtool -K eth3 generic-segmentation-offload off
root@erx-bridge-port-isolation-debugger:~# ethtool -K eth3 generic-receive-offload off
root@erx-bridge-port-isolation-debugger:~# ethtool -K eth3 hw-tc-offload off
root@erx-bridge-port-isolation-debugger:~# /etc/init.d/network restart
Command failed: Not found

Not sure about the error message, but restarting the network worked.

Both connected devices still see each other perfectly fine, so disabling these offloads did not make port isolation work.

AiyionPrime commented 1 year ago

Note: Bridge Port Isolation does not work on D-Link DGS-1210-10P either.

AiyionPrime commented 1 year ago

While this simply does not work in Hanover, it randomly breaks mesh connections in Darmstadt. We concluded that either we get this resolved before an upcoming release, or this change (as well as its backport) will be reverted, as much as that would suck.

@NeoRaider intends to experiment with the isolation feature on an FB4040 in the coming days/weeks.

mweinelt commented 1 year ago

The current thinking is that the DSA stack does not support bridge port isolation and fails to report that it is unsupported.

neocturne commented 1 month ago

The bridge core has supported passing these flags to DSA since Linux 5.19; however, very few DSA drivers implement port isolation so far. I've added a list of relevant drivers to the issue description.
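Two quick checks follow from this: whether the node's kernel is at least 5.19, and what the bridge has recorded for a port's isolation flag. A minimal sketch, assuming iproute2's `bridge` tool is installed; `kernel_at_least_5_19` and `isolated_flag` are hypothetical helper names. Note that `isolated on` in the bridge output only reflects the software bridge's state; as this issue shows, it does not prove that the switch hardware enforces it.

```shell
#!/bin/sh
# Succeeds if kernel version string $1 (e.g. "5.15.134") is 5.19 or newer,
# the first version in which the bridge core passes isolation flags to DSA.
kernel_at_least_5_19() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%[!0-9]*}
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 19 ]; }
}

# Read `bridge -d link show dev <port>` output on stdin and print the value
# of the "isolated" flag (on/off); prints nothing if the flag is not shown.
isolated_flag() {
    sed -n 's/.*isolated \([onf]*\).*/\1/p'
}
```

On a node one might run `kernel_at_least_5_19 "$(uname -r)"`, and after `bridge link set dev eth2 isolated on`, `bridge -d link show dev eth2 | isolated_flag` would be expected to print `on`.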

olerem commented 1 month ago

Hm, there is still some work to do. I added isolation support for KSZ switches: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/drivers/net/dsa/microchip?id=a7f08029e2e84ecafbfff50fcff976fafee72799

neocturne commented 1 month ago

@AiyionPrime Missing bridge port isolation should not break any mesh connections, unless you build a ring or similar topology of multiple nodes and STP does not prevent a forwarding loop. It would be good to know whether that is the case, or whether something else is going wrong in your deployment.
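Whether STP is enabled on the mesh bridge, and whether it actually blocks a port, can be read from sysfs and from `bridge link show`. A sketch, assuming the bridge name `br-mesh_other` from the `ip a` output earlier in the thread and that iproute2's `bridge` tool is installed; `stp_report`, `BRIDGE_CMD` and `SYSFS` are hypothetical names.

```shell
#!/bin/sh
# Report the STP mode of a bridge and the STP state of each of its ports.
# BRIDGE_CMD and SYSFS can be overridden for a dry run.
BRIDGE_CMD=${BRIDGE_CMD:-bridge}
SYSFS=${SYSFS:-/sys/class/net}

stp_report() {
    br=$1
    # stp_state: 0 = STP off, 1 = kernel STP, 2 = user-space STP
    echo "stp_state of $br: $(cat "$SYSFS/$br/bridge/stp_state")"
    # `bridge link show` prints one line per port, roughly:
    #   <idx>: <port>[@<parent>]: <flags> mtu N master <bridge> state <state> ...
    "$BRIDGE_CMD" link show | awk -v br="$br" '
    {
        port = $2
        sub(/:$/, "", port)
        sub(/@.*/, "", port)
        master = ""
        state = ""
        for (i = 1; i < NF; i++) {
            if ($i == "master") master = $(i + 1)
            if ($i == "state") state = $(i + 1)
        }
        if (master == br) print port ": " state
    }'
}

# Example: sh stp-report.sh br-mesh_other
if [ "$#" -eq 1 ]; then
    stp_report "$1"
fi
```

A port reported as `blocking` would indicate that STP is breaking a loop on that bridge; `stp_state` 0 means no STP is running there at all.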

AiyionPrime commented 1 month ago

I'm from Hanover; the finding came from FF Darmstadt, I think. I only reported it here so the information would not get lost.