Chadster766 / McDebian

Linksys WRT3200ACM, WRT1900AC, WRT1900ACS, WRT1200AC and WRT32X Router Debian Implementation

Multiple APs with mcDebian #70

Open sfrost opened 3 years ago

sfrost commented 3 years ago

Greetings,

Having had very little luck figuring this out, I figure I'll ask here: do you have any experience getting multiple APs to work with McDebian, all providing the same SSID and allowing devices to roam between them?

I've made relatively minimal changes to the mcDebian hostapd.conf files (mainly the SSID and the passphrase) and have discovered that when Windows laptops and iPhones switch APs, there are about 5 minutes of 'dead' time before they're able to reach the internet. I thought this might be due to bridging VLANs in the Linux kernel, so I moved the VLAN work entirely over to regular switches and have kept the actual APs relatively simple, but there's still this delay when switching access points.

Next, I tried to get FT-PSK to work with hostapd. While I was able to run hostapd with FT-PSK, it didn't seem to help. I'm not sure if that's because I didn't configure 802.11r properly (I tried to use the 'ft psk' mode rather than configuring all the r0kh stuff, but maybe that was wrong?).

Basically, just hoping that maybe you, or one of the other folks who follows this project, has already dealt with this and figured out the right hostapd.conf incantation to make this work cleanly and smoothly. Having to wait 5 minutes when moving between floors is driving me crazy. :)

Thanks!

sfrost commented 3 years ago

Oh, this is all with the newer 5.6.14 kernel. I've upgraded nftables and the firewall is nft, but that's only on one of the APs (I've got 3... :). Even if I turn off the radio on the actual firewall and just try to roam between the other two, which don't have any nftables configuration or any other firewalling, it still takes 5 minutes to move between them...

Chadster766 commented 3 years ago

Once in a while I have seen this but it hasn't been much of an issue.

I mostly see this if the APs see each other at weaker than -65 dBm, which means they are too far apart, so the client will not be able to change AP once it reaches -70 dBm from the AP it's currently connected to.

sfrost commented 3 years ago

Either of these APs would do an alright job serving the entire house, so I don't think it's an issue like that. It's very consistent and just about always 5 minutes of not being able to get to anything when switching APs, feels very much like a timeout or something. Once the device is back online, as long as it isn't moved too far, it works great with strong signal strength and everything.

Chadster766 commented 3 years ago

I recommend you run a scan on each AP to see what the other AP signal strengths are like between them.

sfrost commented 3 years ago

Ok, I've run scans on each of the APs, on both interfaces, and included the results here.

ap1 is at 00:25:9c:13:a2:35 / 00:25:9c:13:a2:36
ap2 is at 30:23:03:dd:18:a9 / 30:23:03:dd:18:aa

on 2ghz, ap1 -> ap2 has signal: -52.00 dBm
on 5ghz, ap1 -> ap2 has signal: -71.00 dBm

on 2ghz, ap2 -> ap1 has signal: -59.00 dBm
on 5ghz, ap2 -> ap1 has signal: -80.00 dBm

I'm a bit surprised at these results since the two are only 1 floor apart (though on opposite sides of the townhouse).
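For anyone repeating this measurement, here is a minimal sketch for pulling the per-BSS signal out of scan output and flagging anything weaker than a chosen limit. It assumes the text comes from `iw dev <interface> scan`, with `BSS <mac>(on <interface>)` header lines and `signal: <value> dBm` lines like the ones quoted above. The function names `scan_signals` and `weaker_than` are made up for this example.

```bash
#!/bin/sh
# Sketch only: reads scan output on stdin and prints "<bssid> <signal>" pairs.
# Assumes each network starts with a "BSS <mac>(on <iface>)" line.
scan_signals() {
    awk '
        /^BSS [0-9a-fA-F:]+/ { bss = $2; sub(/\(.*/, "", bss) }
        $1 == "signal:"      { print bss, $2 }
    '
}

# Keep only the pairs whose signal is weaker (more negative) than the limit.
weaker_than() {
    awk -v limit="$1" '$2 + 0 < limit + 0 { print }'
}
```

On an AP this could be used as `iw dev wlp1s0 scan | scan_signals | weaker_than -65`, where `wlp1s0` is the interface name from the configs in this thread and -65 is the threshold mentioned earlier.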

ap1_2ghz_scan.txt ap1_5ghz_scan.txt ap2_2ghz_scan.txt ap2_5ghz_scan.txt

Also included is a client scan:

This shows that signal strength to ap1 is -40 on 2ghz and -46 on 5ghz.
Strength to ap2 is -64 on 2ghz and -74 on 5ghz.

This client is in the basement, so one AP is a floor closer (and is on the same side as the client), so it's not surprising that this is the case.

Generally speaking, once a device (eg: my iphone) is associated with an AP, everything is fine and works great- the issue is that moving between floors with the device ends up having it switch APs (which should be fine...), but then it takes about 5 minutes before it's able to actually connect to anything and that's the part that I just can't figure out.

client1_scan.txt

It does look like the client sees FT-PSK enabled, which seems like it'd be a good thing, and I have that configured as:

ap1:

wpa_key_mgmt=WPA-PSK FT-PSK
nas_identifier=00-25-9c-13-a2-36:ap.snowman.net
mobility_domain=a5b1
ft_psk_generate_local=1

ap2:

wpa_key_mgmt=WPA-PSK FT-PSK
nas_identifier=30-23-03-dd-18-aa:ap.snowman.net
mobility_domain=a5b1
ft_psk_generate_local=1

but it doesn't seem to have helped at all- I get the same effect with the original config in mcdebian which has:

wpa_key_mgmt=WPA-PSK

and then doesn't specify anything for nas_identifier/mobility_domain/ft_psk_generate_local
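A quick way to sanity-check a pair of FT configs is sketched below. It assumes, as far as hostapd's FT requirements go, that APs sharing a roaming domain need the same `mobility_domain` and that each AP needs its own `nas_identifier`. The helper names `conf_get` and `check_ft_pair` are hypothetical, and only the first occurrence of each option in a file is read, so multi-BSS files are not handled.

```bash
#!/bin/sh
# Sketch only: print the first value of a key=value option in a hostapd config file.
conf_get() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Compare two hostapd config files; returns non-zero if the FT settings look inconsistent.
check_ft_pair() {
    md_a=$(conf_get mobility_domain "$1")
    md_b=$(conf_get mobility_domain "$2")
    nas_a=$(conf_get nas_identifier "$1")
    nas_b=$(conf_get nas_identifier "$2")
    status=0
    if [ -z "$md_a" ] || [ "$md_a" != "$md_b" ]; then
        echo "mobility_domain missing or different: '$md_a' vs '$md_b'"
        status=1
    fi
    if [ -z "$nas_a" ] || [ -z "$nas_b" ] || [ "$nas_a" = "$nas_b" ]; then
        echo "nas_identifier missing or not unique: '$nas_a' vs '$nas_b'"
        status=1
    fi
    return $status
}
```

The two snippets above pass this check (same `mobility_domain`, different `nas_identifier`), which suggests the delay is not caused by a mismatch in these options.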

Any thoughts or suggestions you have on this would be great to hear, happy to try out different things too. Have to say that it's driving me a bit crazy! I would have thought this would be straight-forward to get going...

ericwoud commented 3 years ago

I ran into the same problems you describe in the initial post. If you want roaming but do not want to set up NAT on every access point, the bridge's forwarding database keeps sending packets to the old port after you have roamed to a new port. After some time the database entries are cleaned up and you finally have your internet connection. The same happens in any ethernet switch between router and access points.

I programmed 2 solutions for this problem and added them to my repositories.
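The "after some time" part is the bridge ageing time. A Linux bridge defaults to 300 seconds, which lines up with the five-minute outage described above. A small sketch for reading it is below. It assumes the sysfs value is reported in hundredths of a second (so the default shows up as 30000) and that the bridge is called `br0` as in the configs in this thread. `ageing_seconds` is a hypothetical helper name, and a hardware switch may keep its own ageing timer that this value does not describe.

```bash
#!/bin/sh
# Sketch only: convert a bridge ageing_time value from sysfs
# (assumed to be in hundredths of a second) to whole seconds.
ageing_seconds() {
    echo $(( $1 / 100 ))
}
```

For example, `ageing_seconds "$(cat /sys/class/net/br0/bridge/ageing_time)"` would print the current timeout for `br0` in seconds.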

sfrost commented 3 years ago

Where are the solutions..? Could you provide a link?

ericwoud commented 3 years ago

The one I use is:

bridgefdbd

As I also run my router on Ubuntu, not only my AP.

sfrost commented 3 years ago

Ok, this is starting to look promising and pretty curious.

When looking at the fdb, what I'm seeing is that I'm getting two entries for a client's MAC, eg:

d0:3f:aa:e8:XX:XX dev lan1 self
d0:3f:aa:e8:XX:XX dev wlp3s0 master br0

Removing the one associated with 'lan1' by hand does indeed make things start working, but I'm trying to figure out how this is ending up happening in the first place..? Surely when a given device is connected to the wireless it shouldn't be getting picked up as also being on the lan, but that's what is happening and somehow that's confusing the AP and traffic destined for that MAC isn't getting sent over the wifi. I've further noticed that it seems to only be happening when the two interfaces are bridged that the packets aren't being forwarded, when there's actual routing happening, everything seems to work fine.

I'm still not sure if there's something configured incorrectly or if there's just a bug, but I ended up writing and running this script on all my APs and it seems to have "fixed" the issue for me:

```bash
#!/bin/bash

PRIOR_MAC=""
PRIOR_DEV=""
PRIOR_DEVBASE=""

while true; do
    # Sorting puts the lan* entry for a MAC directly before its wlp* entry.
    bridge fdb | sort | \
    while read -r line; do
        CURRENT_MAC=$(echo "$line" | cut -f1 -d' ')
        CURRENT_DEV=$(echo "$line" | cut -f3 -d' ')
        CURRENT_DEVBASE=$(echo "$line" | cut -f3 -d' ' | cut -c1-3)
        # Same MAC on a lan port and a wifi interface: drop the lan entry.
        if [ "$CURRENT_MAC" = "$PRIOR_MAC" ] && [ "$PRIOR_DEVBASE" = "lan" ] && [ "$CURRENT_DEVBASE" = "wlp" ]; then
            echo "removing extra fdb entry for $CURRENT_MAC $PRIOR_DEV"
            bridge fdb del "$CURRENT_MAC" dev "$PRIOR_DEV"
        fi
        PRIOR_MAC=$CURRENT_MAC
        PRIOR_DEV=$CURRENT_DEV
        PRIOR_DEVBASE=$CURRENT_DEVBASE
    done
    sleep 1
done
```

Thanks for the pointer to go look at the bridge fdb! If anyone learns more about what's going on here, I'd love to hear it, since this is quite frustrating!

ericwoud commented 3 years ago

Constantly deleting MACs from the switch's fdb turns the ethernet switch into an ethernet hub... This is why I wrote bridgefdbd, which only deletes when it is necessary.

Chadster766 commented 3 years ago

Sorry I don't have a lot of time to help with this but my (older) multi AP deployment of McDebian with Enterprise wifi looks like this:

root@MCDEBIAN:~# cat /etc/network/interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

auto lo
iface lo inet loopback

#auto eth0 -- Do not uncomment it will cause major startup delays
iface eth0 inet manual

auto eth1
iface eth1 inet manual

auto lan1
iface lan1 inet manual
        #locally administered mac address
        hwaddress ether 02:3b:45:92:19:06

auto lan2
iface lan2 inet manual
        hwaddress ether 02:4a:dc:6f:cd:7d

auto lan3
iface lan3 inet manual
        hwaddress ether 02:3d:2b:d9:b5:d5

auto lan4
iface lan4 inet manual
        hwaddress ether 02:e0:96:70:5f:c3

auto wlp1s0
iface wlp1s0 inet manual
        #locally administered mac address that ends with zero for multi SSID mac address generation
        pre-up ip link set wlp1s0 address 02:7a:03:6d:bb:40

auto wlp2s0
iface wlp2s0 inet manual
        pre-up ip link set wlp2s0 address 02:b1:77:35:c7:e0

auto wan
iface wan inet manual
        pre-up /etc/network/mcdebian-set-wan-mac-address
        #pre-up iptables-restore < /etc/iptables.up.rules

#iface wan inet6 auto
#       pre-up ip6tables-restore < /etc/ip6tables.up.rules
#       up sleep 5
        #The below line shouldn't be required with McDebian IPv6 NAT and the McDebian dhcpd6 server setup
        #up dhclient -1 -6 -cf /etc/dhcp/dhclient6.conf -lf /var/lib/dhcp/dhclient6.wan.leases -v wan || true

auto br0
iface br0 inet static
        bridge_hw 02:2d:50:bd:ca:13
        bridge_ports wan
        address 192.168.60.2
        netmask 255.255.255.0
        network 192.168.60.0
        gateway 192.168.60.1
        broadcast 192.168.60.255
        post-up /etc/network/mcdebian-wrt1900acV1-wrt3200-wlan
        post-up ip route add default dev br0

auto wan.4 wan.20 br1 br2

iface wan.4 inet manual
        vlan-raw-device wan

iface wan.20 inet manual
        vlan-raw-device wan

iface br1 inet manual
        bridge_ports wan.4

iface br2 inet manual
        bridge_ports wan.20

#iface br0 inet6 static
#       address fc00::1
#       netmask 64
root@MCDEBIAN:~# cat /etc/hostapd/wlp1s0.conf
interface=wlp1s0
bridge=br0
driver=nl80211
ctrl_interface=/var/run/hostapd
ignore_broadcast_ssid=0
hw_mode=a
channel=36
wmm_enabled=1
ieee80211n=1
ht_capab=[LDPC][HT40+][SHORT-GI-20][SHORT-GI-40]
vht_capab=[RXLDPC][SHORT-GI-80][RX-STBC-1][SU-BEAMFORMER][SU-BEAMFORMEE][MAX-A-MPDU-LEN-EXP7][RX-ANTENNA-PATTERN][TX-ANTENNA-PATTERN]
ieee80211ac=1
vht_oper_chwidth=1
vht_oper_centr_freq_seg0_idx=42
country_code=US

#Corp Wireless
ssid=**********
wpa_group_rekey=0
nas_identifier=corp-wifi
auth_server_addr=192.168.22.21
auth_server_port=1812
auth_server_shared_secret=**********
acct_server_addr=192.168.22.21
acct_server_port=1813
acct_server_shared_secret=**********
auth_server_addr=192.168.21.18
auth_server_port=1812
auth_server_shared_secret=**********
acct_server_addr=192.168.21.18
acct_server_port=1813
acct_server_shared_secret=**********
ieee8021x=1
auth_algs=3
wpa=3
wpa_pairwise=CCMP
wpa_key_mgmt=WPA-EAP

#Guest Wireless
bss=wlp1s0_4
bridge=br1
ssid=**********

auth_algs=1
wpa=2
wpa_passphrase=**********
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP
rsn_pairwise=CCMP
root@MCDEBIAN:~# cat /etc/hostapd/wlp2s0.conf
interface=wlp2s0
bridge=br0
driver=nl80211
ctrl_interface=/var/run/hostapd
ignore_broadcast_ssid=0
hw_mode=g
channel=6
wmm_enabled=1
ieee80211n=1
ht_capab=[LDPC][SHORT-GI-20][SHORT-GI-40]
ieee80211ac=0
country_code=US

#Corp Wireless
ssid=**********
wpa_group_rekey=0
nas_identifier=corp-wifi
auth_server_addr=192.168.22.21
auth_server_port=1812
auth_server_shared_secret=**********
acct_server_addr=192.168.22.21
acct_server_port=1813
acct_server_shared_secret=**********
auth_server_addr=192.168.21.18
auth_server_port=1812
auth_server_shared_secret=**********
acct_server_addr=192.168.21.18
acct_server_port=1813
acct_server_shared_secret=**********
ieee8021x=1
auth_algs=3
wpa=3
wpa_pairwise=CCMP
wpa_key_mgmt=WPA-EAP

#Guest Wireless
bss=wlp2s0_4
bridge=br1
ssid=**********
auth_algs=1
wpa=2
wpa_passphrase=**********
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP
rsn_pairwise=CCMP

#Voice Wireless
bss=wlp2s0_20
bridge=br2
ssid=**********
wpa_group_rekey=0
nas_identifier=voip-wifi
auth_server_addr=192.168.22.21
auth_server_port=1812
auth_server_shared_secret=**********
acct_server_addr=192.168.22.21
acct_server_port=1813
acct_server_shared_secret=**********
auth_server_addr=192.168.21.18
auth_server_port=1812
auth_server_shared_secret=**********
acct_server_addr=192.168.21.18
acct_server_port=1813
acct_server_shared_secret=**********
ieee8021x=1
auth_algs=3
wpa=3
wpa_pairwise=CCMP
wpa_key_mgmt=WPA-EAP

I make sure the channels on both the 2.4 GHz band (1, 6 and 11) and the 5 GHz band (36 and 149) are staggered.

Also, all MAC addresses are unique and generated via https://www.hellion.org.uk/cgi-bin/randmac.pl?scope=local&type=unicast and I change the wireless ones to have a 0 on the end due to the way the wireless driver creates MAC addresses for multiple BSSIDs.
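If the web generator is not handy, an address of the same shape can be produced locally. This is only a sketch: `gen_wifi_mac` is a made-up name, it assumes `od` and `/dev/urandom` are available, and it simply sets the locally administered bit, clears the multicast bit, and forces the last hex digit to 0 as described above.

```bash
#!/bin/sh
# Sketch only: print a random locally administered, unicast MAC whose last hex digit is 0.
gen_wifi_mac() {
    # Six random bytes as space-separated hex pairs become $1..$6.
    set -- $(od -An -N6 -tx1 /dev/urandom)
    # First octet: clear the multicast bit (0x01), set the locally administered bit (0x02).
    first=$(( (0x$1 & 0xfc) | 0x02 ))
    # Last octet: clear the low nibble so the address ends in 0.
    last=$(( 0x$6 & 0xf0 ))
    printf '%02x:%s:%s:%s:%s:%02x\n' "$first" "$2" "$3" "$4" "$5" "$last"
}
```

The result can then be pasted into the `pre-up ip link set wlp1s0 address ...` line of `/etc/network/interfaces` shown above.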

In this example the WRT WAN port is connected to the main network switches which trunks the different network VLANs to the AP.

I hope this helps in some way.

sfrost commented 3 years ago

> Constantly deleting MACs from the switch's fdb turns the ethernet switch into an ethernet hub... This is why I wrote bridgefdbd, which only deletes when it is necessary.

The little shell script I wrote also only deletes the MAC when it needs to, that is, when the MAC is listed both on a lan interface and on a wifi interface. I really don't understand what's going on in Linux (or maybe it's the switch..?) that's resulting in having the MAC show up associated with both interfaces, or why having it associated with both interfaces is causing traffic to not get passed through. The more I think about it, the more it seems like this must be a bug somewhere. Maybe because they're both connected to a bridge, Linux is propagating the MAC from the wifi interface to the lan interface? I don't know why that would make sense though. As near as I can tell, having the MAC on both interfaces is happening even if the wifi device has only associated to one AP and hasn't been on any of the others recently, meaning that there's absolutely no reason for its MAC to show up on the lan interface of the AP it's connected to, but that's what is happening and making it impossible for the wifi device to communicate with anything because the traffic isn't getting passed through from the wifi interface to the lan interface to be sent out. Very odd.

As a side note, I doubt having actual switches involved or in-between would actually be an issue- in my experience, physical switches very quickly pick up on MAC address changes without issue. I'm pretty surprised that Linux is having such an issue.

sfrost commented 3 years ago

Thanks so much @Chadster766 for that config! I'm not sure that it'll help with what's going on here, but I'll definitely review what you shared and see if there's improvements to my setup that I should make. Thanks again!

ericwoud commented 3 years ago

What you are trying to do is not a bug but simply not supported on ethernet networks. On Windows it is even prohibited:

> Quote: Layer 2 bridging is prohibited between the AP adapter and any other adapters in the system.

This is why you need to use ICS/SoftAP on Windows to have the AP create a local network for the wireless connection. On Linux you use iptables/nftables masquerading in combination with a DHCP server. Any AP you buy does the same thing. Reportedly a few use MAC spoofing, just as my other solution, mcspoof, does.

On linux it is not prohibited, but if you bridge wifi to lan on your AP you will run into the problems you have described. You need to apply some fix.

You can connect your mcdebian APs with the WAN port to your router, as @Chadster766 does. This basically sets up the local network for the AP. Sadly, this also puts the LAN ports of the AP inside the same local network. I believe @Chadster766 uses VLANs to tackle this.

As I do not like to set up a local network for wifi on every AP, I created other solutions.

sfrost commented 3 years ago

Where is that quote from...? What is the reasoning for that?

I mean, mcdebian is specifically set up to do exactly what I'm doing- creating one bridge between the wireless interfaces and the ethernet interfaces, and it sure looks like that's how most APs work when viewing them from the outside.

All I'm trying to get to is a point where I've got one L2 network while using more than one mcdebian access point. This is really not a complicated setup...

ericwoud commented 3 years ago

> Where is that quote from...? What is the reasoning for that?

The word Quote in the previous post is a link to the webpage on the Microsoft site. It is the documentation of MS Wireless Hosted Network, the Microsoft version of hostapd.

> I mean, mcdebian is specifically set up to do exactly what I'm doing- creating one bridge between the wireless interfaces and the ethernet interfaces, and it sure looks like that's how most APs work when viewing them from the outside.

Even the developer of mcdebian uses the WAN port.

> All I'm trying to get to is a point where I've got one L2 network while using more than one mcdebian access point. This is really not a complicated setup...

Not complicated at all... But on the inside....

It seems you will be looking for other solutions. Please let me know if you find any, I would be very interested.

sfrost commented 3 years ago

> > Where is that quote from...? What is the reasoning for that?
>
> The word Quote in the previous post is a link to the webpage on the Microsoft site. It is the documentation of MS Wireless Hosted Network, the Microsoft version of hostapd.

I don't see why what is on Microsoft's site would be relevant, we aren't running Windows on these things. :)

> > I mean, mcdebian is specifically set up to do exactly what I'm doing- creating one bridge between the wireless interfaces and the ethernet interfaces, and it sure looks like that's how most APs work when viewing them from the outside.
>
> Even the developer of mcdebian uses the WAN port.

Sure, for upstream, but the non-WAN ports are all part of the same bridge that the wifi adapters are connected to, just like in my setup.

> > All I'm trying to get to is a point where I've got one L2 network while using more than one mcdebian access point. This is really not a complicated setup...
>
> Not complicated at all... But on the inside....
>
> It seems you will be looking for other solutions. Please let me know if you find any, I would be very interested.

I mean, what I've got now with the shell script I posted before more-or-less works, but it definitely isn't ideal.

ericwoud commented 3 years ago

It is a DSA switch issue. It is mentioned here and here. It affects many devices using the DSA architecture. I still have not found any real fix for this issue.

Chadster766 commented 3 years ago

@ericwoud I looked into those posts but the files they reference don't seem to exist in new kernel versions.

ericwoud commented 3 years ago

It looks like they are working on it:

thread

This is a V2, which I tried on kernel 5.10, without success. There is a V3, but I'm not so sure this patch will affect our system. One of the patched files is drivers/net/dsa/ocelot/felix.c, which is not a Marvell DSA driver. It is, however, tackling the same problem we are having here...

Chadster766 commented 3 years ago

I don't think the WRT Marvell DSA switches are affected by this issue.

Keep in mind that the MAC addresses of the WRT wireless interfaces wlp1s0 and wlp2s0 should end in 0 because of the way the driver creates BSSIDs. This is especially important if you have multiple SSIDs per radio.

ericwoud commented 3 years ago

sfrost wrote:

> d0:3f:aa:e8:XX:XX dev lan1 self
> d0:3f:aa:e8:XX:XX dev wlp3s0 master br0
>
> Removing the one associated with 'lan1' by hand does indeed make things start working

The self entry on lan1 is the entry in the hardware. This is the one that is stuck in the fdb, actually the DSA switch's fdb. Removing this entry, manually or by any other means, makes things work again.
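To look at just that hardware entry for one client without deleting anything, a dry-run sketch along these lines may help. It reads `bridge fdb show` output on stdin and only prints the `bridge fdb del` commands it would suggest. `stale_self_cmds` is a hypothetical name, and the `lan*` / `wlp*` interface prefixes are taken from this thread and may differ on other setups.

```bash
#!/bin/sh
# Sketch only: for one MAC, print (not run) a delete command for each "self"
# entry on a lan* port, but only if the same MAC is also seen on a wlp* interface.
stale_self_cmds() {
    awk -v mac="$1" '
        tolower($1) != tolower(mac) || $2 != "dev" { next }
        $3 ~ /^wlp/ { on_wifi = 1 }
        $3 ~ /^lan/ {
            for (i = 4; i <= NF; i++)
                if ($i == "self") { dev[++n] = $3; break }
        }
        END {
            if (on_wifi)
                for (i = 1; i <= n; i++)
                    print "bridge fdb del " mac " dev " dev[i] " self"
        }
    '
}
```

Usage would be something like `bridge fdb show | stale_self_cmds d0:3f:aa:e8:XX:XX` with the real client address filled in. Reviewing the printed commands before running them avoids removing entries for clients that really are on the wired side.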

sfrost commented 3 years ago

Looks like there's a v4 of that patch set that you mentioned now, would be interesting if someone could play with it and see if it fixes the issue.

I did go back and figure out that setting the MAC addresses on the wireless interfaces wasn't working properly in a couple of cases (to make sure the last digit is a 0). That didn't seem to be causing any particular issues, but I fixed it anyway and now they definitely all have a '0' at the end- but it didn't fix the roaming issue where the MAC of the device was ending up in multiple places and not getting cleaned up and therefore not working to have traffic get passed through. Ultimately, whatever issue that fixed, it didn't seem to be one that I'm running into.

The post here, as mentioned above- https://gitlab.nic.cz/turris/turris-build/-/issues/165 seems to really be spot-on since it mentions the 300 second timeout which I was also definitely experiencing.

ericwoud commented 3 years ago

Hi Stephen,

See this commit

https://github.com/openwrt/openwrt/commit/f1158fbcf63c190fb4f7686075ab99f2aee98a92

This could be the answer. I have not tried it as I am migrating to bananapi R64.