freifunk-gluon / gluon

a modular framework for creating OpenWrt-based firmwares for wireless mesh nodes
https://gluon.readthedocs.io

IBSS mesh and client mode doesn't work in parallel with ath10k-ct #1584

Closed oliver closed 5 years ago

oliver commented 5 years ago

After the general 5GHz wifi problems with AC7 and 2018.1.1+bremen1 firmware were solved (in #1561), we found out that meshing on 5GHz still does not work.

This affects all FFHB 2018.1 firmware builds so far, and has only been tested on the Archer C7 v2.

The symptom is that a 5GHz-only device (e.g. CPE510) does not establish a working wifi mesh connection to the AC7.

Example device: http://og-ac7-testing-2.nodes.ffhb.de (https://map.ffhb.de/#!/en/map/f4f26d70b277). The AC7 is in the same room as a CPE510 (https://map.ffhb.de/#!/en/map/c4e984b0a84a) and a WDR3600 (https://map.ffhb.de/#!/en/map/c4e984d5138e), both running 2017.1.8+bremen1 (i.e. stable) firmware. The stable-firmware devices mesh nicely with each other. The AC7 meshes only via 2.4GHz. As a result, the WDR3600 meshes with both other devices, while the CPE510 meshes only with the WDR3600.

Interestingly the devices appear to see each other via IBSS, but still don't get a mesh connection working:

root@cpe510-og-1:~# iw dev ibss0 station dump
Station aa:85:d3:37:f0:5e (on ibss0)
    inactive time:  0 ms
    rx bytes:   167231521
    rx packets: 1267310
    tx bytes:   15690420
    tx packets: 65650
    tx retries: 2531
    tx failed:  0
    rx drop misc:   0
    signal:     -58 [-60, -62] dBm
    signal avg: -59 [-61, -63] dBm
    tx bitrate: 144.4 MBit/s MCS 15 short GI
    rx bitrate: 144.4 MBit/s MCS 15 short GI
    expected throughput:    46.875Mbps
    authorized: yes
    authenticated:  yes
    associated: yes
    preamble:   long
    WMM/WME:    yes
    MFP:        no
    TDLS peer:  no
    DTIM period:    0
    beacon interval:100
    short slot time:yes
    connected time: 5629 seconds
Station 0e:9e:42:44:91:aa (on ibss0)
    inactive time:  30 ms
    rx bytes:   73632280
    rx packets: 567327
    tx bytes:   0
    tx packets: 0
    tx retries: 0
    tx failed:  0
    rx drop misc:   0
    signal:     -61 [-63, -65] dBm
    signal avg: -60 [-63, -63] dBm
    tx bitrate: 6.0 MBit/s
    authorized: yes
    authenticated:  yes
    associated: yes
    preamble:   long
    WMM/WME:    yes
    MFP:        no
    TDLS peer:  no
    DTIM period:    0
    beacon interval:100
    short slot time:yes
    connected time: 2544 seconds

Nothing strange in batctl if:

root@og-ac7-testing-2:~# batctl if
mesh-vpn: active
ibss1: active
ibss0: active
primary0: active

I'm AFK now for a while; will post dmesg etc. later, but there's not much to see there. What info would be useful to debug this?

mweinelt commented 5 years ago

Please post the output of batctl n

oliver commented 5 years ago
root@og-ac7-testing-2:~# batctl n
Error - no valid command or debug table specified: n
[...]
root@og-ac7-testing-2:~# batctl -v
batctl 2013.4.0 [batman-adv: 2013.4.0]
oliver commented 5 years ago

Various command results of an AC7 v2 with 2018.1.1+bremen2:

Command results of another AC7 v2 with (edit) 2017.1.8+bremen1:

oliver commented 5 years ago
root@og-ac7-testing-2:~# modinfo batman-adv
module:     /lib/modules/4.4.153/batman-adv.ko
version:    2013.4.0
description:    B.A.T.M.A.N. advanced
author:     Marek Lindner <lindner_marek@yahoo.de>, Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
license:    GPL
depends:    
root@og-ac7-testing-2:~# lsmod | grep batman
batman_adv            106595  0 
root@og-ac7-testing-2:~# lsmod | wc -l
18

On 2018.1.1+bremen2 there are 18 modules loaded; on 2017.1.8+bremen1 there are 117 modules loaded! This is probably related to #1580, but I don't know whether it actually causes these problems.

Anyway, here's lsmod output for old and new firmware:

mweinelt commented 5 years ago

On 2018.1.1+bremen2 there are 18 modules loaded; on 2017.1.8+bremen1 there are 117 modules loaded! This is probably related to #1580, but I don't know whether it actually causes these problems.

Sounds like it. Please retry with the latest v2018.1.x commit and report back if that fixes the issue.

neocturne commented 5 years ago

The low number of loaded modules is expected; we started building as much as possible into the kernel.

neocturne commented 5 years ago

To check whether the issue is in ath10k or in batadv, try whether a simple ping (over IPv6 link-local) on ibss0 works.

Unfortunately, the logs don't show anything unusual. Does the issue only occur between ath9k and ath10k devices, or is it reproducible with two ath10k devices as well?
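For the unicast version of this test, the neighbor's link-local address can usually be derived from the MAC shown in iw dev ibss0 station dump. This is a minimal sketch, assuming the neighbor uses the Linux default modified EUI-64 address generation on its ibss0 interface (if it does not, read the address with ip -6 addr show dev ibss0 on that node instead); mac_to_linklocal is a hypothetical helper, not part of Gluon:

```shell
# Hypothetical helper: turn a MAC address such as aa:85:d3:37:f0:5e into the
# IPv6 link-local address Linux derives by default (modified EUI-64,
# RFC 4291 appendix A): flip the universal/local bit of the first octet and
# insert ff:fe between the third and fourth octets.
mac_to_linklocal() {
	old_ifs=$IFS
	IFS=:
	# Split the MAC into its six octets, available afterwards as $1 .. $6.
	set -- $1
	IFS=$old_ifs
	# Build the four 16-bit groups of the interface identifier.
	g1=$(( ((0x$1 ^ 0x02) << 8) | 0x$2 ))
	g2=$(( (0x$3 << 8) | 0xff ))
	g3=$(( (0xfe << 8) | 0x$4 ))
	g4=$(( (0x$5 << 8) | 0x$6 ))
	# %x drops leading zeros within each group, which is still a valid address.
	printf 'fe80::%x:%x:%x:%x\n' "$g1" "$g2" "$g3" "$g4"
}
```

For example, ping6 -c 3 "$(mac_to_linklocal 0e:9e:42:44:91:aa)%ibss0" would ping fe80::c9e:42ff:fe44:91aa over ibss0.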

neocturne commented 5 years ago

One more thing to test: since v2018.1, we set the htmode of 11ac devices to 'VHT20' in /etc/config/wireless; in older versions, it was always set to 'HT20'.

blocktrron commented 5 years ago

The rx bitrate is missing for the second station in the iw dev ibss0 station dump output; this may be a hint.
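To spot this pattern without reading through the whole dump, a small filter can list the stations that report no rx bitrate. This is a minimal sketch, assuming the "Station <mac> (on <dev>)" / "rx bitrate:" layout of the dump above and a POSIX shell with awk (BusyBox on OpenWrt provides both); stations_without_rx_bitrate is a hypothetical name:

```shell
# Hypothetical helper: read `iw dev <dev> station dump` output on stdin and
# print the MAC of every station whose block has no "rx bitrate:" line.
stations_without_rx_bitrate() {
	awk '
		# A new station block starts with "Station <mac> (on <dev>)".
		$1 == "Station" {
			if (mac != "" && !has_rx) print mac
			mac = $2
			has_rx = 0
			next
		}
		# Inside a block, remember whether an rx bitrate was reported.
		$1 == "rx" && $2 == "bitrate:" { has_rx = 1 }
		# Flush the last station block.
		END { if (mac != "" && !has_rx) print mac }
	'
}
```

Run it as iw dev ibss0 station dump | stations_without_rx_bitrate; for the CPE510 dump above it prints only 0e:9e:42:44:91:aa.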

oliver commented 5 years ago

(sorry for the silence on my side, I'm not online a lot at the moment and will probably get to this issue in 2019. Thanks for the comments, I will try these hints)

rotanid commented 5 years ago

@oliver We will close this ticket now. Feel free to reopen it as soon as you can provide the additional information requested.

oszilloskop commented 5 years ago

Same behavior here with Gluon 2018.1.3 using IBSS, on an Ubiquiti Loco M5 XW and an Ubiquiti UniFi-AC-MESH.

Hint: rx bitrate is missing on both devices in the output of iw dev ibss0 station dump, and tx bitrate is only 6.0 MBit/s on both (same as oliver's).

Our site configuration is here: https://github.com/freifunk-ffm/site-ffffm/tree/test

blocktrron commented 5 years ago

@oszilloskop Can you link your binary firmware files?

https://github.com/freifunk-gluon/gluon/commit/427c83754b51ace0c9993d3d1a62b51ba4f2217c Might this patch be the reason? (NeoRaider already pointed this patch out.) Currently, issues have only been reported for 802.11n <--> 802.11ac mesh links.

I can't reproduce this here with 11s and non-ct firmware (802.11n 20MHz <--> 802.11ac 80MHz works w/o a single problem).

oszilloskop commented 5 years ago

@blocktrron You will find the binary files here: https://dl.ffm.freifunk.net/firmware/test/

EDIT: Our firmware v2.4.10-test-1127 is based on Gluon v2018.1.3; our firmware v2.4.4-test-0430 is based on Gluon v2017.1.7.

rotanid commented 5 years ago

@blocktrron He pointed out on IRC that it also doesn't work with v2017.1.7, which does not contain the mentioned patch. Interestingly, though, v2017.1.8 does work for Freifunk Bremen, if I understood correctly what @oliver wrote. The low-hanging fruit, err, conclusion: either one of the two tests went wrong, or it's a different issue. I think(!) v2017.1.x does not have a general issue with 5 GHz IBSS mesh, as this would hopefully have been reported last year, and it works for FF Bremen as far as I understand.

oliver commented 5 years ago

Short update: ping via ibss0 (5 GHz IBSS) shows some weirdness on 2018.1.3: ping6 ff02::1%ibss0 returns only the local IPv6 address but no replies from other nodes. Doing the same via ibss1, however, shows replies from the other node that is connected via 2.4 GHz.

But if I directly ping the link-local IPv6 address of another node via ibss0 (i.e. not via the all-nodes multicast address), I do get replies. Afterwards, I also get a reply from that node with ping ff02::1%ibss0, and iw dev ibss0 station dump now actually shows an "rx bitrate:" for the connection to this node. The Gluon status page now also shows the name of the remote node (before, only the MAC was shown).

I don't know enough about IPv6 multicast over wifi to tell what's cause and what's effect here. So maybe the multicast ping is indeed broken and this causes the Batman problem; or maybe Batman (or something else) is broken, and that also keeps the IBSS connection in some "idle" state where multicast pings don't work.

But at least this shows that in general some data can be transferred over 5 GHz IBSS with 2018.1.

(Edit: this comment is mainly so that I remember what I've tried already :-) . I will do some more analysis in the future).

neocturne commented 5 years ago

Might be a power save issue. You can try disabling it using iw dev <dev> set power_save off.

neocturne commented 5 years ago

Please also try the Gluon master, which is based on OpenWrt 18.06.

oliver commented 5 years ago

Powersave appears to be off already (iw dev ibss0 get power_save prints Power save: off). Running iw dev ibss0 set power_save off doesn't appear to make any difference regarding broadcast ping. Running iw dev ibss0 set power_save on prints command failed: Not supported (-122).

oliver commented 5 years ago

I've just installed 2018.1.3+bremen1 on a CPE510 v1 which is a 5GHz-only device, and mesh works fine (see https://map.ffhb.de/#!/en/map/c4e984b0a84a and http://cpe510-og-1.nodes.ffhb.de/). The device successfully meshes with another node. The CPE510 uses the ath9k driver.

So this problem doesn't affect all devices. So far the ath10k driver with -ct firmware shows the problem, while the ath9k driver works.

oliver commented 5 years ago

@oszilloskop can you check which driver is used on the devices which don't work for you? What does lsmod | grep ath show?

oszilloskop commented 5 years ago

I don't have a master firmware.

Ubiquiti Loco M5 XW (5GHz-only)
===============================
Gluon 2018.1.3
ibss
htmode 'HT20'

5GHz Mesh does not work

~# iw dev ibss0 station dump 
-> tx bitrate: 6.0 MBit/s
-> shows no rx bitrate

~# iw dev ibss0 get power_save
Power save: off

~# lsmod | grep ath
ath                    18387  3 ath9k,ath9k_common,ath9k_hw
ath9k                 109160  0 
ath9k_common           22062  1 ath9k
ath9k_hw              359564  2 ath9k,ath9k_common
cfg80211              234680  4 ath9k,ath9k_common,ath,mac80211
compat                 11245  4 ath9k,ath9k_common,mac80211,cfg80211
mac80211              416898  1 ath9k

The status page shows a "Nachbarknoten ibss0" ("neighbor nodes ibss0") graph of all 5GHz neighbor mesh nodes.

---

Ubiquiti UniFi-AC-MESH (dual 2.4/5GHz)
======================================
Gluon 2017.1.7
ibss
htmode 'HT20'

5GHz Mesh does not work
2.4GHz Mesh works fine

~# iw dev ibss0 station dump
-> tx bitrate: 6.0 MBit/s
-> shows no rx bitrate

~# iw dev ibss0 get power_save
Power save: off

~# lsmod | grep ath
ath                    18387  4 ath9k,ath9k_common,ath9k_hw,ath10k_core
ath10k_core           310523  1 ath10k_pci
ath10k_pci             34719  0 
ath9k                 109160  0 
ath9k_common           22062  1 ath9k
ath9k_hw              359564  2 ath9k,ath9k_common
cfg80211              234552  5 ath9k,ath9k_common,ath10k_core,ath,mac80211
compat                 11245  4 ath9k,ath9k_common,mac80211,cfg80211
mac80211              416898  2 ath9k,ath10k_core

The status page shows a "Nachbarknoten ibss0" ("neighbor nodes ibss0") graph of all 5GHz neighbor mesh nodes.

---

Ubiquiti UniFi-AC-MESH (dual 2.4/5GHz)
======================================
Gluon 2018.1.3
ibss
htmode 'HT20'

5GHz Mesh does not work
2.4GHz Mesh works fine

~# iw dev ibss0 station dump 
-> tx bitrate: 6.0 MBit/s
-> shows no rx bitrate

~# iw dev ibss0 get power_save
Power save: off

~# lsmod | grep ath
ath                    18387  4 ath9k,ath9k_common,ath9k_hw,ath10k_core
ath10k_core           310299  1 ath10k_pci
ath10k_pci             34687  0 
ath9k                 109160  0 
ath9k_common           22062  1 ath9k
ath9k_hw              359564  2 ath9k,ath9k_common
cfg80211              234680  5 ath9k,ath9k_common,ath10k_core,ath,mac80211
compat                 11245  4 ath9k,ath9k_common,mac80211,cfg80211
mac80211              416898  2 ath9k,ath10k_core

The status page shows a "Nachbarknoten ibss0" ("neighbor nodes ibss0") graph of all 5GHz neighbor mesh nodes.
oliver commented 5 years ago

@blocktrron he pointed out on IRC, that it also doesn't work with v2017.1.7 - this does not contain the mentioned patch. interestingly though, v2017.1.8 does work for Freifunk Bremen if i understood correctly what @oliver wrote. the low-hanging fruit, err, conclusion: one of both tests went wrong or it's a different issue. i think(!) v2017.1.x does not have a general issue with 5 GHZ ibss mesh as this would have hopefully been reported last year - and it works for FF Bremen as far as i understand.

@rotanid: Good thing you mentioned this! I just tested this specific functionality, and it looks like 5 GHz meshing on the AC7 was already broken with 2017.1.8+bremen1 :-( So no, v2017.1.8 does not really work for FFHB, and v2017.1.x does have a general issue with 5 GHz IBSS mesh.

Or maybe my test is wrong. I have set up three devices with 2017.1.8+bremen1, all located in the same room:

Result: the WDR3600 meshes with both devices. The AC7 and the CPE510 only mesh with the WDR3600. To me this indicates that the 5GHz mesh on AC7 doesn't work.

I guess we never systematically tested this, and the problem probably went unnoticed because the devices still mesh via 2.4 GHz.

Anyway, I'll report back when I've found out more.

rotanid commented 5 years ago

As discussed in person, please try v2016.2.x-based firmware (on a device like the Archer C7 v2) and also current master (or, equivalently, v2018.2.x as soon as it is released).

oszilloskop commented 5 years ago

I tested it on two Ubiquiti UniFi-AC-MESH with Gluon 2018.2 today. Unfortunately, the 5GHz IBSS mesh behaves the same as with the previous Gluon versions.

rotanid commented 5 years ago

Thanks. Now the last missing bit of information is whether an ath10k device works with v2016.2.x in a 5 GHz IBSS mesh.

oliver commented 5 years ago

Just installed FFHB firmware 2016.2.7+bremen1 on an AC7v2 (https://map.ffhb.de/#!/en/map/f4f26d70b277), and IBSS with 5 GHz is still broken. Same symptoms: no mesh connection to a CPE510, the status page only shows MAC address rather than name of the CPE510, and iw dev ibss0 station dump shows 6.0 MBit/s as tx/rx bitrate for all stations.

So it looks like this is not a regression, but rather has been broken from the beginning?

rotanid commented 5 years ago

I can't add anything to your conclusion... and we likely can't fix anything if this never worked before. Let's wait for @NeoRaider (who told me at 35c3 that it should have worked before); but maybe this issue will go into future release notes as "can't fix"...

mortzu commented 5 years ago

I have the same issue on a mANTBox 15s (ath10k). The mesh works as soon as I disable the client wireless device.

oszilloskop commented 5 years ago

@mortzu Nice find. I can confirm that this workaround works fine with my two UniFi-AC-MESH (dual 2.4/5GHz, ath9k/ath10k, IBSS).

So this problem doesn't affect all devices. So far the ath10k driver with -ct firmware shows the problem, while the ath9k driver works.

Furthermore, I can confirm that my ath9k-only device (Ubiquiti Loco M5 XW) does not have a 5GHz IBSS mesh problem. Now that its mesh partners work correctly, it meshes very well without that workaround.

EDIT: My tests were done with Gluon v2018.2.

mweinelt commented 5 years ago

This essentially means that ath10k with the candelatech driver/firmware has become unusable for Gluon, as it does not support AP+IBSS at the same time.
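Whether a driver even advertises AP and IBSS interfaces running at the same time can be read from the "valid interface combinations" section of iw phy <phy> info. This is a rough sketch, assuming the combination line format printed by iw (e.g. * #{ managed } <= 1, #{ AP, mesh point } <= 8, #{ IBSS } <= 1, ...); it ignores the total and #channels limits, and an advertised combination does not prove that the firmware actually handles it correctly. has_ap_ibss_combination is a hypothetical helper:

```shell
# Hypothetical helper: read `iw phy <phy> info` (or `iw list`) output on stdin
# and succeed if any advertised interface combination line names both AP and IBSS.
has_ap_ibss_combination() {
	awk '
		# Only combination lines contain "#{ ... } <= n" limits.
		# Match AP as a whole type name, so "AP/VLAN" does not count.
		index($0, "#{") > 0 && $0 ~ /[{,] AP[ ,]/ && $0 ~ /[{,] IBSS[ ,]/ { found = 1 }
		# Exit status 0 if such a combination was found, 1 otherwise.
		END { exit !found }
	'
}
```

For example: iw phy phy1 info | has_ap_ibss_combination && echo "AP+IBSS advertised" (phy1 is a placeholder for the 5 GHz radio).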

rotanid commented 5 years ago

@mweinelt "become"? According to the tests done by @oliver, the situation didn't change in any way. Maybe that's nitpicking, but I think it's a difference.

mweinelt commented 5 years ago

Yeah, nitpicking, because it renders the same result.

blocktrron commented 5 years ago

@oszilloskop This is expected behavior. ath9k is far more open than ath10k. If the ath10k firmware does not support the service-set combination we need, we are, simply put, out of luck.

Maybe you can try an mt76-based device: apparently this driver supports IBSS for 11ac (as it's "comparably libre" to ath9k), but it is flagged as broken for IBSS because it is untested.

oszilloskop commented 5 years ago

I only confirmed the test results already reported by @oliver and @mortzu. For me, the qualification "not at the same time" is very important: ath10k can do 5GHz IBSS mesh, just not simultaneously with the client network. To me, that is different from "it does not work".

rotanid commented 5 years ago

I adjusted the title of this issue accordingly.

CodeFetch commented 5 years ago

IBSS support will be dropped as described in #1747, so this issue will likely not be resolved in Gluon. The issue I've opened in ath10k-ct still remains open, so it might get fixed at some point, but there won't be a Gluon release with this fix.

Thus I think this can be closed now.

rotanid commented 5 years ago

@CodeFetch For now, it's a known issue in the current and next Gluon releases.

CodeFetch commented 5 years ago

Okay. I thought the fact that it won't get fixed was a reason to close it as "won't fix".

mweinelt commented 5 years ago

"Won't fix" is a reasonable assumption, so let me tag it like this.

mweinelt commented 5 years ago

With the deprecation of IBSS meshing in the v2019.2 release cycle this issue is out of scope for Gluon.

Please migrate to 802.11s in a timely fashion.