maurerle opened this issue 1 week ago
Can you check whether the tx retries / tx failed counters from iw dev mesh{0,1} station dump are continuously incrementing?
They are slightly increasing, but most of the time, they are constant.
root@ffac-seilpforte-wr3000:~# iw dev mesh0 station dump | grep "tx failed"
tx failed: 2228
tx failed: 47
tx failed: 2249
tx failed: 127
tx failed: 535
# after 10 minutes
root@ffac-seilpforte-wr3000:~# iw dev mesh0 station dump | grep "tx failed"
tx failed: 2236
tx failed: 85
tx failed: 2259
tx failed: 171
tx failed: 535
root@ffac-seilpforte-wr3000:~# iw dev mesh0 station dump | grep "tx retries"
tx retries: 2223
tx retries: 47
tx retries: 2234
tx retries: 124
tx retries: 506
# after 10 minutes
root@ffac-seilpforte-wr3000:~# iw dev mesh0 station dump | grep "tx retries"
tx retries: 2231
tx retries: 85
tx retries: 2243
tx retries: 168
tx retries: 506
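Comparing the two snapshots by eye works, but it loses track of which station each counter belongs to. The following is a minimal sketch that computes the per-station change of one counter between two saved `iw dev mesh0 station dump` outputs. `tx_counter_delta` is a hypothetical helper, not part of iw; it assumes iw's usual `Station <mac> (on <iface>)` header lines and indented `tx failed:` / `tx retries:` lines.

```shell
#!/bin/sh
# Hypothetical helper: print the per-station change of one counter
# (e.g. "tx failed" or "tx retries") between two saved outputs of
# `iw dev <iface> station dump`.
# Usage: tx_counter_delta "<counter name>" <before-file> <after-file>
tx_counter_delta() {
    awk -v key="$1:" '
        # Count input files so we know which snapshot we are reading.
        FNR == 1 { file++ }
        # "Station <mac> (on <iface>)" starts a new station block.
        $1 == "Station" { sta = $2; next }
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop leading indentation
            if (index(line, key) != 1) next    # not the counter we want
            val = substr(line, length(key) + 1)
            gsub(/[ \t]/, "", val)             # keep only the number
            if (file == 1)
                before[sta] = val
            else if (sta in before)
                printf "%s %d\n", sta, val - before[sta]
        }
    ' "$2" "$3"
}
```

For example, save `iw dev mesh0 station dump > /tmp/before.txt`, wait ten minutes, save `iw dev mesh0 station dump > /tmp/after.txt`, then run `tx_counter_delta "tx failed" /tmp/before.txt /tmp/after.txt`. Stations that only appear in the second snapshot are skipped.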
batctl p towards some mesh partners often does not work either, with packet loss above 90%. Does this help?
I just tested the MTK patch: https://github.com/freifunk-gluon/gluon/commit/dd114b5fc2dec3f2e7feef52a7238399b39f0a9e from @blocktrron's branch: https://github.com/freifunk-gluon/gluon/compare/main...blocktrron:gluon:mtk-git-txs.patch
It looked good until I reloaded the driver at about 07:20. Then we had the usual airtime and link stability problems until I reloaded the driver again at 08:45. The problems started again at 09:30.
logread still does not point to anything useful, so this issue is waiting for other ideas for now :)
General instability has been observed on MediaTek Filogic devices using the mt7915e driver, especially the WR3000, the WAX220 and others. Note that some devices work better than others. Heavy wifi mesh usage seems to make the situation worse.
What is the problem?
An example of this behavior is this device: https://grafana.ffac.rocks/d/000000002/node?orgId=1&var-node=80afca06d558&from=1718344052951&to=1718403869219&viewPanel=13
![image](https://github.com/freifunk-gluon/gluon/assets/25026204/0b5695a4-e52b-4893-b5c2-33384e1fcec9)
The graph shows strongly varying TQ for this device.
The latest finding is this: https://grafana.ffac.rocks/d/000000002/node?orgId=1&var-node=80afca06d558&from=1720175532350&to=1720193698710&var-select_hostname=ffac-seilpforte-wr3000&var-hostname=ffac-seilpforte-wr3000&var-saveinterval=1m&var-nodetolink=0c0e76cf5d5e&viewPanel=13
![image](https://github.com/freifunk-gluon/gluon/assets/25026204/5e1f3fcf-8d25-4b17-9e23-55b6273adbd8)
At 1, I restarted the wifi driver using `rmmod mt7915e && modprobe mt7915e`.
At 2, I added another mesh device that this device could mesh with on mesh1. This triggered the timeout issue, after which the firmware could no longer be reloaded on the device.
At 3, I rebooted the device, as nothing else helped. Afterwards, the TQ keeps changing in strange waves.
The current workaround includes reloading the mt7915e driver and rebooting the device once the mt7915e bug from #3154 occurs. A package for this can be found here: https://github.com/ffac/gluon-packages/tree/main/ffac-mt7915-hotfix/files/lib/gluon/mt7915
As @nrbffs also noted on IRC, other people have reported instability with these devices as well. Currently, reloading the wifi driver twice a day seems to help in this situation.
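The "reload the driver, reboot if that is no longer possible" logic can be sketched roughly as follows. This is a hypothetical illustration, not the code of the ffac-mt7915-hotfix package. It assumes, as in the observation at marker 2, that a failing reload indicates the #3154 state. The function names and the `RMMOD`, `MODPROBE` and `REBOOT` overrides are assumptions added so the logic can be dry-run off-device.

```shell
#!/bin/sh
# Hypothetical sketch of the "reload, else reboot" workaround; this is not
# the code of the ffac-mt7915-hotfix package.
# RMMOD, MODPROBE and REBOOT are assumed overrides for dry runs; on a node
# they default to the real commands.

# Unload and reload the mt7915e driver; fails if either step fails.
reload_mt7915() {
    ${RMMOD:-rmmod} mt7915e && ${MODPROBE:-modprobe} mt7915e
}

# Try a driver reload; if it fails (assumed to indicate the #3154 state),
# reboot the device instead.
reload_or_reboot() {
    if reload_mt7915; then
        echo "mt7915e reloaded"
        return 0
    fi
    echo "mt7915e reload failed, rebooting" >&2
    ${REBOOT:-reboot}
    return 1
}
```

Calling `reload_or_reboot` from a cron job twice a day would match the schedule mentioned above. Note that this sketch only reacts to a reload that exits with an error; a reload that hangs is not handled.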
This issue is not about #3154 but about the erratically changing TQ, which leads to poor mesh and wifi quality.
What is the expected behaviour?
Mesh and wifi quality should be stable on mediatek filogic devices such as the WR3000.
Further steps
List the mt76 debugfs entries to find something useful:
`ls /sys/kernel/debug/ieee80211/phy*/mt76`
TX_Stats
I found that on other devices,
`cat /sys/kernel/debug/ieee80211/phy1/mt76/tx_stats`
only shows values for 1 to 4, while the affected WR3000 has values for 1 to 8. I do not really know whether this is related or not; it is just a finding.
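To compare the tx_stats output between an affected WR3000 and other devices, a small helper can dump the file for every phy at once. This is a minimal sketch assuming debugfs is mounted at /sys/kernel/debug; `dump_mt76_tx_stats` and its optional root-directory argument are hypothetical, the argument being added only so the helper can be tried on a copied directory tree.

```shell
#!/bin/sh
# Hypothetical helper: print the mt76 tx_stats debugfs file of every phy,
# each preceded by a "== <path>" header, so outputs from different devices
# can be compared side by side. The optional argument (an assumption added
# for testing) overrides the debugfs ieee80211 directory.
dump_mt76_tx_stats() {
    root="${1:-/sys/kernel/debug/ieee80211}"
    for f in "$root"/phy*/mt76/tx_stats; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -r "$f" ] || continue
        echo "== $f"
        cat "$f"
    done
}
```

Running `dump_mt76_tx_stats` on both kinds of device and comparing the outputs makes the 1-to-4 versus 1-to-8 difference easy to spot.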
Gluon Version: v2023.2.3
Site Configuration: ffac @ v2023.2.3-2
Custom patches: see site