Closed dangowrt closed 5 years ago
Can you narrow the time period for how long link outage must occur to trigger this problem? I am eager to reproduce this problem
Can you narrow the time period for how long link outage must occur to trigger this problem?
most of the time i bumped into this when an ath9k node ran into the rx deafness bug. i usually fix that by re-running wpa_supplicant, and then bmx7 won't recover
I am eager to reproduce this problem
take 3x ubnt xw devices in sub-optimal link conditions. sooner or later (sometimes hours, sometimes days) the rx deafness will occur on at least one of them. if you send me your ssh pubkey via email i can also give you access to the testbed here (the gateway got a static public ipv4 address), that may be the easiest way for you to observe this behaviour in the wild.
@dangowrt find keys of a GitHub user here https://github.com/axn.keys
So far I cannot reproduce it. I created a three-node mesh with links A-B, B-C and A-C, and then set A-B and A-C fully asymmetric so that A cannot hear anything from B or C anymore. You can see A losing state from B and C, while B and C keep state from A. Then, whatever I do (e.g. up/down interface, description updates, waiting, restarting...), I never get the message reported above, and all nodes recover quickly.
One could likely fix the deadlock by disabling this condition: https://github.com/bmx-routing/bmx7/blob/master/content.c#L978 But I'd like to understand what's going on...
So, whenever that happens again at 'tm-link', can you dump the last few lines of logread (or bmx7 -c d0), which should both report the involved llip and chash? Check which neighbors have these involved llips, and dump the state of cached content at 'tm-link': bmx7 -c show=status show=interfaces show=links show=originators show=contents /r=0
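To make that easier to run when the problem shows up, here is a rough sketch of a helper that saves both dumps into one directory. The function name `collect_bmx7_state`, the default output path and the 200-line log tail are my own placeholders; only the `logread` and `bmx7 -c show=...` invocations are the ones asked for above.

```shell
# Collect the diagnostics requested above into one directory.
# Assumptions: function name, output layout and the 200-line tail are
# illustrative; only the logread and bmx7 invocations come from this thread.
collect_bmx7_state() {
    outdir="${1:-/tmp/bmx7-debug}"
    mkdir -p "$outdir" || return 1

    # Last log lines; these should report the involved llip and chash.
    logread | tail -n 200 > "$outdir/logread.txt"

    # State of the cached content, links and originators on this node.
    bmx7 -c show=status show=interfaces show=links show=originators \
        show=contents /r=0 > "$outdir/bmx7-state.txt" 2>&1

    # Print where the dumps ended up so they can be copied off the node.
    echo "$outdir"
}
```

Paste the function into a shell on 'tm-link' and run `collect_bmx7_state /tmp/bmx7-debug` before restarting anything, since a restart would drop the state we want to look at.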
thx
I could reproduce the issue. It occurred due to multiple advertisements with identical content from different nodes in combination with certain node-discovery sequences...bla bla. E.g.:
https://github.com/bmx-routing/bmx7/commit/56bb017774cb631150e7c08f037e914c68535107 should solve the problem
:+1:
we will see
root@rdntz-gateway:/tmp# bmx7 -c s=o | grep 56bb017
728F6F15 liebmann92-m5 pA A A A A 20012 104 733+772 21 56bb017 fd70:728f:6f15:e070:5b74:858d:169e:4b90 br-lan FA1018A3 rdntz-wurze2-9 14548K 4 21 4
96509424 rhizomia-wald pA A A A A 29313 315 705+756 21 56bb017 fd70:9650:9424:a1b7:c5ad:ac90:63bf:c991 br-lan FA1018A3 rdntz-wurze2-9 9175K 6 52 4
CB0E15A8 stannebeinplatz-m5 pA A A A A 21412 54 733+777 21 56bb017 fd70:cb0e:15a8:497e:d6eb:1612:1bd4:a6a br-lan FA1018A3 rdntz-wurze2-9 21233K 3 12 6
6FD6A0AC tm-ap pA A A A A 26412 686 733+780 21 56bb017 fd70:6fd6:a0ac:690:e837:82b2:9d9c:2bf1 br-lan FA1018A3 rdntz-wurze2-9 15728K 5 119 3
846C7435 tm-link pA A A A A 55112 212 733+782 21 56bb017 fd70:846c:7435:bbff:3743:5c8e:e3a1:5cdf br-lan FA1018A3 rdntz-wurze2-9 15728K 4 39 3
root@rdntz-gateway:/tmp# bmx7 -c s=o | grep 56bb017 | wc -l
5
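For checking which revisions are still around in the mesh (not just the patched one), a small tally over the same output may help. This is only a sketch: `tally_revisions` and `REV_FIELD` are made-up names, and it assumes the revision is the 12th whitespace-separated field, as in the rows pasted above. Other bmx7 versions may print different columns, and a header line, if present, gets counted as its own entry.

```shell
# Tally originators by the revision string seen in `bmx7 -c s=o` output.
# Assumption: the revision is field 12, as in the rows pasted in this thread;
# set REV_FIELD to another number if your output has different columns.
tally_revisions() {
    awk -v f="${REV_FIELD:-12}" '
        NF >= f { count[$f]++ }                       # one hit per originator row
        END { for (r in count) print count[r], r }    # "<count> <revision>"
    ' | sort -rn
}
```

Usage on a node would be `bmx7 -c s=o | tally_revisions`; nodes that have not been updated yet show up under their old revision string.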
Thanks for testing. Merged https://github.com/bmx-routing/bmx7/commit/56bb017774cb631150e7c08f037e914c68535107 into master
Once a node loses its connectivity (usually due to ath9k-related wifi hiccups) the network won't remerge once connectivity comes back. One then has to excessively purge the bmx7 state (rm -rf /var/run/bmx7) from all nodes in the mesh and restart bmx7...