AiyionPrime opened this issue 3 years ago
A router that has a WG connection and several Wi-Fi mesh neighbours seemed to have lost its WG connection, although its status page still showed it as connected to the WG supernode. The router did not, or could not, use that WG connection and instead routed via the Wi-Fi mesh.
What I tried was disabling Wi-Fi for 5 minutes via "wifi down ; sleep 300 ; wifi" in order to force the router to use the WG connection instead of the Wi-Fi mesh. That didn't work; the router was simply offline for 5 minutes.
What helped was restarting WG with "ifdown vpn ; sleep 5 ; ifup vpn".
Hi Bernd,
thanks for the description. I would like to collect some more information.
(In reply to Bernd Schittenhelm, Thu, 25 Feb 2021, 20:41.)
I'll have to wait for another occurrence; it has happened twice already. I can't tell when it happened, because in that case the router is still online via the mesh. You only see it when you click on the router. After restarting WG, it connected to a different SN.
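Since the router stays reachable via the mesh, nothing obvious marks the moment the WG path stalls. A minimal sketch of one way to timestamp it, assuming a stale WireGuard handshake accompanies the failure (this is not confirmed in the thread) and that the output of "wg show vpn latest-handshakes" is collected periodically from the router. The interface name "vpn" is taken from the restart command above; the 180-second threshold and the helper names are hypothetical.

```python
import time

# Assumption: WireGuard re-handshakes roughly every two minutes while packets
# (or persistent keepalives) flow, so a handshake older than this is treated as
# a hint that the tunnel has stalled. The exact value is a guess.
STALE_AFTER_SECONDS = 180


def parse_latest_handshakes(output):
    """Parse the output of `wg show <iface> latest-handshakes`.

    Each line has the form "<peer public key>\t<unix timestamp>";
    a timestamp of 0 means no handshake has happened yet.
    """
    handshakes = {}
    for line in output.strip().splitlines():
        key, _, stamp = line.partition("\t")
        handshakes[key.strip()] = int(stamp)
    return handshakes


def stale_peers(handshakes, now=None, threshold=STALE_AFTER_SECONDS):
    """Return the peers whose handshake is missing or older than threshold."""
    now = time.time() if now is None else now
    return sorted(
        key for key, stamp in handshakes.items()
        if stamp == 0 or now - stamp > threshold
    )


if __name__ == "__main__":
    # Hypothetical output with shortened peer keys.
    sample = "peerA=\t1614268800\npeerB=\t0\n"
    print(stale_peers(parse_latest_handshakes(sample), now=1614268900))
```

Logging the result of such a check every few minutes would give a timestamp for the next occurrence, even while the router still looks online via the mesh.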
I added a graph at the very bottom of the router dashboard in Grafana that shows the VPN neighbors.
https://stats.ffh.zone/d/000000021/router-fur-meshviewer?orgId=1
@bschelm: Can you have a look whether the outages are visible there?
(In reply to Bernd Schittenhelm, Fri, 26 Feb 2021, 10:12.)
Nope. The VPN neighbours graph is always zero. Same on my router.
@bschelm I added another graph to the dashboard. It's quite messy, so I selected some traces and posted a screenshot above. The selected traces show the rx TQ from and the tx TQ to the supernodes. Do your outages correlate with the gaps in the graph?
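Comparing outage times against gaps by eye gets tedious over long time ranges. A minimal sketch, assuming the TQ sample timestamps for one supernode have been exported from Grafana and the approximate outage times are known; the sampling interval, helper names and sample data below are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval):
    """Return (start, end) pairs where consecutive samples are further apart
    than expected_interval, i.e. where the graph would show a gap."""
    ordered = sorted(timestamps)
    return [
        (prev, cur)
        for prev, cur in zip(ordered, ordered[1:])
        if cur - prev > expected_interval
    ]


def outages_in_gaps(outages, gaps):
    """For each outage time, report whether it falls inside one of the gaps."""
    return {t: any(start <= t <= end for start, end in gaps) for t in outages}


if __name__ == "__main__":
    base = datetime(2021, 3, 1, 12, 0)
    # Hypothetical TQ sample times: one per minute, with minutes 5-14 missing.
    minutes = list(range(0, 5)) + list(range(15, 20))
    samples = [base + timedelta(minutes=m) for m in minutes]
    gaps = find_gaps(samples, expected_interval=timedelta(minutes=2))
    # Hypothetical outage reports.
    outages = [base + timedelta(minutes=8), base + timedelta(minutes=17)]
    for when, inside in outages_in_gaps(outages, gaps).items():
        print(when, "inside gap" if inside else "no gap")
```

An outage that repeatedly falls inside a gap would support the correlation; outages outside any gap would point elsewhere.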
Well, the time range is kinda long. Here is a more detailed screenshot of the recent history:
From what I have heard, this doesn't happen very often. So let's start our Infrastructure Freeze Week and see whether it occurs again during that week. If it does, please do not "fix" it right away, but collect as much data as possible:
- batctl n (from the router)
- batctl meshif bat14 n (from the connected supernode)
- wg show (from the router)
- wg show (from the connected supernode)
- ip -6 route (from the router)
- ip -6 route (from the supernode)
- tcpdump -n -i vpn inbound -w /tmp/test1.pcap (from the router, collect it via scp)
- tcpdump -n -i vpn outbound -w /tmp/test2.pcap (from the router, collect it via scp)
- tcpdump -n -i vx_vpn_wired inbound -w /tmp/test3.pcap (from the router, collect it via scp)
- tcpdump -n -i vx_vpn_wired outbound -w /tmp/test4.pcap (from the router, collect it via scp)
- tcpdump -n -i br-wan inbound -w /tmp/test5.pcap (from the router, collect it via scp)
- tcpdump -n -i br-wan outbound -w /tmp/test6.pcap (from the router, collect it via scp)
- tcpdump -n -i vx-14 inbound -w /root/test7.pcap (from the supernode, collect it via scp)
- tcpdump -n -i vx-14 outbound -w /root/test8.pcap (from the supernode, collect it via scp)
- tcpdump -n -i wg-14 inbound -w /root/test9.pcap (from the supernode, collect it via scp)
- tcpdump -n -i wg-14 outbound -w /root/test10.pcap (from the supernode, collect it via scp)
- bridge fdb show | grep vx (from the connected supernode)
- logread (from the router)
- uci export (from the router)
- ip addr show (from the router)

Hopefully this data will be enough to find the issue.
I think this is the same issue as #147.
It does not make sense to have either #175 (this issue) or #147 as a blocker for the Infrastructure Freeze Week, so I'll remove the milestone here.
I think this is the same issue as #147.
I don't remember exactly why, but we came to the conclusion that it wasn't. Maybe @1977er remembers this better, but I think it was because some fixes applied on sn09 did not correlate with this issue being resolved.
Is this still an issue?
We still have both WireGuard and fastd nodes and have not yet resolved the issue.
Is there any setup where we saw this recently?
CC: @bschelm?
(In reply to Jan-Niklas Burfeind, Mon, 17 Apr 2023, 00:00.)
@CodeFetch and @bschelm observed that routers tend to prefer connecting via fastd rather than WireGuard.
@CodeFetch further found this to be related to packet loss in WireGuard.
We need statistics to back these claims up.
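One way to start gathering such statistics might be to ping the same target through each tunnel type and aggregate the loss. A minimal sketch, assuming the raw ping outputs have already been collected and labelled by tunnel type; the labels "wireguard" and "fastd", the helper names and the sample data are hypothetical.

```python
import re

# Matches the summary line of iputils ping ("9 received") and of BusyBox ping
# ("9 packets received"), as found on BusyBox-based router firmware.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def parse_ping_summary(output):
    """Return (transmitted, received) from the output of one ping run."""
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return int(match.group(1)), int(match.group(2))


def loss_by_tunnel(runs):
    """Aggregate packet loss per tunnel type.

    runs is a list of (tunnel_type, ping_output) pairs. Returns
    {tunnel_type: loss_fraction} computed over all runs of that type.
    """
    sent = {}
    got = {}
    for tunnel, output in runs:
        tx, rx = parse_ping_summary(output)
        sent[tunnel] = sent.get(tunnel, 0) + tx
        got[tunnel] = got.get(tunnel, 0) + rx
    return {t: 1 - got[t] / sent[t] for t in sent if sent[t] > 0}


if __name__ == "__main__":
    # Hypothetical ping summaries collected over each tunnel type.
    runs = [
        ("wireguard", "100 packets transmitted, 95 received, 5% packet loss, time 99100ms"),
        ("wireguard", "100 packets transmitted, 97 packets received, 3% packet loss"),
        ("fastd", "100 packets transmitted, 100 received, 0% packet loss, time 99000ms"),
    ]
    print({t: round(v, 3) for t, v in loss_by_tunnel(runs).items()})
```

Summing transmitted and received counts before dividing, rather than averaging per-run percentages, keeps short runs from skewing the result.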