freifunkh / ansible

Here we store all Ansible roles and configs used for Freifunk Hannover.
MIT License

routers apparently prefer fastd #175

Open AiyionPrime opened 3 years ago

AiyionPrime commented 3 years ago

@CodeFetch and @bschelm observed that routers tend to prefer connecting via fastd rather than WireGuard.

@CodeFetch further found this to be connected to packet loss in WireGuard.

We need statistics to back up these hypotheses.

bschelm commented 3 years ago

A router that has a WG connection and several wifi mesh partners seemed to have lost its connection to WG, although the router's status page still showed it as connected to the WG supernode. The router did not, or could not, use that WG connection and instead routed via the wifi mesh.

What I tried was disabling wifi for 5 minutes via "wifi down ; sleep 300 ; wifi" in order to force the router to use the WG connection instead of the wifi mesh path. That didn't work; the router was offline for 5 minutes.

What helped was restarting WG with "ifdown vpn ; sleep 5 ; ifup vpn"
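A possible way to spot this state without clicking through the status page is to look at the age of the last WireGuard handshake. The sketch below is only an illustration under assumptions: it relies on `wg show <interface> latest-handshakes`, which prints one line per peer with the public key and a Unix timestamp; `stale_peers` is a hypothetical helper, and the device name `wg0` and the 180-second threshold are assumed values. Whether the handshake actually goes stale in this failure mode has not been confirmed in this thread.

```shell
#!/bin/sh
# Print the public keys of peers whose last handshake is older than
# max_age seconds (or that never completed a handshake).
# Input on stdin: lines of "<public-key> <unix-timestamp>", the format
# printed by `wg show <interface> latest-handshakes`.
# Arguments: $1 = current Unix time, $2 = max_age in seconds.
stale_peers() {
    now=$1
    max_age=$2
    while read -r key ts; do
        # Skip empty lines.
        [ -n "$key" ] || continue
        # A timestamp of 0 means no handshake has happened yet.
        if [ "$ts" -eq 0 ] || [ $((now - ts)) -gt "$max_age" ]; then
            printf '%s\n' "$key"
        fi
    done
}

# Hypothetical usage on a router (wg0 and 180 are assumptions):
#   if [ -n "$(wg show wg0 latest-handshakes | stale_peers "$(date +%s)" 180)" ]; then
#       ifdown vpn ; sleep 5 ; ifup vpn
#   fi
```

The helper only parses text, so it can be tried out with pasted `wg show` output before anything on the router is restarted automatically.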

lemoer commented 3 years ago

Hi Bernd,

thanks for the description. I would like to collect some more information:

bschelm commented 3 years ago

I would have to wait for another occasion. It happened twice already. I can't tell when it happened because the router, in that case, is still online via mesh. You see it only when you click on the router. After restarting WG, it connected to a different SN.

lemoer commented 3 years ago

I added a graph in the router dashboard in Grafana at the very bottom, which shows the vpn neighbors.

https://stats.ffh.zone/d/000000021/router-fur-meshviewer?orgId=1

@bschelm: Can you check whether the outages are visible there?

bschelm commented 3 years ago

Nope. VPN-Neighbours is always zero. Same on my router.

lemoer commented 3 years ago

Screenshot from 2021-02-27 14-14-54

@bschelm I added another graph to the dashboard. It's quite messy, so I selected some traces and posted a screenshot above. The selected traces contain rx TQ from and tx TQ to the supernodes. Are your outages correlated with the gaps in the graph?
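To compare outages against the gaps by numbers rather than by eye, the gaps could be extracted from an exported trace. This is a minimal sketch under assumptions: the input format (one `<unix-timestamp> <value>` pair per line, sorted by time) and the helper name `find_gaps` are made up for illustration and are not an existing export format of the dashboard.

```shell
#!/bin/sh
# Print every gap longer than max_gap seconds in a series of samples.
# Input on stdin: lines of "<unix-timestamp> <value>", sorted by time.
# Output: one line "<gap-start> <gap-end> <gap-seconds>" per gap.
# Argument: $1 = max_gap in seconds.
find_gaps() {
    awk -v max_gap="$1" '
        # From the second sample on, compare against the previous timestamp.
        NR > 1 && $1 - prev > max_gap { print prev, $1, $1 - prev }
        { prev = $1 }
    '
}

# Hypothetical usage (tq_trace.txt is an assumed file name):
#   find_gaps 120 < tq_trace.txt
```

The resulting start and end times can then be set against the times at which a router was seen routing via the mesh only.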

lemoer commented 3 years ago

Well, the time range is kinda long. Here is a more detailed screenshot of the recent history:

Screenshot from 2021-02-27 14-24-31

lemoer commented 3 years ago

From everything I have heard, this doesn't happen very often. So let's start with our Infrastructure Freeze Week and see whether it occurs again during that week. If it happens again, please do not "fix" it directly, but collect as much data as possible:

Hopefully this data will be enough to find the issue.

lemoer commented 3 years ago

I think this is the same issue as #147.

lemoer commented 3 years ago

It does not make sense to have either #175 (this issue) or #147 as blocker for the infrastructure freeze week, so I'll remove the milestone here.

AiyionPrime commented 3 years ago

> I think, this is the same issue as #147 .

I don't remember exactly why, but we came to the conclusion it wasn't; maybe @1977er remembers this better, but I think it was due to some fixes applied on sn09, which did not correlate to resolving this issue.

lemoer commented 1 year ago

Is this still an issue?

AiyionPrime commented 1 year ago

We still have both WireGuard and fastd nodes and have not yet resolved the issue.

lemoer commented 1 year ago

Is there any setup where we have seen this recently?

CC: @bschelm?
