Retire gateway_announcement.sh

1977er commented 3 years ago

The current incarnation of that script checks the availability of an uplink with a few pings (to anycast addresses). If they fail, the batman gateway mode is degraded and dhcpd is stopped.
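The check described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual shell script. It assumes the check fails only when none of the anycast targets answers, and it uses Linux iputils `ping` flags. The action names (`degrade_gw_mode`, `stop_dhcpd` and their reverses) are hypothetical labels for whatever the real script runs. The ping function is injectable, so the decision logic can be exercised without touching a live node.

```python
import subprocess
from typing import Callable, Iterable, List


def ping_once(address: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request with the system `ping` (iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def uplink_available(targets: Iterable[str],
                     ping: Callable[[str], bool] = ping_once) -> bool:
    """Treat the uplink as available if at least one anycast target answers."""
    return any(ping(t) for t in targets)


def countermeasures(uplink_ok: bool) -> List[str]:
    """Map one check result to the (hypothetical) actions the script takes."""
    if uplink_ok:
        # Reverse path, taken once the pings succeed again.
        return ["restore_gw_mode", "start_dhcpd"]
    return ["degrade_gw_mode", "stop_dhcpd"]
```

Because each run reacts to a single check result, one burst of lost pings is enough to flip the node into the degraded state and back again on the next successful run.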

Unfortunately, these pings sometimes (a few times per day) get lost due to heavy CPU load on the supernodes. As a consequence, the monitoring kicks in, detects the discontinuity and sends out lots of emails. One minute later, the reverse process sends out another batch of emails.

Proposed actions:

[ ] retire gateway_announcement.sh entirely
[ ] create a Zabbix item that does the anycast pings
[ ] create a Zabbix trigger that is less sensitive than the current regime but still reports longer outages (without automatic countermeasures)
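One way to make the trigger less sensitive is a simple hysteresis rule: report an outage only after several consecutive failed checks, and clear it only after several consecutive successes. The sketch below shows that rule in plain Python rather than in Zabbix trigger syntax. The class name and the thresholds (5 failures, 2 successes) are illustrative assumptions, not values agreed on in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class OutageDetector:
    """Debounce uplink check results (True = reachable) before alerting."""

    fail_threshold: int = 5       # consecutive failures before an outage is reported (assumed)
    recover_threshold: int = 2    # consecutive successes before it is cleared (assumed)
    in_outage: bool = field(default=False, init=False)
    _fails: int = field(default=0, init=False)
    _oks: int = field(default=0, init=False)

    def update(self, ok: bool) -> bool:
        """Record one check result; return True only when the reported state changes."""
        if ok:
            self._oks += 1
            self._fails = 0
        else:
            self._fails += 1
            self._oks = 0

        if not self.in_outage and self._fails >= self.fail_threshold:
            self.in_outage = True
            return True
        if self.in_outage and self._oks >= self.recover_threshold:
            self.in_outage = False
            return True
        return False
```

Given the results F, T, F, F, F, F, F, T, T, the detector reports a change only at the fifth consecutive failure and again after the second consecutive success. The isolated lost ping at the start produces no notification. Assuming one check per minute, an outage would have to last about five minutes before anyone is emailed.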

AiyionPrime commented 3 years ago

In that case, we would degrade the automatic countermeasures to manual ones. I don't think our response time is good enough to keep up with even a flapping automatic service.

AiyionPrime commented 3 years ago

And having Zabbix initiate the missing countermeasures would create a rather big dependency on it.

1977er commented 3 years ago

Shutting down dhcpd has never been a necessary step. Once the gateway mode is degraded, any dhcpd that is still running is no longer able to communicate with its clients.

What about leaving the gateway mode degradation untouched but removing the dhcpd start/stop mechanism?
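The compromise proposed here amounts to dropping the dhcpd branch from the countermeasures while keeping the gateway-mode change. A minimal sketch, assuming a hypothetical `manage_dhcpd` switch so the current and proposed behaviour can be compared side by side. The action names are placeholders, not commands taken from the real script.

```python
from typing import List


def plan_countermeasures(uplink_ok: bool, manage_dhcpd: bool = True) -> List[str]:
    """Return the hypothetical actions taken after one uplink check."""
    # The gateway-mode change stays in both variants.
    actions = ["restore_gw_mode" if uplink_ok else "degrade_gw_mode"]
    # Current behaviour (manage_dhcpd=True) also starts/stops dhcpd.
    # Proposed behaviour (manage_dhcpd=False) leaves dhcpd alone, on the
    # reasoning that a degraded gateway mode already cuts it off from clients.
    if manage_dhcpd:
        actions.append("start_dhcpd" if uplink_ok else "stop_dhcpd")
    return actions
```

With `manage_dhcpd=False`, a failed check yields only `["degrade_gw_mode"]`, so dhcpd keeps running across short ping losses and never has to be restarted.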

AiyionPrime commented 3 years ago

I'd need to learn more about it to come to a sensible conclusion. @CodeFetch and @lemoer can likely ack the idea more easily.

I find it irritating that the solution to this is to just stop what we've been doing, something we have discussed for years without ever finding the time or a suitable solution.

1977er commented 1 year ago

Silently closing this issue as it is not accepted by everybody.

freifunkh / ansible

Retire gateway_announcement.sh #192