Open drewmiranda-gl opened 1 week ago
Working on properly reproducing this and now I'm not sure the issue has anything to do with DNS. It seems Ubuntu can continue to resolve the .local hostnames regardless of whether the upstream DNS resolver is running or stopped. I'm not sure why stopping/starting it would cause a downstream blip, unless this is a weird edge case/bug with Ubuntu's systemd-resolved.
If anyone has any ideas on how to enable more verbose logging in Graylog to troubleshoot, let me know. I attempted to enable several loggers but could not get any output in server.log.
Graylog's use of LeaderPresenceCheckPeriodical is very aggressive: even a momentary blip in which Graylog cannot communicate with its Mongo cluster (for example, if the Graylog server is unable to resolve the hostname configured in Graylog's mongo uri) causes Graylog to present a NO_LEADER system alert.
It looks like there is already code to allow for leniency with these checks but it only applies when using automatic leader election.
Additionally, Graylog does not log any further information about this issue, making it very difficult (if not impossible) to understand what is happening and how to resolve it or prevent it from occurring.
Expected Behavior
Current Behavior
The NO_LEADER system alert is far too sensitive and triggers for momentary blips.
No further logging nor information is provided about why this happened.
Possible Solution
Allow a grace period even when NOT using automatic leader election.
Provide logging about what is happening.
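To illustrate the two suggestions above, here is a minimal sketch of a presence check that only raises the alert once the leader has been missing for longer than a grace period, and that logs each stage along the way. This is not Graylog's code: `LeaderPresenceCheck`, `has_leader`, `grace_seconds` and `clock` are all hypothetical names, and it is written in Python purely for brevity.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("leader_presence")


class LeaderPresenceCheck:
    """Hypothetical check: report NO_LEADER only after a grace period."""

    def __init__(
        self,
        has_leader: Callable[[], bool],  # assumed lookup against the cluster
        grace_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._has_leader = has_leader
        self._grace_seconds = grace_seconds
        self._clock = clock
        self._missing_since: Optional[float] = None

    def run_once(self) -> bool:
        """Return True if a NO_LEADER alert should be raised now."""
        try:
            leader_present = self._has_leader()
        except Exception as exc:
            # A failed lookup (e.g. an unresolvable host) counts as "missing",
            # but the reason is logged so it can be diagnosed later.
            log.warning("Leader lookup failed: %s", exc)
            leader_present = False

        now = self._clock()
        if leader_present:
            if self._missing_since is not None:
                log.info("Leader visible again after %.1fs",
                         now - self._missing_since)
            self._missing_since = None
            return False

        if self._missing_since is None:
            self._missing_since = now
        elapsed = now - self._missing_since
        if elapsed < self._grace_seconds:
            log.warning("No leader seen for %.1fs (grace period %.1fs)",
                        elapsed, self._grace_seconds)
            return False

        log.error("No leader seen for %.1fs, raising NO_LEADER", elapsed)
        return True
```

With something along these lines, a blip shorter than the grace period would only produce log lines, and the alert would fire only if the leader stayed unreachable.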
Steps to Reproduce (for bugs)
I can reproduce this as follows:
Using Ubuntu Server 22.04 LTS
Using a .local hostname for the mongo uri
Context
Uncovered this after enabling notifications for Graylog's "System notification events" event definitions and it would repeatedly trigger. I then correlated that this always occurred whenever the resolver server was restarted on my pfSense router.
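One way to check whether the host itself loses name resolution during a resolver restart is to poll the lookup and log every failure with a timestamp, then compare those timestamps with the NO_LEADER alerts. The sketch below is only an assumption about how such a probe might look; `resolves` and `probe` are hypothetical helpers, and the hostname passed in would be whatever is configured in the mongo uri.

```python
import logging
import socket
import time
from typing import Callable, List

logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("dns_probe")


def resolves(hostname: str, resolver: Callable = socket.getaddrinfo) -> bool:
    """Return True if the hostname currently resolves, logging any failure."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror as exc:
        log.warning("Resolution of %s failed: %s", hostname, exc)
        return False


def probe(
    hostname: str,
    attempts: int,
    interval_seconds: float,
    resolver: Callable = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Resolve the hostname repeatedly and return one result per attempt."""
    results = []
    for _ in range(attempts):
        results.append(resolves(hostname, resolver))
        sleep(interval_seconds)
    return results
```

For example, running `probe("mongo.local", attempts=60, interval_seconds=1.0)` (with `mongo.local` standing in for the real hostname) while restarting the resolver on the router would show whether any lookups fail during that window.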
Your Environment
Please let me know if there are any questions