Closed by jaydio 1 year ago
Thanks for the feedback with details. I am not exactly sure why this could have happened. Did the issue stop on its own or did it require restarting the DNS servers? Any idea on available memory on the server during the issue?
The error log is unrelated, since it's just a failed reverse lookup of the kind that usually takes place to show the domain name for Top Clients on the dashboard.
> Thanks for the feedback with details. I am not exactly sure why this could have happened. Did the issue stop on its own or did it require restarting the DNS servers? Any idea on available memory on the server during the issue?
Yes, it required restarting each container in order to restore service.
Each box has ample memory.
I am also seeing <= 200 MB memory utilization for the DnsServer container on every box.
Will update this issue if it happens again.
> The error log is unrelated, since it's just a failed reverse lookup of the kind that usually takes place to show the domain name for Top Clients on the dashboard.
Yeah, that's what I thought. It was just the only stack trace I could find in the log file, so I included it.
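For anyone reading along who is unsure what the reverse lookup discussed here involves, below is a minimal Python sketch. It is only an illustration and not DnsServer's actual code: the helper names `ptr_name` and `reverse_lookup` are hypothetical, and the fallback behaviour (returning the bare IP when no PTR record resolves) is an assumption about how a dashboard might reasonably handle a failed lookup.

```python
import ipaddress
import socket


def ptr_name(ip: str) -> str:
    """Return the PTR query name for an IP address (hypothetical helper).

    For example, 192.0.2.10 maps to 10.2.0.192.in-addr.arpa.
    """
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str) -> str:
    """Resolve an IP to a host name via the system resolver.

    Assumption: a failed lookup is harmless and we simply fall back to
    showing the IP itself, which matches the point made in the thread that
    such a failure is unrelated to the outage.
    """
    try:
        host, _aliases, _addresses = socket.gethostbyaddr(ip)
        return host
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return ip


if __name__ == "__main__":
    print(ptr_name("192.0.2.10"))
    print(reverse_lookup("192.0.2.10"))
```

A failing lookup of this kind only means the client shows up by IP address instead of by name, which is why it would produce a stack trace in the log without affecting query handling.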
This kept on happening, but only when NSD was the master server. Had to switch platforms for this particular project and am unable to investigate this further. Thanks to @ShreyasZare for spending countless hours and also trying to reproduce this in the lab using my configs. Will close this ticket now.
Hi there,
I've got four authoritative DnsServer servers deployed using the official docker image.
This morning, at around 7:10am (UTC+8), all nodes simultaneously started consuming 100% of CPU cycles and stopped responding to DNS queries.
The following platforms are used (all on latest patch level):
Docker version (identical across all platforms):
Here are some graphs I've pulled from netdata:
PS output from the container itself:
I was able to pull a stack trace from the log file, but it happened before the CPU utilization started spiking. Just switched all logs to LOCAL and set the correct timezone for all containers as well.
Couple of additional notes: