geerlingguy / internet-monitoring

Monitor your network and internet speed with Docker & Prometheus

Grafana container keeps restarting after hundreds of 'Client.Timeout exceeded while awaiting headers' errors #12

Closed (geerlingguy closed 3 years ago)

geerlingguy commented 3 years ago

Here's what I see in the logs this evening:

Error: ✗ Failed to send request: Get "https://grafana.com/api/plugins/repo/flant-statusmap-panel": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

(This repeats over 3000 times.)

A quick DDG search doesn't turn up anything obvious, and I can reach https://grafana.com/api/plugins/repo/flant-statusmap-panel from my browser okay. Not sure what's up, but it's causing the Grafana container to keep restarting, so it never fully launches the web UI.

geerlingguy commented 3 years ago

I just tried commenting out the line:

#GF_INSTALL_PLUGINS=flant-statusmap-panel

inside the config.monitoring file, then running docker-compose stop grafana and starting it again...

But I'm still seeing the same error repeating :/

geerlingguy commented 3 years ago

Following my own guide here (https://github.com/geerlingguy/internet-pi/issues/7), I updated all the containers inside the internet-monitoring directory:

cd ~/internet-monitoring
docker-compose pull  # pulls the latest images inside the compose file
docker-compose up -d --no-deps  # restarts necessary containers with newer images
docker system prune --all  # removes stopped containers, unused networks, and all unused images

And now Grafana will launch... but it's not showing any data in the dashboard for some reason.

Logs are showing:

t=2021-05-19T03:20:21+0000 lvl=eror msg="Unable to load datasource meta data" logger=context userId=1 orgId=1 uname=admin error="data source not found"
t=2021-05-19T03:20:21+0000 lvl=eror msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/12/api/v1/query_range status=500 remote_addr=10.0.100.213 time_ms=2 size=49 referer="http://geerli.net:3030/d/o9mIe_Aik/internet-connection?orgId=1&refresh=10s&from=now-5m&to=now"

geerlingguy commented 3 years ago

Well now the Ping container seems to be returning a down status for all pings (getting probe_success 0 after probe_dns_lookup_time_seconds 20.006422323, so a 20s timeout). So something's funky with it! Maybe the Docker daemon has lost its own ability to see DNS or something...

geerlingguy commented 3 years ago

Trying with http://pi-address:9115/probe?target=google.com&module=http_2xx&debug=true results in:

Logs for the probe:
ts=2021-05-19T03:32:56.80599451Z caller=main.go:320 module=http_2xx target=google.com level=info msg="Beginning probe" probe=http timeout_seconds=119.5
ts=2021-05-19T03:32:56.806308733Z caller=http.go:335 module=http_2xx target=google.com level=info msg="Resolving target address" ip_protocol=ip4
ts=2021-05-19T03:33:06.809560493Z caller=http.go:335 module=http_2xx target=google.com level=info msg="Resolving target address" ip_protocol=ip6
ts=2021-05-19T03:33:16.813660286Z caller=http.go:335 module=http_2xx target=google.com level=error msg="Resolution with IP protocol failed" err="lookup google.com on 127.0.0.11:53: read udp 127.0.0.1:41190->127.0.0.11:53: i/o timeout"
ts=2021-05-19T03:33:16.813852219Z caller=main.go:130 module=http_2xx target=google.com level=error msg="Error resolving address" err="unable to find ip; exhausted fallback: lookup google.com on 127.0.0.11:53: read udp 127.0.0.1:41190->127.0.0.11:53: i/o timeout"
ts=2021-05-19T03:33:16.8139349Z caller=main.go:320 module=http_2xx target=google.com level=error msg="Probe failed" duration_seconds=20.007851155

geerlingguy commented 3 years ago

My solution was to force the Docker daemon to use the local DNS server (pi-hole)...

# Create / edit the Docker daemon config and make sure DNS is configured.
$ sudo nano /etc/docker/daemon.json

# Make sure at least the following configuration is in that file:
{
    "dns": ["10.0.100.52"]
}

# Restart Docker.
$ sudo systemctl restart docker

...and when I check the ping container again I'm not seeing the updated DNS server:

$ docker exec -it 436 /bin/sh
/ # cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
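
A side note on the output above: 127.0.0.11 is Docker's embedded DNS resolver. Containers on user-defined networks (which includes the default network Compose creates) get it in /etc/resolv.conf regardless of the daemon's "dns" setting, so seeing it there does not by itself show whether daemon.json was picked up. The snippet below is a minimal sketch for labelling the nameserver entries in a resolv.conf-style file; the function name describe_nameservers is a hypothetical helper, not something from this repo.

```shell
#!/bin/sh
# Sketch: list the nameserver entries in a resolv.conf-style file and label
# Docker's embedded resolver. describe_nameservers is a hypothetical helper.
describe_nameservers() {
    awk '$1 == "nameserver" {
        if ($2 == "127.0.0.11") { print $2 " (Docker embedded DNS)" } else { print $2 }
    }' "$1"
}

# Demo on a scratch file holding the same contents as the container output above.
sample="$(mktemp)"
printf '%s\n' 'nameserver 127.0.0.11' 'options ndots:0' > "$sample"
describe_nameservers "$sample"
rm -f "$sample"
```

To test whether lookups actually work, resolving a name from inside the container is more informative than reading resolv.conf.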

geerlingguy commented 3 years ago

Going to reboot and pause debugging this if it doesn't magically clear up after restart (as long as the rest of the services are okay...).

geerlingguy commented 3 years ago

I had to keep flant-statusmap-panel commented to get Grafana running—I'm guessing that container's not resolving DNS either for some reason.

Ping container's still not getting DNS, so I've removed the daemon.json config for now and I'll take another look with fresh eyes.

Maybe the upgrade to the pi-hole container (yesterday, I think?) caused this somehow :/

geerlingguy commented 3 years ago

Trying the following works (suggested here:

pi@geerli:~ $ docker run -it --dns=8.8.8.8 alpine ping repo.hex.pm
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
595b0fe564bb: Already exists 
Digest: sha256:69e70a79f2d41ab5d637de98c1e0b055206ba40a8145e7bddb55ccc04e13cf8f
Status: Downloaded newer image for alpine:latest
PING repo.hex.pm (151.101.2.2): 56 data bytes
64 bytes from 151.101.2.2: seq=0 ttl=55 time=9.582 ms
64 bytes from 151.101.2.2: seq=1 ttl=55 time=9.502 ms
64 bytes from 151.101.2.2: seq=2 ttl=55 time=9.884 ms

Without the --dns flag, the lookup fails:

pi@geerli:~ $ docker run -it alpine ping repo.hex.pm
ping: bad address 'repo.hex.pm'

geerlingguy commented 3 years ago

Heh... someone else is running into something oddly familiar here: https://unix.stackexchange.com/q/647996/16194

geerlingguy commented 3 years ago

Found this from an old post on the Pi-hole subreddit: https://www.reddit.com/r/pihole/comments/jnvw14/resolving_dns_within_containers_not_working/

"Changing resolv.conf to 127.0.0.1 actually fixed it!!!! THANK YOU!"

Worth a try... note that /etc/resolv.conf is managed by resolvconf, so it's rewritten on reboot. So I edited the /etc/resolvconf.conf file and uncommented the line name_servers=127.0.0.1, and rebooted.

Hey, look at that! It's working now!

So... it turns out if you're running other Docker containers on the same Pi where you're running pi-hole (or another DNS server), you should edit /etc/resolvconf.conf and uncomment that line. Nice that the documentation there is actually helpful :)
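
For anyone scripting this, the resolvconf.conf edit described above can be sketched as a small shell function. This is an illustration under assumptions, not the exact steps run in this thread: the function name enable_local_nameserver is hypothetical, and the sketch assumes the file contains the commented line exactly as #name_servers=127.0.0.1 (optionally with spaces after the #). The demo runs against a scratch file so nothing under /etc is touched.

```shell
#!/bin/sh
# Sketch: uncomment the name_servers=127.0.0.1 line in a resolvconf.conf-style file.
# enable_local_nameserver is a hypothetical helper name.
enable_local_nameserver() {
    conf="$1"
    tmp="$(mktemp)"
    # Drop the leading '#' (and any spaces after it) from the name_servers line only.
    sed 's/^#[[:space:]]*\(name_servers=127\.0\.0\.1\)/\1/' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"   # write back in place, keeping the file's permissions
    rm -f "$tmp"
}

# Demo on a scratch copy, so the real /etc/resolvconf.conf is untouched.
scratch="$(mktemp)"
printf '%s\n' '# Configuration for resolvconf(8)' '#name_servers=127.0.0.1' > "$scratch"
enable_local_nameserver "$scratch"
grep '^name_servers' "$scratch"   # prints: name_servers=127.0.0.1
rm -f "$scratch"
```

Pointing the function at /etc/resolvconf.conf itself would need root, and the change only reaches /etc/resolv.conf once resolvconf regenerates it, which the reboot mentioned above takes care of.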

Going to open an issue on internet-pi to get this issue — wait for it — resolved.

geerlingguy commented 3 years ago

Opened follow-up issue: https://github.com/geerlingguy/internet-pi/issues/8