geerlingguy / internet-monitoring

Monitor your network and internet speed with Docker & Prometheus

Speedtest container goes unhealthy after a few hours (consistently) #1

Closed geerlingguy closed 3 years ago

geerlingguy commented 3 years ago

The speedtest container goes unhealthy from time to time, with nothing in the logs indicating an issue. It's as if the Flask app locks up and stops returning HTTP responses altogether.

This leads to one of my favorite graphs having blank periods until I log into the Pi and manually restart the container (docker-compose restart speedtest):

[Screenshot: Screen Shot 2021-04-09 at 9.47.56 AM]

I asked about this problem upstream here: https://github.com/MiguelNdeCarvalho/speedtest-exporter/issues/48#issuecomment-816360616

But I'm thinking of adding a simple cron job / script that checks if the container is unhealthy (maybe every 5 or 10 minutes), and if so, restarts it.
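
A minimal sketch of what such a check could look like, assuming the container is named internet-monitoring_speedtest_1 (as docker ps reports it) and the project is checked out in ~/internet-monitoring. The function names, the CONTAINER and COMPOSE_DIR variables, and the "check" argument are made up for illustration and are not part of the project:

```shell
#!/bin/sh
# Sketch: restart the speedtest service when Docker reports it unhealthy.
# CONTAINER and COMPOSE_DIR are assumptions; adjust them to your setup.
CONTAINER="${CONTAINER:-internet-monitoring_speedtest_1}"
COMPOSE_DIR="${COMPOSE_DIR:-$HOME/internet-monitoring}"

# Print the health status Docker records for a container
# (healthy, unhealthy, or starting); prints nothing if the lookup fails.
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null
}

# Restart the compose service only when the container is unhealthy.
restart_if_unhealthy() {
  if [ "$(health_status "$CONTAINER")" = "unhealthy" ]; then
    echo "$(date): $CONTAINER is unhealthy, restarting"
    (cd "$COMPOSE_DIR" && docker-compose restart speedtest)
  fi
}

# Only act when called with "check", so the file can also be sourced.
if [ "${1:-}" = "check" ]; then
  restart_if_unhealthy
fi
```

To run it every 5 minutes, a crontab entry along these lines should work (the script path is hypothetical). Cron runs with a minimal PATH, so full paths to docker and docker-compose may be needed:

*/5 * * * * /home/pi/check-speedtest.sh check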

geerlingguy commented 3 years ago

Process for recovery:

  1. Run docker ps and see if the internet-monitoring_speedtest_1 container is (unhealthy).
  2. If so, run cd internet-monitoring/ and then docker-compose restart speedtest.

geerlingguy commented 3 years ago

Same thing happened this morning, and if I log into the container and manually run the healthcheck, it times out:

pi@raspberrypi:~ $ docker exec -it 4088ef2f3c04 sh
/app $ wget localhost:9798
Connecting to localhost:9798 (127.0.0.1:9798)
wget: error getting response: Address not available

Restarting fixes it again.

geerlingguy commented 3 years ago

I just updated the container to the latest (3.1) version as mentioned in the issue linked just above this comment. I'll see if that newer version solves this dropout issue.

nickodell commented 3 years ago

You may be running out of temporary ports. https://www.toptip.ca/2010/02/linux-eaddrnotavail-address-not.html

If you are running many TCP client programs on your machine and these programs do not specify ports when they make connections to the server, the system would assign temporary ports for them. When a connection is closed, the port will be returned to the system for reuse. The temporary ports have a limited range. When the allowed connections are exhausted, the program could get an error when it tries to make any new connections. The error number it generates would be EADDRNOTAVAIL (Address not available).
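
One way to check for this the next time it happens, sketched under the assumption that /proc is readable inside the container and awk is available there. The function names are hypothetical. The first reads the ephemeral port range from the kernel, the second counts IPv4 TCP sockets in a given state from /proc/net/tcp, where the fourth column is a hex state code (01 is ESTABLISHED, 06 is TIME_WAIT, 08 is CLOSE_WAIT):

```shell
#!/bin/sh
# Sketch: helpers for judging whether ephemeral ports could be running out.

# Print how many ephemeral ports the kernel may hand out.
# The file holds two numbers, "low high"; the path can be overridden for testing.
ephemeral_port_count() {
  read -r low high < "${1:-/proc/sys/net/ipv4/ip_local_port_range}"
  echo $((high - low + 1))
}

# Print how many IPv4 TCP sockets are in the given hex state.
# Line 1 of /proc/net/tcp is a header; column 4 is the state code.
count_tcp_state() {
  awk -v st="$1" 'NR > 1 && $4 == st { n++ } END { print n + 0 }' \
    "${2:-/proc/net/tcp}"
}
```

Running ephemeral_port_count and then count_tcp_state 06 (or 01, or 08) while the container is unhealthy would show whether the socket count is anywhere near the size of the range. If it is not, port exhaustion is probably not the cause.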

geerlingguy commented 3 years ago

Possibly... I'll try to check for that if this happens again. So far 10 hours with no dropouts yet... fingers crossed!

MiguelNdeCarvalho commented 3 years ago

Do you have any news?

geerlingguy commented 3 years ago

So far the new version has been working consistently for the past three days :)

geerlingguy commented 3 years ago

I'll still leave this open for a few more days until I get some time to work more on my dashboard.

geerlingguy commented 3 years ago

Haven't had this problem again. Yay!