fzaninotto / uptime

A remote monitoring application using Node.js, MongoDB, and Twitter Bootstrap.
http://fzaninotto.github.com/uptime/
MIT License

Uptime goes down with a lot of unclosed connections #308

Open soullivaneuh opened 9 years ago

soullivaneuh commented 9 years ago

Uptime regularly goes down for no apparent reason.

After some investigation, we found this:

$ netstat -laputen | grep TIME_WAIT | awk '{ print $5 }' | sort | uniq -c | sort -g
1 104.28.15.109:80
1 127.0.0.1:40479
1 127.0.0.1:40481
1 178.170.104.103:80
1 178.170.104.112:80
1 178.170.104.169:80
1 178.170.104.6:80
1 178.33.43.161:80
1 185.46.230.244:80
2 108.162.202.7:80
2 141.101.118.125:80
2 162.159.241.55:80
2 178.170.104.229:80
3 104.28.20.41:80
3 108.162.201.7:80
3 127.0.0.1:8082
3 162.159.242.55:80
3 176.34.232.219:80
3 178.170.104.70:443
3 89.248.210.69:80
4 141.101.118.124:80
4 178.170.76.4:80
4 82.165.240.124:80
5 104.28.21.41:80
5 178.170.104.113:80
5 217.70.180.153:80
5 37.187.136.103:80
5 37.187.71.100:80
5 46.137.102.208:80
5 94.125.167.177:80
6 194.116.202.20:80
6 213.186.33.5:80
6 213.186.33.82:80
7 178.170.104.101:443
7 178.170.104.229:443
7 178.170.104.76:80
7 178.170.108.24:80
7 178.33.32.5:80
7 185.18.81.239:80
7 188.165.129.2:80
7 195.154.232.153:80
7 195.154.242.229:80
7 208.97.177.124:80
7 217.70.180.152:80
7 82.165.82.128:80
7 83.145.84.217:80
7 90.80.157.162:80
7 90.85.1.242:80
7 94.125.165.188:80
8 185.18.80.128:80
8 185.18.83.161:80
8 188.165.210.52:80
8 195.238.251.111:80
8 46.105.132.204:80
8 94.125.165.130:80
9 185.46.230.14:80
10 104.28.14.109:80
10 31.192.124.0:80
10 92.222.29.145:80
12 142.4.201.40:80
12 178.32.64.90:80
12 185.18.80.32:80
12 185.18.81.105:80
12 185.18.81.24:80
12 213.186.33.3:80
12 37.187.186.249:80
12 87.98.247.17:80
12 94.125.167.244:80
14 178.170.104.200:80
16 178.170.104.166:80
16 178.170.104.170:80
17 213.186.33.18:80
18 173.194.65.121:80
18 178.170.104.13:80
18 94.23.65.236:80
19 178.170.76.3:80
20 62.210.139.28:80
23 178.170.104.247:80
24 213.186.33.40:80
36 213.186.33.17:80
44 213.186.33.105:80

$ netstat -laputen | grep TIME_WAIT | awk '{ print $5 }' | sort | wc -l
668
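
For anyone who wants to track this from Node.js rather than the shell, the same per-remote tally can be sketched as below. This is only an illustration: `countTimeWait` is a hypothetical helper, not part of Uptime, and it assumes the usual Linux `netstat` column order, where the foreign address is the fifth field.

```javascript
// Hypothetical helper (not part of Uptime): counts TIME_WAIT sockets per
// remote address from `netstat -laputen`-style text.
// Assumes the usual Linux column order: proto, recv-q, send-q, local, foreign, state, ...
function countTimeWait(netstatOutput) {
  const counts = new Map();
  for (const line of netstatOutput.split('\n')) {
    if (!line.includes('TIME_WAIT')) continue; // same filter as `grep TIME_WAIT`
    const fields = line.trim().split(/\s+/);
    const remote = fields[4];                  // same column as awk's $5
    if (!remote) continue;
    counts.set(remote, (counts.get(remote) || 0) + 1);
  }
  // Ascending by count, like `sort -g` applied to the `uniq -c` output.
  return [...counts.entries()].sort((a, b) => a[1] - b[1]);
}
```

The input text could come from something like `require('child_process').execSync('netstat -laputen').toString()`, and logging the result periodically would show whether the total keeps growing before a crash.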

I think Uptime fails on bad URLs/websites and does not close the connections properly. The queue of network connections then keeps growing until Uptime crashes.
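
To illustrate what "closing properly" could look like, here is a minimal sketch of a single HTTP check that always releases its socket, whether the target answers, errors out, or hangs. It assumes plain Node.js `http` and is not Uptime's actual poller code; `checkUrl`, `timeoutMs`, and the callback shape are hypothetical names chosen for this example.

```javascript
const http = require('http');

// Hypothetical helper (not Uptime's poller): performs one GET and makes sure
// the socket is released on success, on error, and on timeout.
function checkUrl(url, timeoutMs, callback) {
  let done = false;
  const finish = (err, statusCode) => {
    if (done) return; // report the outcome only once
    done = true;
    callback(err, statusCode);
  };

  // agent: false uses a one-off agent, so the socket is not pooled for reuse.
  const req = http.get(url, { agent: false }, (res) => {
    res.resume(); // drain the body so the response can end and the socket can close
    res.on('end', () => finish(null, res.statusCode));
  });

  // If the socket stays idle for too long, tear the request down explicitly.
  req.setTimeout(timeoutMs, () => {
    req.destroy(new Error('timeout after ' + timeoutMs + ' ms'));
  });

  // Connection failures and the destroy() above both end up here.
  req.on('error', (err) => finish(err));
}
```

Whether this would change the `TIME_WAIT` count is an open question, since that state normally appears on whichever side closes the connection first; the sketch only shows how a hung target can be prevented from holding a socket open indefinitely.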

What do you think about this?

For information, we have nearly 300 active checks on our Uptime instance.

ghub2015 commented 9 years ago

I am not sure it is the same issue, but I regularly see Uptime stop polling a random check for 30 to 45 minutes (not the same one every time) and then mysteriously start polling it again as if nothing had happened. (Uptime itself keeps running the entire time; I have the dashboard monitored by Monit.)

I am using it with 10+ remote monitors feeding into a VM running the dashboard, which in turn connects to a separate VM running MongoDB. (Because of MongoDB's memory limitations, MongoDB needs to be on a 64-bit instance, but the Uptime dashboard and monitors need to run on 32-bit instances.)