bosun-monitor / bosun

Time Series Alerting Framework
http://bosun.org
MIT License

bosun may cause errors on remote OpenTSDB host #240

Closed maddyblue closed 10 years ago

maddyblue commented 10 years ago

After some time of relaying data, bosun produces these errors:

2014/09/09 18:46:29 error: queue.go:75: Post ny-tsdb03.ds.stackexchange.com: dial tcp 10.7.0.233:4242: cannot assign requested address

When this happens, data is not getting into OpenTSDB (but bosun is correctly processing the search data).

This is not a bosun error; it is an HTTP error returned by OpenTSDB with the text "dial tcp x.x.x.x: cannot assign requested address", which bosun then reports. Restarting bosun resolves these errors, however, which suggests that bosun is doing something bad that causes OpenTSDB to produce this error.

A theory: bosun is not closing TCP connections correctly, or is doing something bad with keep-alives. OpenTSDB maintains these open sockets to bosun. When OpenTSDB needs to process a connection, some syscall is performed to listen on the socket, and this fails because of resource exhaustion (FD limit?).

maddyblue commented 10 years ago

I assumed the above error message was from bosun, but it's actually from scollector. This implies it's a bosun bug and has nothing to do with OpenTSDB.

maddyblue commented 10 years ago

This appears to have been caused by a very high number of connections on ny-bosun01, which in turn came from keep-alives being disabled. Enabling them seems to have fixed it.
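
For anyone hitting the same symptom: on Linux, "cannot assign requested address" on a dial usually means the client side has run out of ephemeral ports, which is what one new TCP connection per POST tends to produce under load. Below is a minimal sketch of the difference keep-alives make, using only the Go standard library against a local test server. It is not the actual bosun or scollector code; `newRelayClient`, `post`, and `countConns` are hypothetical names, and the pool sizes and timeouts are illustrative assumptions.

```go
package main

import (
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
	"time"
)

// newRelayClient (hypothetical helper) builds an HTTP client whose transport
// either reuses connections (keepAlive=true) or dials a new one per request.
func newRelayClient(keepAlive bool) *http.Client {
	return &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			DisableKeepAlives:   !keepAlive,
			MaxIdleConnsPerHost: 4,                // illustrative value
			IdleConnTimeout:     30 * time.Second, // illustrative value
		},
	}
}

// post sends one payload, then drains and closes the response body so the
// connection can go back to the idle pool instead of being discarded.
func post(c *http.Client, url, payload string) error {
	resp, err := c.Post(url, "application/json", strings.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

// countConns posts n payloads sequentially to a local test server and
// returns how many TCP connections the server saw being opened.
func countConns(keepAlive bool, n int) (int64, error) {
	var opened int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			io.Copy(io.Discard, r.Body)
			w.WriteHeader(http.StatusNoContent)
		}))
	// Count every newly accepted connection on the server side.
	srv.Config.ConnState = func(_ net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	c := newRelayClient(keepAlive)
	defer c.CloseIdleConnections()
	for i := 0; i < n; i++ {
		if err := post(c, srv.URL, "[]"); err != nil {
			return 0, err
		}
	}
	return atomic.LoadInt64(&opened), nil
}
```

Calling `countConns(true, 5)` and `countConns(false, 5)` from a `main` function should show the gap: with keep-alives the sequential posts share a connection, while without them every post opens its own, and each closed connection then lingers in TIME_WAIT on the client and holds a local port for a while.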