[BUGS] : MAJOR - when the Canopsis filesystem is full, NEB stops Nagios scheduling

rfronteau commented 12 years ago

Hi,

When the filesystem holding the Canopsis MongoDB is full, neb2amqp stops the scheduling of the Nagios pollers.

The Nagios daemon is still alive, but scheduling stops.

The fix (https://github.com/capensis/canopsis-nagios/commit/5e5673a79d2cfec3c162b4c5f125a14e7d9cc1a1) doesn't correct this bug.

Thanks

Romuald

william-p commented 12 years ago

Can you try with the branch issue_#1? Thanks.

rfronteau commented 12 years ago

Hi,

I started Nagios under strace:

sendto(4, "\1\0\1\0\0\0a\0<\0(\0\0\17canopsis.eventsIna"..., 105, MSG_NOSIGNAL, NULL, 0) = 105
sendto(4, "\2\0\1\0\0\0 \0<\0\0\0\0\0\0\0\0\24\0\220\0\20applicatio"..., 40, MSG_NOSIGNAL, NULL, 0) = 40
writev(4, [{"\3\0\1\0\0\24\0", 7}, {"{\"timestamp\": 1349859303, \"sourc"..., 5120}, {"\316", 1}], 3 <unfinished ...>

The last writev call is unfinished. Perhaps the connector holds the connection and doesn't return control to Nagios. By default, Nagios has 10 process forks. I think that when the MongoDB filesystem is full, the connector can't write to the AMQP bus and never times out.

As a result, Nagios can't add new process forks and the scheduling queue stops.
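
The blocking write suspected above can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from the connector; `send_with_timeout` is a hypothetical helper name. The idea is only that a send on a socket whose peer has stopped reading should give up after a bounded time instead of holding the caller forever.

```python
import socket


def send_with_timeout(sock, payload, timeout_s):
    """Send payload on sock, giving up after roughly timeout_s seconds.

    Hypothetical helper for illustration. Returns True if everything was
    handed to the kernel, False if the send timed out or failed. On a
    timeout, part of the payload may already have been sent.
    """
    sock.settimeout(timeout_s)
    try:
        sock.sendall(payload)
        return True
    except socket.timeout:
        # The peer is not draining its side, so stop waiting.
        return False
    except OSError:
        # Connection reset, broken pipe, and similar errors.
        return False
```

With a helper like this, a stalled broker costs the caller at most the timeout per event, at the price of dropping (or having to store) the event that could not be sent.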

rfronteau commented 12 years ago

Hi,

The socket timeout works. I ran my monitoring with the Canopsis filesystem full for 2 days and it works: the scheduling queue doesn't stop.

But data is lost, so a buffer feature is necessary.
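
One possible shape for such a buffer is sketched below, in Python for brevity. This is an assumption about how the feature could work, not the connector's actual design; `EventBuffer`, `publish` and `max_events` are hypothetical names. Events are queued in memory, the oldest are dropped once the cap is reached, and the queue is drained in order when publishing succeeds again.

```python
from collections import deque


class EventBuffer:
    """Bounded in-memory buffer for events that could not be published.

    Hypothetical sketch: publish is any callable that returns True when
    the event was accepted by the bus and False otherwise.
    """

    def __init__(self, publish, max_events=10000):
        self._publish = publish
        # When the deque is full, appending drops the oldest event.
        self._pending = deque(maxlen=max_events)

    def send(self, event):
        """Queue the event, then try to drain the queue in order."""
        self._pending.append(event)
        return self.flush()

    def flush(self):
        """Publish queued events until one fails; return how many were sent."""
        sent = 0
        while self._pending:
            if not self._publish(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)
```

The cap keeps memory use bounded during a long outage, so some loss is still possible; a disk-backed queue would be needed to survive a restart.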

rfronteau commented 12 years ago

Hi,

The socket timeout works. I ran my monitoring with the Canopsis filesystem full for 2 days and it works: the scheduling queue doesn't stop.

But data is lost, so a buffer feature is necessary as soon as possible.

Thanks a lot

Romuald
