MeetMe / newrelic-plugin-agent

Multi-Plugin python-based Agent for NewRelic's Platform
BSD 3-Clause "New" or "Revised" License

monitor crashes when redis server becomes unavailable #201

Open markslemko opened 10 years ago

markslemko commented 10 years ago

We have several Redis instances being monitored by the agent.

We noticed an issue: when we bring one Redis instance down while it is being monitored, the agent crashes.

This isn't great for two reasons: 1) New Relic is unable to monitor the other apps while the agent is down, and 2) New Relic is unable to notify us that there is a problem with the Redis instance that is down.

It would be nice to have the agent report that a Redis instance is unreachable as a separate monitoring field that can trigger alerts.

gmr commented 10 years ago

Can you provide a traceback of the crash?

markslemko commented 10 years ago

I'm not sure which of these two is about this problem, but I think it is the first one. I hope that helps. I took them from /var/log/newrelic_plugin_agent.errors

Is there another log file that would be helpful?

------------------------------------------------------------------------
[START] /usr/local/bin/newrelic_plugin_agent Exception [2013-12-05T21:16:38.875070]
-------------------------------------------------------------------------
[INFO]
Interpreter: /usr/bin/python
CLI arguments: /usr/local/bin/newrelic_plugin_agent -c /etc/newrelic/newrelic_plugin_agent.cfg
Exception: [Errno 104] Connection reset by peer
Traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/helper/unix.py", line 88, in start
    self.controller.start()
  File "/usr/local/lib/python2.7/dist-packages/helper/controller.py", line 266, in start
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/helper/controller.py", line 257, in run
    signal.pause()
  File "/usr/local/lib/python2.7/dist-packages/helper/controller.py", line 428, in _wake
    self.process()
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/agent.py", line 118, in process
    self.start_plugin_polling()
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/agent.py", line 277, in start_plugin_polling
    self.config.application.get(plugin))
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/agent.py", line 108, in poll_plugin
    thread.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/agent.py", line 305, in thread_process
    obj.poll()
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/plugins/base.py", line 285, in poll
    connection = self.connect()
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/plugins/redis.py", line 112, in connect
    buffer_value = connection.recv(self.SOCKET_RECV_MAX)
error: [Errno 104] Connection reset by peer
--------------------------------------------------------------------------
[END]

------------------------------------------------------------------------
[START] /usr/local/bin/newrelic_plugin_agent Exception [2013-12-06T16:17:50.567290]
-------------------------------------------------------------------------
[INFO]
Interpreter: /usr/bin/python
CLI arguments: /usr/local/bin/newrelic_plugin_agent -c /etc/newrelic/newrelic_plugin_agent.cfg
Exception: 'NoneType' object has no attribute 'send'
Traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/helper/unix.py", line 88, in start
    self.controller.start()
  File "/usr/local/lib/python2.7/dist-packages/helper/controller.py", line 266, in start
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/helper/controller.py", line 257, in run
    signal.pause()
  File "/usr/local/lib/python2.7/dist-packages/helper/controller.py", line 428, in _wake
    self.process()
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/agent.py", line 118, in process
    self.start_plugin_polling()
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/agent.py", line 277, in start_plugin_polling
    self.config.application.get(plugin))
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/agent.py", line 108, in poll_plugin
    thread.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/agent.py", line 305, in thread_process
    obj.poll()
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/plugins/base.py", line 285, in poll
    connection = self.connect()
  File "/usr/local/lib/python2.7/dist-packages/newrelic_plugin_agent/plugins/redis.py", line 109, in connect
    connection.send("*2\r\n$4\r\nAUTH\r\n$%i\r\n%s\r\n" %
AttributeError: 'NoneType' object has no attribute 'send'
--------------------------------------------------------------------------
[END]
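
Both tracebacks end inside `connect()` in `plugins/redis.py`: one with a socket error raised by `recv`, the other with `send` called on a connection that is `None`. In both cases the exception propagates up through `poll()` and takes the whole agent down. The following is a minimal sketch of how a poll could contain those failures so that one unreachable server does not stop the others from being polled. It is not the agent's actual code: `safe_connect`, `poll_once`, and the `SOCKET_RECV_MAX` value are hypothetical names and values chosen for illustration, and the sketch is written for Python 3, where `socket.error` is an alias of `OSError`.

```python
import socket

SOCKET_RECV_MAX = 4096  # hypothetical buffer size, for illustration only


def safe_connect(host, port, timeout=2.0):
    """Return a connected socket, or None if the server is unreachable."""
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except OSError as error:  # refused, reset, unreachable, or timed out
        print("Could not connect to %s:%s: %s" % (host, port, error))
        return None


def poll_once(host, port):
    """Send a Redis INFO command; return the raw reply, or None on failure."""
    connection = safe_connect(host, port)
    if connection is None:
        # Never call send() on a missing connection (second traceback).
        return None
    try:
        connection.sendall(b"*1\r\n$4\r\nINFO\r\n")
        return connection.recv(SOCKET_RECV_MAX)
    except OSError as error:
        # A reset while sending or receiving (first traceback) ends this
        # poll only, not the calling process.
        print("Poll of %s:%s failed: %s" % (host, port, error))
        return None
    finally:
        connection.close()
```

A caller would treat a `None` result as "this server could not be polled this cycle" and carry on with the remaining servers.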

trbs commented 10 years ago

There is still an issue after my patch: the server is now reported as "green", just without any stats, since we could not connect to Redis.

Shouldn't it show up as RED in New Relic, since we expected to connect to Redis but that failed? It's not GREEN, because we are not in a good state. It's also not GREY, since we are reporting metrics.
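
One way to make a failed poll visible rather than silently green is to always report an explicit reachability value alongside the regular stats, so that an alert threshold can be set on it. The sketch below assumes the poll result is either a dict of parsed stats or `None` when the connection failed. The function name `build_metrics` and the metric name are hypothetical, and whether the component actually turns red depends on how alerting is configured on the New Relic side, which this sketch does not cover.

```python
# Hypothetical metric name, chosen only for illustration.
REACHABLE_METRIC = "Component/Redis/Reachable[boolean]"


def build_metrics(stats):
    """Return the metrics to report for one poll cycle.

    `stats` is a dict of parsed values, or None when the poll failed.
    The reachability gauge is always present: 1 when the server
    answered, 0 when it could not be reached.
    """
    metrics = {REACHABLE_METRIC: 0 if stats is None else 1}
    if stats is not None:
        metrics.update(stats)
    return metrics
```

With this shape, a server that is down still produces a data point every cycle, which is what an alert on "unreachable" would need.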

fnordfish commented 9 years ago

Hi there, just wanted to give this a bump because we keep experiencing this with newrelic-plugin-agent v1.3.0.

Any news?

Oh, and I'm very sorry if this is the wrong repo (was following the link in newrelic)