DataDog / dd-agent

Datadog Agent Version 5
https://docs.datadoghq.com/

dd-agent monitoring: instance #0 [ERROR]: error(32, 'Broken pipe') #1136

Closed ssbarnea closed 10 years ago

ssbarnea commented 10 years ago

I discovered that if I restart PostgreSQL, dd-agent is not able to restore its connection to the database and ends up with errors like:

instance #0 [ERROR]: error(32, 'Broken pipe')

Update: it seems that these errors also appear during normal operation, quite often, even without restarting the database engine:

2014-09-29 12:22:28 BST | ERROR | dd.collector | checks.postgres(__init__.py:552) | Check 'postgres' instance #2 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 543, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/postgres.py", line 325, in check
    self._collect_stats(key, db, tags, relations)
  File "/opt/datadog-agent/agent/checks.d/postgres.py", line 176, in _collect_stats
    cursor.execute(query.replace(r'%', r'%%'))
  File "/root/.local/lib/python2.7/site-packages/pg8000/core.py", line 520, in execute
    self._stmt = PreparedStatement(self._c, operation, args)
  File "/root/.local/lib/python2.7/site-packages/pg8000/core.py", line 1984, in __init__
    self.c.parse(self, self.statement)
  File "/root/.local/lib/python2.7/site-packages/pg8000/core.py", line 1492, in parse
    self._send_message(PARSE, val)
  File "/root/.local/lib/python2.7/site-packages/pg8000/core.py", line 1561, in _send_message
    self._write(code)
  File "/opt/datadog-agent/embedded/lib/python2.7/socket.py", line 324, in write
    self.flush()
  File "/opt/datadog-agent/embedded/lib/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe

You might assume that PostgreSQL monitoring doesn't work at all because of this, but it does seem to work, while these errors still show up in the logs.

instances:
   -   host: localhost
       port: 5432
       username: confluence
       dbname: confluence
       password: zzz
   -   host: localhost
       port: 5432
       username: jira
       dbname: jira
       password: zzz
   -   host: localhost
       port: 5432
       username: crowd
       dbname: crowd
       password: zzz

remh commented 10 years ago

Hi @ssbarnea. Thanks for your feedback! Closing this issue as it's the expected behavior. The connection to the database will sometimes fail, but when that happens the agent should try to reset the connection, so monitoring will still work.

Please contact our support team if you have more issues.

ssbarnea commented 10 years ago

I think you didn't understand the problem: the log is full of these, and I am sure that the PostgreSQL server is not dropping connections, as it is a critical service and that would impact other services. It seems that something odd is happening inside dd-agent that is putting these errors into the logs.

/sorin

remh commented 10 years ago

Sorry, I typed my answer too quickly the other day. This is a duplicate of https://github.com/DataDog/dd-agent/issues/1003, which will be fixed in agent 5.1.0.
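
For readers wondering what "reset the connection" could look like in practice, here is a minimal sketch of a reconnect-once-on-broken-pipe pattern. It is not the agent's actual code and not the fix shipped in 5.1.0; `ReconnectingRunner` and the `connect` factory are hypothetical names, and the connection is only assumed to expose a DB-API style `cursor()`.

```python
import errno
import socket


class ReconnectingRunner(object):
    """Caches one connection per key and reconnects once on a broken pipe.

    Hypothetical helper for illustration only.
    """

    def __init__(self, connect):
        # `connect` is an assumed factory: key -> DB-API style connection.
        self._connect = connect
        self._connections = {}

    def _get(self, key):
        # Reuse the cached connection, opening one only when needed.
        if key not in self._connections:
            self._connections[key] = self._connect(key)
        return self._connections[key]

    def execute(self, key, query):
        for attempt in (1, 2):
            conn = self._get(key)
            try:
                cursor = conn.cursor()
                cursor.execute(query)
                return cursor.fetchall()
            except socket.error as exc:
                # Retry only a broken pipe, and only on the first attempt.
                if exc.errno != errno.EPIPE or attempt == 2:
                    raise
                # Drop the stale connection so the next pass opens a new one.
                self._connections.pop(key, None)
```

With this shape, a stale cached connection costs one silent retry instead of a logged traceback, while a server that is really down still raises on the second attempt.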