DataDog / dd-agent

Datadog Agent Version 5
https://docs.datadoghq.com/

MySQL integration throws error with RDS Aurora Reader #2579

Open hybby opened 8 years ago

hybby commented 8 years ago

I've configured the MySQL integration against two instances of an AWS RDS Aurora database, which is MySQL-compatible. The first is a writer (master) and reports fine. The second, a reader (read replica), returns a Python stack trace when I run dd-agent info:

# dd-agent info -v
===================
Collector (v 5.8.0)
===================

  Status date: 2016-06-08 04:45:20 (8s ago)
  Pid: 34816
  Platform: Linux-3.2.0-88-virtual-x86_64-with-debian-wheezy-sid
  Python Version: 2.7.11, 64bit
  Logs: <stderr>, /var/log/datadog/collector.log, syslog:/dev/log
...snip...
      - instance #11 [OK]
      - instance #12 [ERROR]: "'NoneType' object has no attribute '__getitem__'"
        Traceback (most recent call last):
          File "/opt/datadog-agent/agent/checks/__init__.py", line 750, in run
            self.check(copy.deepcopy(instance))
          File "/opt/datadog-agent/agent/checks.d/mysql.py", line 314, in check
            raise e
        TypeError: 'NoneType' object has no attribute '__getitem__'
...snip...

In this case, instance #11 is the writer (master) and instance #12 is the reader (read replica). I have other RDS instances configured with this integration on this host, which is why the instance numbers are higher than you'd expect.

The configuration file /etc/dd-agent/conf.d/mysql.yaml looks like this, with site-specific values removed:

init_config:

instances:
  - server: <omitted_host1>.rds.amazonaws.com
    user: <omitted>
    pass: <omitted>
    tags:
      - "mysql_host:<omitted_host1>.rds.amazonaws.com"
      - "purpose:<omitted>"
      - "project:<omitted>"
      - "environment:<omitted>"
    options: # Optional
      replication: true
      galera_cluster: 0
  - server: <omitted_host2>.rds.amazonaws.com
    user: <omitted>
    pass: <omitted>
    tags:
      - "mysql_host:<omitted_host2>.rds.amazonaws.com"
      - "purpose:<omitted>"
      - "project:<omitted>"
      - "environment:<omitted>"
    options: # Optional
      replication: true
      galera_cluster: 0

How can I troubleshoot this issue further?

As noted, the writer (master) instance seems to report as OK and send metrics just fine. It's the reader (read-replica) that's throwing this traceback.

I'm also wondering whether I should simply point the integration at the cluster endpoint instead of writing a configuration stanza for each node in the cluster.

degemer commented 8 years ago

Hi @hybby !

This RDS Aurora instance is probably missing a metric that the check expects to be present, and the check fails because of it. Could you enable debug logs and send a flare to support at datadoghq.com so that we can troubleshoot this further?
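The failure mode described here can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the code in mysql.py: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the helper names are hypothetical. A DB-API cursor's fetchone() returns None when a query produces no rows, and indexing that None raises a TypeError. On Python 2.7 (the interpreter in the report above) the message reads "'NoneType' object has no attribute '__getitem__'"; on Python 3 it reads "'NoneType' object is not subscriptable".

```python
import sqlite3


def fetch_first_column(cursor, query):
    """Hypothetical naive helper: assumes the query always returns a row."""
    cursor.execute(query)
    row = cursor.fetchone()  # None when the result set is empty
    return row[0]  # raises TypeError when row is None


def fetch_first_column_safe(cursor, query, default=None):
    """Hypothetical guarded helper: returns `default` when there are no rows."""
    cursor.execute(query)
    row = cursor.fetchone()
    if row is None:
        return default
    return row[0]


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# An empty table stands in for a query that yields "Empty set" on a replica.
cur.execute("CREATE TABLE status (text TEXT)")

try:
    fetch_first_column(cur, "SELECT text FROM status")
except TypeError as exc:
    print("naive helper failed:", exc)

print("safe helper returned:", fetch_first_column_safe(cur, "SELECT text FROM status"))
```

The guarded version lets the caller decide what "no data" means (skip a metric, log a warning) instead of letting the exception abort the whole check run.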

creatorzim commented 8 years ago

I'm experiencing this issue as well. Is there any resolution?

rabidscorpio commented 7 years ago

@degemer To add some more context: running SHOW /*!50000 ENGINE*/ INNODB STATUS \G on an Aurora writer gives the usual InnoDB status output:

mysql> SHOW /*!50000 ENGINE*/ INNODB STATUS \G
*************************** 1. row ***************************
  Type: InnoDB
  Name: 
Status: 
=====================================
2016-09-23 04:59:38 2b10b7ec5700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 2 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2746859 srv_active, 0 srv_shutdown, 3179 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
.
.
.

Running the same command on an Aurora replica, however, yields:

mysql> SHOW /*!50000 ENGINE*/ INNODB STATUS \G
Empty set (0.00 sec)

I know we can disable InnoDB stats on replicas with version 5.7 or later, but it seems like there should be a more graceful way to handle this problem.
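One graceful option, sketched here in Python (the agent's own language) and not taken from the dd-agent source, is to treat an empty result as "no InnoDB metrics for this instance" rather than as an error. Only the (Type, Name, Status) row shape and the srv_master_thread loops line come from the writer output above; the helper names and the metric keys are hypothetical, and the regex assumes that exact line format, which may differ between MySQL/Aurora versions.

```python
import re
from typing import Dict, Optional, Sequence

# Matches the BACKGROUND THREAD line from the writer's output, e.g.
# "srv_master_thread loops: 2746859 srv_active, 0 srv_shutdown, 3179 srv_idle"
_MASTER_LOOPS = re.compile(
    r"srv_master_thread loops: (\d+) srv_active, (\d+) srv_shutdown, (\d+) srv_idle"
)


def innodb_status_text(row: Optional[Sequence[str]]) -> Optional[str]:
    """Return the Status column of a SHOW ENGINE INNODB STATUS row.

    `row` is what a DB-API cursor's fetchone() returns: a (Type, Name, Status)
    tuple on the writer, or None when the query yields an empty set, as on
    the Aurora replica described in this thread.
    """
    if row is None or len(row) < 3:
        return None
    return row[2]


def parse_master_thread_loops(status: Optional[str]) -> Dict[str, int]:
    """Extract master-thread loop counters; return {} when unavailable."""
    if not status:
        return {}
    match = _MASTER_LOOPS.search(status)
    if match is None:
        return {}
    active, shutdown, idle = (int(g) for g in match.groups())
    # Hypothetical metric names, for illustration only.
    return {
        "srv_master_thread.active": active,
        "srv_master_thread.shutdown": shutdown,
        "srv_master_thread.idle": idle,
    }
```

Usage, with a writer row built from the output above and the replica's empty result:

```python
writer_row = ("InnoDB", "", "srv_master_thread loops: 2746859 srv_active, 0 srv_shutdown, 3179 srv_idle")
print(parse_master_thread_loops(innodb_status_text(writer_row)))
print(parse_master_thread_loops(innodb_status_text(None)))  # {}
```

Returning an empty dict keeps the rest of the check (replication, global status) running on the replica, while the writer still reports its InnoDB counters.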