adubkov / zbx_redis_template

Zabbix template for Redis

Redis 3.0 and Zabbix monitoring #9

Open msims-okta opened 9 years ago

msims-okta commented 9 years ago

We're using the recently released Redis 3.0 with clustering enabled.

I have Zabbix monitoring configured via cron, pushing data to our Zabbix server.

Every so often, the Zabbix Python script fails with an error on a key it queries on the localhost Redis node:

#  /etc/zabbix/zabbix_agentd.d/zbx_redis_stats.py localhost -p 6379

Traceback (most recent call last):
  File "/etc/zabbix/zabbix_agentd.d/zbx_redis_stats.py", line 145, in <module>
    main()
  File "/etc/zabbix/zabbix_agentd.d/zbx_redis_stats.py", line 137, in main
    if client.type(key) == 'list':
  File "/usr/lib/python2.6/site-packages/redis/client.py", line 1112, in type
    return self.execute_command('TYPE', name)
  File "/usr/lib/python2.6/site-packages/redis/client.py", line 565, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/usr/lib/python2.6/site-packages/redis/client.py", line 577, in parse_response
    response = connection.read_response()
  File "/usr/lib/python2.6/site-packages/redis/connection.py", line 574, in read_response
    raise response
redis.exceptions.ResponseError: MOVED 8833 10.139.103.247:6379

This is on a cluster slave. The IP in the MOVED error is the cluster master.
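
For reference, the MOVED reply in the traceback above carries a hash slot number followed by the address of the node that serves that slot. A minimal sketch of pulling those fields out of the error text; `parse_moved` is a hypothetical helper for illustration, not part of zbx_redis_stats.py or redis-py:

```python
def parse_moved(message):
    """Split 'MOVED <slot> <host>:<port>' into (slot, host, port).

    Hypothetical helper: returns None when the text is not a MOVED reply.
    """
    parts = message.split()
    if len(parts) != 3 or parts[0] != 'MOVED':
        return None
    # rpartition keeps everything before the last ':' as the host.
    host, _, port = parts[2].rpartition(':')
    return int(parts[1]), host, int(port)


print(parse_moved("MOVED 8833 10.139.103.247:6379"))
```

Here the slot is 8833 and the target is 10.139.103.247 on port 6379, which matches the cluster master.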

msims-okta commented 9 years ago

To fix this, I had to perform a FLUSHDB on the cluster master. This is less than ideal.

I'll look through the code to find what keys this Zabbix python script is using and see if I can narrow it down.

msims-okta commented 9 years ago

After some additional use, I found that the error occurs when a cluster slave fails over and becomes the new master. The error then occurs on both slaves. Even after performing a 'cluster failover' back to the original master, the two slaves continue to produce this error while the master is fine.

We are no longer able to monitor the clustered slaves as no new data is able to make it back to the Zabbix server.

msims-okta commented 9 years ago

OK I may have found the culprit.

The client.keys('*') loop appears to be the issue. In a clustered (sharded) setup, a key can be served by another node, so the per-key TYPE call that follows gets redirected with MOVED.
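
To illustrate why a key belongs to one particular node: Redis Cluster assigns each key to one of 16384 hash slots using CRC16 of the key (or of its hash tag, the part between the first `{` and the next `}`, when that part is non-empty). A rough sketch of that calculation, with `crc16_xmodem` and `key_slot` as illustrative names rather than anything from the script:

```python
def crc16_xmodem(data):
    """CRC16 with polynomial 0x1021 and initial value 0, computed bit by bit."""
    crc = 0
    for byte in bytearray(data):
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key):
    """Return the cluster hash slot (0..16383) for a key given as bytes."""
    start = key.find(b'{')
    if start != -1:
        end = key.find(b'}', start + 1)
        # Only a non-empty hash tag replaces the full key.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


print(key_slot(b'foo'))
```

The result should agree with what `CLUSTER KEYSLOT` reports for the same key on a cluster node.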

I commented out the following:

134         #keys = client.keys('*')
135         #llensum = 0
136         #for key in keys:
137         #    if client.type(key) == 'list':
138         #        llensum += client.llen(key)
139         #a.append(Metric(redis_hostname, 'redis[llenall]', llensum))

With that change I'm no longer getting these errors, and data is making its way back to the Zabbix server.
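
An alternative to dropping the llenall metric entirely might be to keep the loop but skip keys the node refuses to serve. This is only a sketch under assumptions: `sum_list_lengths` and its `skip_errors` parameter are hypothetical, and in the real script one would presumably pass `redis.exceptions.ResponseError` (the exception shown in the traceback) as `skip_errors`. I have not tested it against a live cluster.

```python
def sum_list_lengths(client, skip_errors=(Exception,)):
    """Sum LLEN over all list keys, skipping keys that raise skip_errors.

    `client` is any object with keys(), type() and llen() methods shaped
    like redis-py's client. Returns (llensum, skipped).
    """
    llensum = 0
    skipped = 0
    for key in client.keys('*'):
        try:
            # Some client versions return bytes rather than str for TYPE.
            if client.type(key) in ('list', b'list'):
                llensum += client.llen(key)
        except skip_errors:
            # On a cluster slave this is typically a MOVED redirection.
            skipped += 1
    return llensum, skipped
```

The skipped count could be sent as its own metric, so that a slave answering MOVED for every key shows up in Zabbix instead of silently reporting 0.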