Open Hangdong-Zhang opened 7 years ago
I updated the error log, because the previous log was produced after we added HAProxy in front of redis-sentinel (I have attached that log as well). I mistakenly thought both errors had the same cause. Sorry!
Your URL does not include sentinel_fallback, so obviously it can't connect to the fail-over instance.
This is poorly documented unfortunately, you'll have to go through https://github.com/gnocchixyz/gnocchi/blob/master/gnocchi/common/redis.py :(
Adding doc tag as we need to update the doc for that.
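For readers landing here, a minimal sketch of what such a URL can look like and how its sentinel endpoints break down. It assumes the URL shape described in the tooz Redis driver documentation (a first sentinel as host:port, a `sentinel` query parameter naming the master, and one `sentinel_fallback` parameter per additional sentinel). The addresses, the master name `mymaster`, and the helper `sentinel_endpoints` are hypothetical; check gnocchi/common/redis.py for what your Gnocchi version actually accepts.

```python
from urllib.parse import urlsplit, parse_qs


def sentinel_endpoints(url):
    """Return (master_name, [(host, port), ...]) for a sentinel-style Redis URL.

    Hypothetical helper for illustration only; it is not part of Gnocchi or tooz.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # The "sentinel" parameter carries the name of the monitored master.
    master_name = query["sentinel"][0]
    # The first sentinel comes from the host:port part of the URL.
    endpoints = [(parts.hostname, parts.port)]
    # Each repeated "sentinel_fallback" parameter adds one more sentinel.
    for fallback in query.get("sentinel_fallback", []):
        host, _, port = fallback.rpartition(":")
        endpoints.append((host, int(port)))
    return master_name, endpoints


# Hypothetical addresses and master name.
URL = (
    "redis://10.0.0.1:26379?sentinel=mymaster"
    "&sentinel_fallback=10.0.0.2:26379"
    "&sentinel_fallback=10.0.0.3:26379"
)

if __name__ == "__main__":
    name, sentinels = sentinel_endpoints(URL)
    print(name, sentinels)
```

With only the first sentinel in the URL, the client has a single point of contact; listing the other sentinels as fallbacks is what lets it keep resolving the master when that first sentinel is unreachable.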
@jd Thanks a lot!
With your help, we achieved redis-sentinel HA using the sentinel_fallback option, so we can now survive the failure of a single "redis-sentinel" service without HAProxy.
We also found and fixed the bug related to tooz at our site. So far, everything works well!
@Hangdong-Zhang great!
What bug did you fix in tooz?
We found that someone had carelessly commented out the "coordination_url" option in gnocchi.conf. As a result, tooz used redis by default (we had always used memcached) and raised an error whenever redis switched master and slave (the error log is the same as the one in my 2nd comment).
The error disappeared once we restored the "coordination_url" option (pointing to memcached).
Ok, so there might still be a bug around the master/slave switch that we need to test. Thanks @Hangdong-Zhang !
When I used the following config:
coordination_url = memcached://10.127.2.78:11211
10.127.2.78 is a VIP; requests sent to 10.127.2.78 are forwarded to one of the following servers (in descending order of priority): 10.127.2.121 (1st), 10.127.2.122 (2nd), 10.127.2.123 (3rd).
When 10.127.2.121 goes down, 10.127.2.122 takes over and serves the datastore.
A few seconds later, the following error appears in gnocchi-metricd.log:
2017-12-18 15:05:01,285 [15214] ERROR futurist.periodics: Failed to call periodic 'gnocchi.cli.run_watchers' (it runs every 30.00 seconds)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/futurist/periodics.py", line 290, in run
    work()
  File "/usr/lib/python2.7/site-packages/futurist/periodics.py", line 64, in __call__
    return self.callback(*self.args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/futurist/periodics.py", line 178, in decorator
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/gnocchi/cli.py", line 215, in run_watchers
    self.coord.run_watchers()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/memcached.py", line 509, in run_watchers
    result = super(MemcachedDriver, self).run_watchers(timeout=timeout)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 763, in run_watchers
    MemberLeftGroup(group_id, member_id)))
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 120, in run
    return list(map(lambda cb: cb(*args, **kwargs), self))
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 120, in <lambda>
    return list(map(lambda cb: cb(*args, **kwargs), self))
  File "/usr/lib/python2.7/site-packages/tooz/partitioner.py", line 50, in _on_member_leave
    self.ring.remove_node(event.member_id)
  File "/usr/lib/python2.7/site-packages/tooz/hashring.py", line 92, in remove_node
    raise UnknownNode(node)
UnknownNode: Unknown node '6ee6caad-c093-4990-8e28-6de6cc9355e5'
I think my question is similar to this bug
@qkxu Well in your case you're using 3 different memcached servers, that can't work at all.
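To illustrate why independent memcached servers behind a VIP cannot hold coordination state, here is a toy model. It is not tooz's implementation: `ToyMemcached`, `Vip`, `join_group`, and `get_members` are hypothetical names, and the only assumption carried over from the thread is that the VIP forwards to the highest-priority server that is still up and that the servers do not replicate data between each other.

```python
class ToyMemcached:
    """Stand-in for one memcached server: a private, unreplicated key space."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class Vip:
    """Forwards every request to the highest-priority backend that is up."""

    def __init__(self, backends):
        self.backends = backends  # ordered by descending priority

    def active(self):
        for backend in self.backends:
            if backend.up:
                return backend
        raise RuntimeError("no backend available")


def join_group(vip, group, member):
    # Membership is written only to whichever backend is active right now.
    vip.active().data.setdefault(group, set()).add(member)


def get_members(vip, group):
    # Membership is read from whichever backend is active right now.
    return set(vip.active().data.get(group, set()))


if __name__ == "__main__":
    servers = [ToyMemcached("121"), ToyMemcached("122"), ToyMemcached("123")]
    vip = Vip(servers)

    join_group(vip, "metricd", "worker-1")
    print(get_members(vip, "metricd"))  # worker-1 is visible

    servers[0].up = False  # the first server goes down, the VIP fails over
    print(get_members(vip, "metricd"))  # empty: the second server never saw the join
```

In this model every worker appears to have left the group after the failover, because the new backend starts with none of the state written to the old one. Whatever the exact code path that ends in `UnknownNode`, coordination state needs a single shared store.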
Issue: We use redis as the storage driver. The redis nodes are configured in master-slave mode and managed by redis-sentinel for HA. The "redis_url" option in gnocchi.conf points to redis-sentinel, so that redis-sentinel can switch master and slave automatically without any change in gnocchi. However, after redis switches master and slave, the error "Failed to call periodic 'gnocchi.cli.run_watchers'" keeps appearing in gnocchi-metricd.log until gnocchi-metricd.service is restarted.
Environment:
Linux: CentOS 7.2
Gnocchi: 4.0
Redis: redis-3.2.3-1
Tooz: 1.57.0
Reproduce:
Log:
Error log.txt