flyerhzm / redis-sentinel

Another Redis automatic master/slave failover solution for Ruby, using the built-in Redis Sentinel (deprecated)
MIT License

Randomize sentinel order per-run, so that not all redis-sentinel instances use the exact same sentinel. #23

Closed: rbroemeling closed this pull request 11 years ago

rbroemeling commented 11 years ago

The way redis-sentinel uses the sentinels array guarantees that all client instances will use the same sentinel at all times. This is less than ideal -- preferably, if there are N sentinels alive, we want our clients spread out across them to mitigate the impact of losing any one sentinel.

If we randomize the sentinel array ordering during redis-sentinel initialization, that should have the desired effect.
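
For illustration, here is a minimal sketch of the idea, assuming the `master_name`/`sentinels` options shown in the gem's README (hosts and the master name are placeholders):

```ruby
require "redis"
require "redis-sentinel"

# Placeholder sentinel list; in practice this comes from application config.
SENTINELS = [
  { host: "sentinel-1.example.com", port: 26379 },
  { host: "sentinel-2.example.com", port: 26379 },
  { host: "sentinel-3.example.com", port: 26379 }
]

# Shuffling once per process/run means different clients start from different
# sentinels, instead of every client always contacting the first entry.
redis = Redis.new(master_name: "mymaster", sentinels: SENTINELS.shuffle)
```

The important part is only the `shuffle`: the set of sentinels is unchanged, just the order in which they are tried.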

ondrejbartas commented 11 years ago

@rbroemeling do you know when the sentinels are actually used? Only when a Redis connection is made for the first time: the sentinel looks up the master and connects the client to that Redis. If the master goes down, this gem asks the already-connected sentinel for the new master; if that sentinel is down too, it goes through the whole array of sentinels and asks each of them for the new master.
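
For reference, the lookup behaviour described above boils down to something like the following sketch; it is illustrative only, not the gem's actual internals, and uses redis-rb's `sentinel` command helper:

```ruby
require "redis"

# Illustrative only: ask each configured sentinel in turn for the current master.
def discover_master(sentinels, master_name)
  sentinels.each do |conf|
    begin
      sentinel = Redis.new(host: conf[:host], port: conf[:port])
      host, port = sentinel.sentinel("get-master-addr-by-name", master_name)
      return { host: host, port: port.to_i } if host
    rescue Redis::CannotConnectError
      next # this sentinel is unreachable; fall through to the next one
    end
  end
  raise "no sentinel could resolve the master for #{master_name}"
end
```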

I already tried this: 5 Redis servers (1 of them the master) + 5 sentinels.

10 web servers (all with the same redis-sentinel settings; they know the ip/port of every sentinel).

I shut down 4 Redis servers, including their sentinels. (In the sentinel config I set the quorum so that only 1 sentinel is needed to decide on a new master.) All 10 web servers connected to the new master without any problem or delay.
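
For reference, the "only 1 sentinel needed to decide" setting is Sentinel's quorum, configured in sentinel.conf; a minimal sketch with placeholder values:

```
# Placeholder address and master name. The final "1" is the quorum:
# one sentinel's vote is enough to consider this master objectively down.
sentinel monitor mymaster 10.0.0.1 6379 1
sentinel down-after-milliseconds mymaster 5000
```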

Is this enough for you? You don't have to care about which sentinel you are using. You only use them when something goes wrong.

Ondrej

rbroemeling commented 11 years ago

Hi Ondrej,

Yes, I understand the conditions under which sentinels are used. Even given that, I do not see any reason to limit the number of sentinels that the gem will talk to.

You are correct that when nothing goes wrong, having everything talking to a single sentinel has little impact.

My point is that when something does go wrong, in either a foreseen or unforeseen way, having everything talking to a single sentinel introduces an unnecessary single point of failure into the system:

  1. If the sentinel that everyone is talking to goes offline, everyone has to time out and then find the next sentinel to speak to. If only a fraction of the hosts are talking to that sentinel, only that fraction has to time out and find the next sentinel. The second situation is obviously preferable to the first.
  2. If there is a bug in the HA/fail-over code (like the one we encountered just last week), it is far preferable that only the fraction of hosts connected to the affected sentinel goes offline, rather than all of them.

Frankly, I see no advantage to enforcing that all instances of redis-sentinel connect to the same sentinel. There are only disadvantages, and the PR to shuffle the list of sentinels during initialization is, to my mind, a simple and straightforward fix for the situation.

flyerhzm commented 11 years ago

@rbroemeling it makes sense that different clients can talk to different sentinel servers at initialization to avoid a single point of failure. Good to know, thanks.

rbroemeling commented 11 years ago

Great, thanks @flyerhzm .