Open ondrejbartas opened 11 years ago
Sounds good, but I have some concerns about writing to the master and reading from a slave: even though Redis is fast, there is still some replication delay between the master and the slave (e.g. a value just written to the master may not yet be visible on the slave).
What do you think?
In my opinion this should be an option: use only the master, or use the nearest slave.
BTW, there is a slight problem with sentinels. I tried starting two Redis servers (master + 1 slave), then starting two Redis sentinels (one per server), and then testing it like this: kill the slave, start the slave, kill the master, wait for the slave to become master, start the old master, kill the new master. Up to that point redis-sentinel worked well, but when you kill one sentinel during the slave-to-master switch, everything goes down. This is because redis-sentinel doesn't verify the connection to the sentinel and doesn't reconnect :( This should be fixed first.
What do you think? :D
From the API point of view it could become a :read_preference option settable on the redis client instance. Once the list of available redis nodes with latency information is obtained from sentinel, it would be fairly trivial to implement the following read preferences:
As for the bug @ondrejbartas has encountered: could you create a script or example scenario showing exactly what is wrong? Some time ago I tested many different fail-over scenarios, and apart from the already-fixed "too early master promotion" bug there was no problem with redis_sentinel.
I will try to reproduce this error and write some script for it.
@royaltm your options (master_only, slave_only, slave_first, nearest) are good and they cover all possible cases.
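A minimal sketch of how such a read-preference selector could look, assuming the sentinel-supplied nodes are plain hashes with a :role and a measured :latency (neither the hash shape nor the method name comes from the gem; they are illustrative assumptions):

```ruby
# Hypothetical read-preference selector. Nodes are assumed to look like
# { :role => :master, :latency => 0.002, :host => "...", :port => 6379 };
# this is NOT the redis-sentinel gem's actual data structure.
def select_read_node(nodes, preference)
  masters = nodes.select { |n| n[:role] == :master }
  slaves  = nodes.select { |n| n[:role] == :slave }
  case preference
  when :master_only then masters.first
  when :slave_only  then slaves.min_by { |n| n[:latency] }
  when :slave_first then slaves.min_by { |n| n[:latency] } || masters.first
  when :nearest     then nodes.min_by { |n| n[:latency] }
  end
end
```

With :slave_first the selector falls back to the master when no slave is reachable, which matches the "doesn't matter if it is slave or master" behaviour described in this thread.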
I will try to find out how to determine which node is nearest or usable. For now I am thinking about these strategies:
Maybe I am going the wrong way, and I am not sure about this... Thinking about it a little more, it may become overkill for most use-cases.
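One way to decide which node is "nearest" is simply the round-trip time of a PING over a raw socket. This is only a sketch under that assumption; redis-sentinel would presumably reuse its existing client connections instead of opening new sockets:

```ruby
require 'socket'
require 'benchmark'

# Measure the round-trip time (in seconds) of an inline PING to one node.
# Unreachable nodes report Float::INFINITY so they sort last when picking
# the nearest node with min_by.
def ping_latency(host, port, timeout = 1)
  Benchmark.realtime do
    Socket.tcp(host, port, connect_timeout: timeout) do |sock|
      sock.write "PING\r\n"   # Redis accepts inline commands
      sock.readline           # expect "+PONG\r\n"
    end
  end
rescue StandardError
  Float::INFINITY
end
```

The nearest node would then just be `nodes.min_by { |n| ping_latency(n[:host], n[:port]) }`.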
@royaltm regarding the bug: I found the problem in my config. I had set:
sentinel monitor first_server 127.0.0.1 6379 2
and used only two redis servers & sentinels. Per the documentation, the last option is the quorum: the level of agreement needed to detect this master as failing (here 2 sentinels; if the agreement is not reached, the automatic failover does not start).
When I changed it to: sentinel monitor first_server 127.0.0.1 6379 1
it started to work, and the switch from slave to master went without problems.
This is my test file:
require 'redis'
require 'redis-sentinel'
require 'fileutils' # needed for FileUtils.rm below

def pid_file(port)
  # make sure the tmp dir exists
  Dir.mkdir("tmp") unless File.exists?("tmp")
  "tmp/#{port}.pid"
end

def redis_running(port)
  File.exists?(pid_file(port)) && Process.kill(0, File.read(pid_file(port)).to_i)
rescue Errno::ESRCH
  FileUtils.rm pid_file(port)
  false
end

def start_redis(port, slave_of = nil)
  unless redis_running(port)
    command = "redis-server redis-config.conf --port #{port} --pidfile #{pid_file(port)} --dbfilename tmp/#{port}.rdb"
    command += " --slaveof #{slave_of}" if slave_of
    system command
    sleep(5) # wait for redis to start so the pid file exists
    puts "redis started on port: #{port} with PID: #{File.read(pid_file(port)).to_i}"
  else
    puts "redis already running on port: #{port} and with pid: #{File.read(pid_file(port)).to_i}"
  end
end

def start_sentinel(port)
  sentinel_port = 10000 + port
  unless redis_running(sentinel_port)
    # need to create a config file for the sentinel
    # (I couldn't find a way to configure a sentinel from the command line :( )
    sentinel_conf_file = "tmp/sentinel_#{sentinel_port}.conf"
    fw = File.open(sentinel_conf_file, "w:UTF-8")
    fw.puts "pidfile #{pid_file(sentinel_port)}
daemonize yes
port #{sentinel_port}
sentinel monitor first_server 127.0.0.1 #{port} 1
sentinel down-after-milliseconds first_server 5000
sentinel failover-timeout first_server 9000
sentinel can-failover first_server yes
sentinel parallel-syncs first_server 1"
    fw.close
    command = "redis-server #{sentinel_conf_file} --sentinel"
    system command
    sleep(1)
    puts "redis sentinel started on port: #{sentinel_port} with PID: #{File.read(pid_file(sentinel_port)).to_i}"
  else
    puts "redis sentinel already running on port: #{sentinel_port} and with pid: #{File.read(pid_file(sentinel_port)).to_i}"
  end
end

def stop_redis(port)
  if File.exists?(pid_file(port))
    Process.kill "INT", File.read(pid_file(port)).to_i
    puts "redis stopped on port: #{port} with PID:#{File.read(pid_file(port)).to_i}"
    FileUtils.rm pid_file(port)
  end
end

def start_redis_with_sentinel(port, slave_of = nil)
  start_redis port, slave_of
  start_sentinel port
end

puts "Stopping all redis"
stop_redis 13340
stop_redis 13341
puts "Stopping all sentinels"
stop_redis 23340
stop_redis 23341

start_redis_with_sentinel 13340
start_redis_with_sentinel 13341, "127.0.0.1 13340"

redis = Redis.new(:master_name => "first_server",
                  :sentinels => [
                    {:host => "localhost", :port => 23340},
                    {:host => "localhost", :port => 23341}
                  ],
                  :failover_reconnect_timeout => 30,
                  :failover_reconnect_wait => 0.0001)

redis.set "foo", 1

count = 0
while true
  if count == 30
    puts "killing master redis & it's sentinel"
    stop_redis 13340
    stop_redis 23340
  end
  if count == 120
    puts "starting again old master redis & sentinel"
    start_redis_with_sentinel 13340
  end
  if count == 150
    puts "killing current master redis & it's sentinel"
    stop_redis 13341
    stop_redis 23341
  end
  if count == 200
    puts "starting slave redis & sentinel"
    # using the same config as before!
    start_redis_with_sentinel 13341, "127.0.0.1 13340"
  end
  if count == 250
    puts "killing master redis & it's sentinel"
    stop_redis 13340
    stop_redis 23340
  end
  begin
    data = redis.incr "foo"
    puts "current redis port #{redis.client.port} -> INCR: #{data}"
  rescue Redis::CannotConnectError => e
    puts "failover took too long to recover", e
  end
  count += 1
  sleep 1
end
And my redis config:
daemonize yes
port 16379
bind 127.0.0.1
timeout 0
loglevel notice
logfile stdout
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dir ./
slave-serve-stale-data yes
slave-read-only yes
slave-priority 100
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
And the output of the script:
redis started on port: 13340 with PID: 13710
redis sentinel started on port: 23340 with PID: 13712
redis started on port: 13341 with PID: 13714
redis sentinel started on port: 23341 with PID: 13719
current redis port 13340 -> INCR: 2
.
.
current redis port 13340 -> INCR: 31
killing master redis & it's sentinel
redis stopped on port: 13340 with PID:13710
redis stopped on port: 23340 with PID:13712
trying nex sentinel!localhost:23341
current redis port 13341 -> INCR: 32
.
.
current redis port 13341 -> INCR: 121
starting again old master redis & sentinel
redis started on port: 13340 with PID: 13748
redis sentinel started on port: 23340 with PID: 13751
current redis port 13341 -> INCR: 122
.
.
current redis port 13341 -> INCR: 151
killing current master redis & it's sentinel
redis stopped on port: 13341 with PID:13714
redis stopped on port: 23341 with PID:13719
trying nex sentinel!localhost:23340
current redis port 13340 -> INCR: 152
.
.
current redis port 13340 -> INCR: 201
starting slave redis & sentinel
redis started on port: 13341 with PID: 13764
redis sentinel started on port: 23341 with PID: 13767
current redis port 13340 -> INCR: 202
.
.
current redis port 13340 -> INCR: 251
killing master redis & it's sentinel
redis stopped on port: 13340 with PID:13748
redis stopped on port: 23340 with PID:13751
trying nex sentinel!localhost:23341
current redis port 13341 -> INCR: 252
.
.
Ondrej, did you meet the requirement "Redis Sentinel knows all servers in cluster and it would be very nice to use connection for read commands to nearest (fastest ping) redis server"?
Hello,
I am working on a redis backup and failover setup where I have 3 frontend servers; every server has its own redis + frontend application.
1 redis server will be started as master, the others as slaves.
Redis Sentinel knows all servers in the cluster, and it would be very nice to route read commands to the nearest (fastest ping) redis server (every server would ask its own redis server for reads, and write into only the one master).
When one of the slaves goes down, the application using that server will switch to another redis server (it doesn't matter whether it is a slave or the master).
When the master goes down, all read commands will proceed without failure (the slaves will be used); write commands will error until Redis Sentinel promotes one of the slaves to master. Then all write commands will be switched to this new master.
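The window where writes error until promotion finishes could be bridged on the client side with a simple retry loop. This is only an illustration: the error class is a parameter here because the snippet does not load the redis gem (with the gem it would be Redis::CannotConnectError):

```ruby
# Illustrative retry wrapper for write commands during a sentinel failover.
# Keeps retrying the block until it succeeds or the deadline passes;
# error_class is a stand-in for Redis::CannotConnectError.
def retry_write(error_class = StandardError, timeout: 30, wait: 0.5)
  deadline = Time.now + timeout
  begin
    yield
  rescue error_class
    raise if Time.now > deadline # failover took too long, give up
    sleep wait
    retry
  end
end
```

A caller would wrap each write, e.g. `retry_write(Redis::CannotConnectError) { redis.incr "foo" }`, so reads keep flowing to slaves while writes wait out the promotion.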
Currently all requests go just to the master (if I read the code of redis-sentinel right) and the slaves are not used :(
What do you think about this approach?
If you like it, I will try to extend redis-sentinel to support this scenario.
Ondrej Bartas