CodisLabs / codis

Proxy based Redis cluster solution supporting pipeline and scaling dynamically

Question about proxy behavior during server master/replica failover #1506

Open jingjingxin opened 6 years ago

jingjingxin commented 6 years ago

version = 2018-04-07 12:12:01 +0800 @4803cffb121d21529c1717dddd2e75e3fab06ab3 @3.2.2-6-g4803cff
compile = 2018-05-30 09:40:17 +0800 by go version go1.9.2 linux/amd64

Environment: 172.xx.xx.185 hosts two masters, and 172.xx.xx.184 hosts their two replicas. I killed both masters on 172.xx.xx.185 directly. The replicas on 172.xx.xx.184 were promoted to master normally, but the codis-proxy (172.xx.xx.183) log keeps printing:

2018/06/08 10:30:13 backend.go:334: [WARN] backend conn [0xc43389bda0] to 172.xx.xx.185:11194, db-0 writer-[0] exit [error]: backend conn failure, set tcp 172.xx.xx.183:46510: use of closed network connection
2018/06/08 10:30:13 backend.go:261: [WARN] backend conn [0xc43389bda0] to 172.xx.xx.185:11194, db-0 round-[1]
2018/06/08 10:30:13 backend.go:282: [WARN] backend conn [0xc43389bda0] to 172.xx.xx.185:11194, db-0 reader-[0] exit [error]: backend conn failure, read tcp 172.xx.xx.183:46510->172.xx.xx.185:11194: read: connection reset by peer
2018/06/08 10:30:13 backend.go:334: [WARN] backend conn [0xc43389bda0] to 172.xx.xx.185:11194, db-0 writer-[1] exit [error]: dial tcp 172.xx.xx.185:11194: getsockopt: connection refused
    3 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/redis/conn.go:30 github.com/CodisLabs/codis/pkg/proxy/redis.DialTimeout
    2 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:158 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).newBackendReader
    1 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:337 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).loopWriter
    0 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:263 github.com/CodisLabs/codis/pkg/proxy.(*BackendConn).run
... ...

weilinqwe commented 6 years ago

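That shouldn't happen. After sentinel promotes a replica to master, the change is pushed to the proxy automatically and the proxy switches its connections.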
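For readers following along: Redis Sentinel announces a promotion on its `+switch-master` pub/sub channel with the payload `<master-name> <old-ip> <old-port> <new-ip> <new-port>`. The following is a minimal sketch of parsing that payload, not the codis implementation; the `SwitchMaster` type, the `ParseSwitchMaster` function, and the addresses in `main` are hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// SwitchMaster holds the fields of a sentinel "+switch-master" payload.
// Hypothetical type for illustration; codis uses its own structures.
type SwitchMaster struct {
	Name    string // sentinel master name
	OldAddr string // host:port of the demoted master
	NewAddr string // host:port of the promoted replica
}

// ParseSwitchMaster parses "<name> <old-ip> <old-port> <new-ip> <new-port>".
func ParseSwitchMaster(payload string) (SwitchMaster, error) {
	f := strings.Fields(payload)
	if len(f) != 5 {
		return SwitchMaster{}, fmt.Errorf("want 5 fields, got %d in %q", len(f), payload)
	}
	return SwitchMaster{
		Name:    f[0],
		OldAddr: net.JoinHostPort(f[1], f[2]),
		NewAddr: net.JoinHostPort(f[3], f[4]),
	}, nil
}

func main() {
	// Made-up payload, not taken from a real cluster.
	ev, err := ParseSwitchMaster("demo-1 10.0.0.1 6379 10.0.0.2 6380")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s -> %s\n", ev.Name, ev.OldAddr, ev.NewAddr)
}
```

A consumer of this event would then be expected to stop dialing `OldAddr` and route the affected group to `NewAddr`.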

spinlock commented 6 years ago

Right, the proxy only reacts to what sentinel does. Could it be that sentinel is not configured correctly?

jingjingxin commented 6 years ago

Here is the sentinel configuration (there are three sentinels in total). Please take a look:

daemonize yes
bind 172.xx.xx.184
protected-mode yes
loglevel notice
logfile "/data/services/codis/log/redis-sentinel.log"

The rest was generated automatically:

# Generated by CONFIG REWRITE
sentinel myid 729e4adec742a0d4a022ec20e87a0b7790e1bf0f
sentinel monitor xc-codis-manager-4 172.xx.xx.187 14193 2
sentinel failover-timeout xc-codis-manager-4 300000
sentinel auth-pass xc-codis-manager-4 xcTe7
sentinel config-epoch xc-codis-manager-4 0
sentinel leader-epoch xc-codis-manager-4 0
sentinel known-slave xc-codis-manager-4 172.xx.xx.186 14194
sentinel known-sentinel xc-codis-manager-4 172.xx.xx.185 26379 91048ff3f6301afaf1d2e30917def74d9362e1df
sentinel known-sentinel xc-codis-manager-4 172.xx.xx.186 26379 5578bac70f8b32792fa4b20dc353b96135074f69
sentinel monitor xc-codis-manager-2 172.xx.xx.185 12193 2
sentinel failover-timeout xc-codis-manager-2 300000
sentinel auth-pass xc-codis-manager-2 xcTe7
sentinel config-epoch xc-codis-manager-2 0
sentinel leader-epoch xc-codis-manager-2 0
sentinel known-slave xc-codis-manager-2 172.xx.xx.184 12194
sentinel known-sentinel xc-codis-manager-2 172.xx.xx.185 26379 91048ff3f6301afaf1d2e30917def74d9362e1df
sentinel known-sentinel xc-codis-manager-2 172.xx.xx.186 26379 5578bac70f8b32792fa4b20dc353b96135074f69
sentinel monitor xc-codis-manager-3 172.xx.xx.186 13193 2
sentinel failover-timeout xc-codis-manager-3 300000
sentinel auth-pass xc-codis-manager-3 xcTe7
sentinel config-epoch xc-codis-manager-3 0
sentinel leader-epoch xc-codis-manager-3 0
sentinel known-slave xc-codis-manager-3 172.xx.xx.187 13194
sentinel known-sentinel xc-codis-manager-3 172.xx.xx.185 26379 91048ff3f6301afaf1d2e30917def74d9362e1df
sentinel known-sentinel xc-codis-manager-3 172.xx.xx.186 26379 5578bac70f8b32792fa4b20dc353b96135074f69
sentinel monitor xc-codis-manager-1 172.xx.xx.185 11194 2
sentinel failover-timeout xc-codis-manager-1 300000
sentinel auth-pass xc-codis-manager-1 xcTe7
sentinel config-epoch xc-codis-manager-1 7
sentinel leader-epoch xc-codis-manager-1 7
sentinel known-slave xc-codis-manager-1 172.xx.xx.184 11193
sentinel known-sentinel xc-codis-manager-1 172.xx.xx.185 26379 91048ff3f6301afaf1d2e30917def74d9362e1df
sentinel known-sentinel xc-codis-manager-1 172.xx.xx.186 26379 5578bac70f8b32792fa4b20dc353b96135074f69
sentinel current-epoch 7

weilinqwe commented 6 years ago

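Since the failover itself completes, sentinel is fine. Check whether the dashboard has synced the sentinels: are the three sentinel nodes shown as in sync in the dashboard UI? Also click the S button in front of the proxy in the dashboard and check whether the sentinels and masters in the proxy stats are correct, and whether they change after the failover.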

jingjingxin commented 6 years ago

Failover procedure: I took down 172.xx.xx.185:12193, and 172.xx.xx.184:12194 became the master. The information shown for the proxy and for sentinel agrees.

S button output before the failover:

proxy {
    "online": true,
    "closed": false,
    "sentinels": {
        "servers": [
            "172.xx.xx.184:26379",
            "172.xx.xx.185:26379",
            "172.xx.xx.186:26379"
        ],
        "masters": {
            "1": "172.xx.xx.184:11193",
            "2": "172.xx.xx.185:12193",
            "3": "172.xx.xx.186:13193",
            "4": "172.xx.xx.187:14193"
        }
    },

sentinel {
    ...
    "master0": "name=xc-codis-manager-1,status=ok,address=172.xx.xx.184:11193,slaves=1,sentinels=3",
    "master1": "name=xc-codis-manager-2,status=ok,address=172.xx.xx.185:12193,slaves=1,sentinels=3",
    "master2": "name=xc-codis-manager-4,status=ok,address=172.xx.xx.187:14193,slaves=1,sentinels=3",
    "master3": "name=xc-codis-manager-3,status=ok,address=172.xx.xx.186:13193,slaves=1,sentinels=3",
    ...
    "redis_git_sha1": "4803cffb",
    "redis_mode": "sentinel",
    "redis_version": "3.2.11",
    "rejected_connections": "0",
    "run_id": "1c59abace94d5a48103efa3d0b05549cbd011abc",
    "sentinel_masters": "4",
    ...
}

S button output after the failover:

proxy {
    "online": true,
    "closed": false,
    "sentinels": {
        "servers": [
            "172.xx.xx.184:26379",
            "172.xx.xx.185:26379",
            "172.xx.xx.186:26379"
        ],
        "masters": {
            "1": "172.xx.xx.184:11193",
            "2": "172.xx.xx.184:12194",
            "3": "172.xx.xx.186:13193",
            "4": "172.xx.xx.187:14193"
        },
        "switched": true
    },

sentinel {
    ...
    "lru_clock": "1708192",
    "master0": "name=xc-codis-manager-1,status=ok,address=172.xx.xx.184:11193,slaves=1,sentinels=3",
    "master1": "name=xc-codis-manager-2,status=ok,address=172.xx.xx.184:12194,slaves=1,sentinels=3",
    "master2": "name=xc-codis-manager-4,status=ok,address=172.xx.xx.187:14193,slaves=1,sentinels=3",
    "master3": "name=xc-codis-manager-3,status=ok,address=172.xx.xx.186:13193,slaves=1,sentinels=3",
    ...
    "redis_mode": "sentinel",
    "redis_version": "3.2.11",
    "rejected_connections": "0",
    "run_id": "1c59abace94d5a48103efa3d0b05549cbd011abc",
    "sentinel_masters": "4",
    ...
}

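The sentinel SYNC button looks normal. The far right shows [+]group=2,server=172.16.39.185:12193,runid=NA. The rsync indicator for the machine whose server was taken down is red (out of sync), and for both proxies the proxy, admin, and rsync indicators are all red.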
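The manual comparison of the proxy's `masters` map against sentinel's `masterN` lines can be automated. The sketch below is one possible way to do it, under a loud assumption: the group id is taken to be the text after the last `-` in the sentinel master name (as in `xc-codis-manager-2` matching group `2` in the output quoted in this thread). The function names and the sample data in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMasterLine turns "name=a,status=ok,address=h:p,..." into a key/value map.
func parseMasterLine(line string) map[string]string {
	kv := make(map[string]string)
	for _, part := range strings.Split(line, ",") {
		pair := strings.SplitN(part, "=", 2)
		if len(pair) == 2 {
			kv[pair[0]] = pair[1]
		}
	}
	return kv
}

// groupID returns the text after the last "-" in a sentinel master name.
// Assumption: names look like "<product>-<gid>".
func groupID(name string) string {
	i := strings.LastIndex(name, "-")
	if i < 0 {
		return ""
	}
	return name[i+1:]
}

// Mismatches lists groups whose proxy-side master differs from sentinel's view.
func Mismatches(proxyMasters map[string]string, sentinelLines []string) []string {
	var out []string
	for _, line := range sentinelLines {
		kv := parseMasterLine(line)
		gid := groupID(kv["name"])
		if got := proxyMasters[gid]; got != kv["address"] {
			out = append(out, fmt.Sprintf("group %s: proxy=%q sentinel=%q", gid, got, kv["address"]))
		}
	}
	return out
}

func main() {
	// Made-up sample data: group 2 has failed over but the proxy map is stale.
	proxy := map[string]string{"1": "10.0.0.1:6379", "2": "10.0.0.2:6379"}
	sentinel := []string{
		"name=demo-1,status=ok,address=10.0.0.1:6379,slaves=1,sentinels=3",
		"name=demo-2,status=ok,address=10.0.0.3:6380,slaves=1,sentinels=3",
	}
	for _, m := range Mismatches(proxy, sentinel) {
		fmt.Println(m)
	}
}
```

An empty result means both views agree, which is the situation reported above; in that case the stale connections have to be explained by something other than the proxy's master table.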

weilinqwe commented 6 years ago

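Then it should be fine. The proxy's data has been refreshed, so why didn't it reconnect? @spinlock, it would be best for the author to answer this. Also, a red rsync indicator on the proxy is not a problem.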

jingjingxin commented 6 years ago

I have tested this repeatedly, for example taking down one sentinel and one master at the same time, or one sentinel and two masters, or one zookeeper node, and so on. The result is always the same. However, I had forgotten that zookeeper was not running on 172.xx.xx.184 (the zookeeper cluster has three nodes). After I started it and took down one master, the failover inexplicably worked. When I repeated the tests above afterwards, it failed again. I'm at a loss.

Below is the proxy log. It is very long, so parts have been omitted. The log seems to show connections being made to the new master 172.xx.xx.185:11194, but just as things got on track they went off course again.

2018/06/08 15:23:50 backend.go:334: [WARN] backend conn [0xc43452af60] to 172.xx.xx.184:11193, db-11 writer-[20] exit [error]: dial tcp 172.xx.xx.184:11193: getsockopt: connection refused
    3 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/redis/conn.go:30 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).newBackendReader
    1 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:337 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).loopWriter
    0 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:263 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).run
... ...
2018/06/08 15:23:54 sentinel.go:67: [WARN] sentinel-[172.xx.xx.187:26379] subscribe event [message +switch-master xcbb-codis-manager-1 172.xx.xx.184 11193 172.xx.xx.185 11194]
2018/06/08 15:23:54 sentinel.go:67: [WARN] sentinel subscribe notified +switch-master
2018/06/08 15:23:54 backend.go:258: [WARN] backend conn [0xc431c91c80] to 172.xx.xx.185:11194, db-1 start service
2018/06/08 15:23:54 backend.go:258: [WARN] backend conn [0xc431ce60c0] to 172.xx.xx.185:11194, db-5 start service
2018/06/08 15:23:54 backend.go:258: [WARN] backend conn [0xc431ce6900] to 172.xx.xx.185:11194, db-13 start service
2018/06/08 15:23:54 backend.go:261: [WARN] backend conn [0xc431c91c80] to 172.xx.xx.185:11194, db-1 round-[0]
2018/06/08 15:23:54 backend.go:258: [WARN] backend conn [0xc431ce6600] to 172.xx.xx.185:11194, db-10 start service
2018/06/08 15:23:54 router.go:225: [WARN] fill slot 0035, backend.addr = 172.xx.xx.185:11194, locked = false, +switched
2018/06/08 15:23:54 router.go:225: [WARN] fill slot 0036, backend.addr = 172.xx.xx.185:11194, locked = false, +switched
2018/06/08 15:23:54 backend.go:258: [WARN] backend conn [0xc431c91ec0] to 172.xx.xx.185:11194, db-3 start service
2018/06/08 15:23:54 backend.go:261: [WARN] backend conn [0xc431c91ec0] to 172.xx.xx.185:11194, db-3 round-[0]
2018/06/08 15:23:54 backend.go:258: [WARN] backend conn [0xc431ce61e0] to 172.xx.xx.185:11194, db-6 start service
2018/06/08 15:23:54 backend.go:261: [WARN] backend conn [0xc431c91f80] to 172.xx.xx.185:11194, db-4 round-[0]
2018/06/08 15:23:54 router.go:225: [WARN] fill slot 0043, backend.addr = 172.xx.xx.185:11194, locked = false, +switched
2018/06/08 15:23:54 router.go:225: [WARN] fill slot 0253, backend.addr = 172.xx.xx.185:11194, locked = false, +switched
2018/06/08 15:23:54 router.go:225: [WARN] fill slot 0254, backend.addr = 172.xx.xx.185:11194, locked = false, +switched
2018/06/08 15:23:54 backend.go:267: [WARN] backend conn [0xc4327bbf20] to 172.xx.xx.184:11193, db-0 stop and exit
2018/06/08 15:23:54 router.go:225: [WARN] fill slot 0255, backend.addr = 172.xx.xx.185:11194, locked = false, +switched
2018/06/08 15:23:54 backend.go:267: [WARN] backend conn [0xc43452a8a0] to 172.xx.xx.184:11193, db-5 stop and exit
2018/06/08 15:23:54 backend.go:267: [WARN] backend conn [0xc43452b0e0] to 172.xx.xx.184:11193, db-13 stop and exit
2018/06/08 15:23:54 backend.go:267: [WARN] backend conn [0xc43452aba0] to 172.xx.xx.184:11193, db-7 stop and exit
2018/06/08 15:23:54 backend.go:267: [WARN] backend conn [0xc43452ad20] to 172.xx.xx.184:11193, db-8 stop and exit
2018/06/08 15:23:55 backend.go:261: [WARN] backend conn [0xc437257b00] to 172.xx.xx.184:11193, db-5 round-[218]
2018/06/08 15:23:55 backend.go:261: [WARN] backend conn [0xc430ca6000] to 172.xx.xx.184:11193, db-11 round-[218]
2018/06/08 15:23:55 backend.go:334: [WARN] backend conn [0xc437257b00] to 172.xx.xx.184:11193, db-5 writer-[218] exit [error]: dial tcp 172.xx.xx.184:11193: getsockopt: connection refused
    3 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/redis/conn.go:30
    2 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:158 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).newBackendReader
    1 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:337 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).loopWriter
    0 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:263 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).run
... ...
2018/06/08 15:23:55 backend.go:334: [WARN] backend conn [0xc437257e60] to 172.xx.xx.184:11193, db-9 writer-[218] exit [error]: dial tcp 172.xx.xx.184:11193: getsockopt: connection refused
    3 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/redis/conn.go:30 github.com/CodisLabs/codis/pkg/proxy/redis.DialTimeout
    2 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:158 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).newBackendReader
    1 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:337 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).loopWriter
    0 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:263 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).run
... ... ... ...
2018/06/08 15:23:59 backend.go:334: [WARN] backend conn [0xc430ca6420] to 172.xx.xx.185:11194, db-0 writer-[385] exit [error]: backend conn failure, set tcp 172.xx.xx.183:53398: use of closed network connection
2018/06/08 15:23:59 backend.go:261: [WARN] backend conn [0xc430ca6420] to 172.xx.xx.185:11194, db-0 round-[386]
2018/06/08 15:23:59 backend.go:334: [WARN] backend conn [0xc430ca64e0] to 172.xx.xx.185:11194, db-1 writer-[353] exit
2018/06/08 15:23:59 backend.go:334: [WARN] backend conn [0xc430ca6960] to 172.xx.xx.185:11194, db-5 writer-[353] exit [error]: backend conn failure, set tcp 172.xx.xx.183:53376: use of closed network connection
2018/06/08 15:23:59 backend.go:261: [WARN] backend conn [0xc430ca6960] to 172.xx.xx.185:11194, db-5 round-[354]
2018/06/08 15:23:59 backend.go:334: [WARN] backend conn [0xc430ca6a20] to 172.xx.xx.185:11194, db-6 writer-[353] exit [error]: backend conn failure, set tcp 172.xx.xx.183:53400: use of closed network connection
2018/06/08 15:24:00 backend.go:261: [WARN] backend conn [0xc437257b00] to 172.xx.xx.184:11193, db-5 round-[219]
2018/06/08 15:24:00 backend.go:261: [WARN] backend conn [0xc437257e60] to 172.xx.xx.184:11193, db-9 round-[219]
2018/06/08 15:24:00 backend.go:334: [WARN] backend conn [0xc430ca6000] to 172.xx.xx.184:11193, db-11 writer-[219] exit [error]: dial tcp 172.xx.xx.184:11193: getsockopt: connection refused
    3 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/redis/conn.go:30 github.com/CodisLabs/codis/pkg/proxy/redis.DialTimeout
    2 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:158 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).newBackendReader
    1 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:337 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).loopWriter
    0 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:263 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).run
... ...
2018/06/08 15:24:00 backend.go:261: [WARN] backend conn [0xc430ca60c0] to 172.xx.xx.184:11193, db-12 round-[219]
2018/06/08 15:24:00 backend.go:261: [WARN] backend conn [0xc430ca62a0] to 172.xx.xx.184:11193, db-14 round-[219]
2018/06/08 15:24:00 backend.go:261: [WARN] backend conn [0xc437257f20] to 172.xx.xx.184:11193, db-10 round-[219]
2018/06/08 15:24:00 backend.go:334: [WARN] backend conn [0xc430ca60c0] to 172.xx.xx.184:11193, db-12 writer-[219] exit [error]: dial tcp 172.xx.xx.184:11193: getsockopt: connection refused
    3 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/redis/conn.go:30 github.com/CodisLabs/codis/pkg/proxy/redis.DialTimeout
    2 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:158 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).loopWriter
    0 /root/go/src/github.com/CodisLabs/codis/pkg/proxy/backend.go:263 github.com/CodisLabs/codis/pkg/proxy.(BackendConn).run
... ...
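To see at a glance which backend addresses the proxy is still failing to dial after a failover, a log like the one above can be tallied per address. This is a rough sketch, assuming each warning and its `[error]:` text sit on a single line as they do in the excerpt quoted in this thread; `RefusedByAddr` and the sample lines in `main` are hypothetical and not part of codis.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// backendAddr captures the address in "backend conn [0x...] to host:port, db-N".
var backendAddr = regexp.MustCompile(`backend conn \[[^\]]*\] to ([^,\s]+),`)

// RefusedByAddr counts "connection refused" warnings per backend address.
func RefusedByAddr(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		if !strings.Contains(line, "connection refused") {
			continue
		}
		if m := backendAddr.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Made-up lines shaped like the proxy warnings, with placeholder addresses.
	lines := []string{
		"backend.go:334: [WARN] backend conn [0xc000000001] to 10.0.0.1:6379, db-5 writer-[218] exit [error]: dial tcp 10.0.0.1:6379: getsockopt: connection refused",
		"backend.go:261: [WARN] backend conn [0xc000000001] to 10.0.0.1:6379, db-5 round-[219]",
		"backend.go:334: [WARN] backend conn [0xc000000002] to 10.0.0.1:6379, db-9 writer-[219] exit [error]: dial tcp 10.0.0.1:6379: getsockopt: connection refused",
	}
	for addr, n := range RefusedByAddr(lines) {
		fmt.Printf("%s refused %d times\n", addr, n)
	}
}
```

If the old master's address keeps accumulating refusals after the `+switch-master` entry, some connection pool is still bound to it, which narrows down where to look.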