CodisLabs / codis

Proxy based Redis cluster solution supporting pipeline and scaling dynamically
MIT License
13.17k stars 2.69k forks source link

Auto rebalance does not work after adding a group to codis #493

Closed skymysky closed 9 years ago

skymysky commented 9 years ago

After adding a new group, clicking auto rebalance on the dashboard does nothing. What is going on? Earlier there was also a slot_200 error, and I deleted the tasks in zk.

skymysky commented 9 years ago

| Migrate Slot | New Group | Create At | Status | Percent |
| --- | --- | --- | --- | --- |
| slot_200 | group_7 | 2015-10-22 15:41:39 +0800 | error | 0 % |

yangzhe1991 commented 9 years ago

Can you check the dashboard log?

skymysky commented 9 years ago

2015/10/23 10:51:50 dashboard.go:160: [INFO] dashboard listening on addr: :18087
2015/10/23 10:51:50 dashboard.go:143: [INFO] dashboard node created: /zk/codis/db_codis_test/dashboard, {"addr": "20.26.17.203:18087", "pid": 11988}
2015/10/23 10:51:50 dashboard.go:144: [WARN] ****** Attention ******
2015/10/23 10:51:50 dashboard.go:145: [WARN] You should use kill {pid} rather than kill -9 {pid} to stop me,
2015/10/23 10:51:50 dashboard.go:146: [WARN] or the node resisted on zk will not be cleaned when I'm quiting and you must remove it manually
2015/10/23 10:51:50 dashboard.go:147: [WARN] ***
2015/10/23 10:51:59 dashboard_apis.go:88: [ERROR] get server groups failed
[error]: zk: node does not exist
    1 /usr/local/codis/src/github.com/wandoulabs/codis/pkg/models/server_group.go:110 github.com/wandoulabs/codis/pkg/models.ServerGroups
    0 /usr/local/codis/src/github.com/wandoulabs/codis/cmd/cconfig/dashboard_apis.go:86 main.apiGetServerGroupList
    ... ...
[stack]:
    0 /usr/local/codis/src/github.com/wandoulabs/codis/cmd/cconfig/dashboard_apis.go:88 main.apiGetServerGroupList
    ... ...
2015/10/23 10:52:09 dashboard_apis.go:88: [ERROR] get server groups failed
[error]: zk: node does not exist
    1 /usr/local/codis/src/github.com/wandoulabs/codis/pkg/models/server_group.go:110 github.com/wandoulabs/codis/pkg/models.ServerGroups
    0 /usr/local/codis/src/github.com/wandoulabs/codis/cmd/cconfig/dashboard_apis.go:86 main.apiGetServerGroupList
    ... ...
[stack]:
    0 /usr/local/codis/src/github.com/wandoulabs/codis/cmd/cconfig/dashboard_apis.go:88 main.apiGetServerGroupList
    ... ...
2015/10/23 10:52:26 dashboard_apis.go:403: [ERROR] set proxy states failed: {Id:codis_proxy_1 Addr: LastEvent: LastEventTs:0 State:online Description: DebugVarAddr: Pid:0 StartAt:}
[error]: zk: node does not exist
    2 /usr/local/codis/src/github.com/wandoulabs/codis/pkg/models/slot.go:114 github.com/wandoulabs/codis/pkg/models.Slots
    1 /usr/local/codis/src/github.com/wandoulabs/codis/pkg/models/proxy.go:158 github.com/wandoulabs/codis/pkg/models.SetProxyStatus
    0 /usr/local/codis/src/github.com/wandoulabs/codis/cmd/cconfig/dashboard_apis.go:397 main.apiSetProxyStatus
    ... ...
[stack]:
    0 /usr/local/codis/src/github.com/wandoulabs/codis/cmd/cconfig/dashboard_apis.go:403 main.apiSetProxyStatus
    ... ...
2015/10/23 10:52:32 dashboard_apis.go:88: [ERROR] get server groups failed
[error]: zk: node does not exist
    1 /usr/local/codis/src/github.com/wandoulabs/codis/pkg/models/server_group.go:110
--More--(45%)

yangzhe1991 commented 9 years ago

All sorts of zk nodes are missing. Is that because you deleted them?

skymysky commented 9 years ago

Yes, I deleted it. Where is this information written when it gets created? I thought the dashboard creates it on startup.

yangzhe1991 commented 9 years ago

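No, it does not. The information about which Redis instances are in the cluster and which slots each one is responsible for is persisted in zk. If you delete it, of course it can no longer be found.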

skymysky commented 9 years ago

slot { "product_name": "codis_test", "id": 240, "group_id": -1, "state": { "status": "offline", "migrate_status": { "from": -1, "to": -1 }, "last_op_ts": "0" } } is not online or migrate

skymysky commented 9 years ago

This error is reported when I set the proxy state to online.

yangzhe1991 commented 9 years ago

"group_id": -1 means you have not assigned a group to that slot yet. See https://github.com/wandoulabs/codis/blob/master/doc/tutorial_zh.md#流程

skymysky commented 9 years ago

I have already done that step. It still reports this error.

yangzhe1991 commented 9 years ago

Then either the step did not succeed or something was missed.

skymysky commented 9 years ago

OK, I will try again from scratch.

yangzhe1991 commented 9 years ago

No need to start over. The dashboard shows which group each slot is assigned to. Some of them should have no color; just handle those individually.
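A minimal Go sketch of the same check done programmatically, assuming the slot records are available as JSON shaped like the error message earlier in this thread. The `Slot` struct, the `unassigned` helper, and the sample data are hypothetical and only mirror the fields visible in that message; this is not the Codis API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Slot mirrors only the fields visible in the error message in this thread.
// It is an illustrative type, not the real Codis model.
type Slot struct {
	ProductName string `json:"product_name"`
	ID          int    `json:"id"`
	GroupID     int    `json:"group_id"`
	State       struct {
		Status string `json:"status"`
	} `json:"state"`
}

// unassigned returns the IDs of slots whose group_id is -1,
// i.e. slots that have not been given a server group yet.
func unassigned(slots []Slot) []int {
	var ids []int
	for _, s := range slots {
		if s.GroupID == -1 {
			ids = append(ids, s.ID)
		}
	}
	return ids
}

func main() {
	// Hypothetical sample data: one assigned slot and one unassigned slot.
	raw := `[
	  {"product_name":"codis_test","id":239,"group_id":5,"state":{"status":"online"}},
	  {"product_name":"codis_test","id":240,"group_id":-1,"state":{"status":"offline"}}
	]`

	var slots []Slot
	if err := json.Unmarshal([]byte(raw), &slots); err != nil {
		panic(err)
	}
	fmt.Println("slots without a group:", unassigned(slots)) // prints: slots without a group: [240]
}
```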

skymysky commented 9 years ago

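Do all 1024 slots have to be assigned? I created 5 groups and assigned only 500 slots, and I did not see any error messages. I want to add more groups later, but I found that auto rebalance does not work. It keeps reporting a slot error.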

skymysky commented 9 years ago

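Still the same error as before. I redid everything: I deleted all the zk data and redeployed from scratch. Setting the proxy to online still reports the same error. You said to handle those slots individually; how do I do that? There are indeed some with no color, and they are in the off state.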

skymysky commented 9 years ago

Slot ID: 300
Product: codis_test
Server Group: group_4
Keys: 0
Status: online
Migrate From: group_-1
Migrate To: group_-1
Last Op Ts: Thu Jan 01 1970 08:00:00 GMT+0800 (China Standard Time)

skymysky commented 9 years ago

What does this mean? Is this the case you mentioned, where the slot has not been assigned to a group?

yangzhe1991 commented 9 years ago

Your understanding of slots is off. Please carefully read section 4 of https://github.com/wandoulabs/codis/blob/master/doc/tutorial_zh.md#流程

skymysky commented 9 years ago

How should I understand it? Do all 1024 slots have to be assigned to the groups I created?

yangzhe1991 commented 9 years ago

Codis uses pre-sharding to partition data, by default into 1024 slots (0-1023). For each key, the slot ID is determined by the following formula: SlotId = crc32(key) % 1024. Every slot must have exactly one specific server group ID, which indicates which server group serves that slot's data.
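As a rough illustration of the formula quoted above, here is a minimal Go sketch. It assumes the standard IEEE CRC-32 polynomial from Go's `hash/crc32` package and ignores any special key handling Codis itself may apply. The names `slotID` and `numSlots` and the sample keys are made up for this example.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// numSlots is the default slot count mentioned in the quoted documentation.
const numSlots = 1024

// slotID maps a key to a slot using SlotId = crc32(key) % 1024.
// Assumption: the IEEE polynomial is used for crc32.
func slotID(key []byte) uint32 {
	return crc32.ChecksumIEEE(key) % numSlots
}

func main() {
	// Hypothetical keys, used only to show that any key lands in 0..1023.
	for _, k := range []string{"user:1", "user:2", "order:42"} {
		fmt.Printf("%-10s -> slot %d\n", k, slotID([]byte(k)))
	}
}
```

Because any key can hash to any of the 1024 slot IDs, a slot left without a group means some keys have no server group to serve them, which is why every slot needs an assignment.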

skymysky commented 9 years ago

So in other words, having 5 groups while some slots were still in the off state was wrong.

skymysky commented 9 years ago

How do I handle these off-state slots?

yangzhe1991 commented 9 years ago

Set them to any group.
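To see which slot ranges still need a group before setting them, a small Go sketch like the following could help. It assumes the first 500 slots (0-499) were spread evenly over groups 1 to 5, which is a guess about the layout described in this thread; the `gaps` helper and the `groupOf` slice are hypothetical and not part of Codis.

```go
package main

import "fmt"

// numSlots is the default slot count used by Codis (slots 0-1023).
const numSlots = 1024

// gaps returns inclusive [from, to] ranges of slot IDs that have no group.
// groupOf[i] holds the group ID of slot i, or -1 when the slot is unassigned.
func gaps(groupOf []int) [][2]int {
	var out [][2]int
	start := -1 // start of the current unassigned run, or -1 if none is open
	for i, g := range groupOf {
		switch {
		case g == -1 && start == -1:
			start = i // an unassigned run begins here
		case g != -1 && start != -1:
			out = append(out, [2]int{start, i - 1}) // the run ended at i-1
			start = -1
		}
	}
	if start != -1 {
		out = append(out, [2]int{start, len(groupOf) - 1})
	}
	return out
}

func main() {
	// Assumed layout: slots 0-499 in groups 1..5 (100 each), the rest unassigned.
	groupOf := make([]int, numSlots)
	for i := range groupOf {
		if i < 500 {
			groupOf[i] = i/100 + 1
		} else {
			groupOf[i] = -1
		}
	}
	for _, r := range gaps(groupOf) {
		fmt.Printf("slots %d-%d still need a group\n", r[0], r[1])
	}
	// prints: slots 500-1023 still need a group
}
```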

skymysky commented 9 years ago

Could you tell me the commands to run?

yangzhe1991 commented 9 years ago

It is the configuration step in the tutorial, the same way you assigned your first 500 slots.

skymysky commented 9 years ago

I changed it and tried again. It does not work.

yangzhe1991 commented 9 years ago

How did you try it? And in what way did it not work?