alibaba / otter

Alibaba's distributed database synchronization system (built for cross-region data centers in China and the US)
Apache License 2.0

ZooKeeper operation timeout causes all sync tasks in an entire data center to stop #819

Open yanghongkjxy opened 5 years ago

yanghongkjxy commented 5 years ago

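In a cross-data-center sync setup, a data center in another city timed out while accessing the ZooKeeper ensemble in the primary data center. As a result, every task targeting that data center stopped, and every node in that data center hung (the processes stayed up but were unresponsive).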

The thread dump shows a thread waiting on a ZooKeeper operation:

"Load-Rpc-Async-0" #488 daemon prio=5 os_prio=0 tid=0x00007f355401a000 nid=0x3073 in Object.wait() [0x00007f33c9156000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)

yanghongkjxy commented 5 years ago

@longdafeng

yanghongkjxy commented 5 years ago

Before the tasks stopped, there were some processing-timeout errors. [image]

agapple commented 5 years ago

otter has a hard dependency on ZooKeeper.