We build our docker and swarm cluster based on etcd.
Recently we ran into a problem: our etcd cluster crashed. When we tried to recover the etcd cluster on another node, docker and swarm could not connect to etcd again until we restarted them.
After checking the code, I found the cause: once an error occurs while syncing the etcd member list, the member list is never updated again. I think ignoring this sync error may be a better option.
Here is the link to the sync code: https://github.com/docker/libkv/blob/master/store/etcd/etcd.go#L94