Open tdterry opened 7 years ago
I also have this problem: docker swarm init --force-new-cluster
throws an error and leaves the cluster in an unusable state.
This is why docker swarm is never to be taken seriously. If you use it in production, you have yourself to blame!
On a single-node cluster, everything is broken after restarting the node, and
docker swarm init --force-new-cluster
throws an error saying the address is in use.
Expected behavior
I was trying to move nodes from one swarm cluster to another, and I ended up with a broken manager quorum. To fix this, I did
swarm init --force-new-cluster
on one of the managers. I expected this to create a new cluster with the existing database that I could then rejoin.
Actual behavior
When I executed the init, I got an error. After that, the swarm is completely broken.
Steps to reproduce the behavior
This is all rather complicated, and I am not entirely sure how it happened. I tried to reproduce it using a smaller test with fresh swarms and only a single node in each, but I wasn't able to trigger the error. Below are my original steps.
I started with two swarms, and I was trying to merge them together. Node names have been shortened for readability.
Swarm A has 3 nodes, all managers (node1, node2, node3).
Swarm B has 5 nodes. node4 is a manager, node5 to node8 are workers. My plan was to join nodes 1-3 to the second swarm.
I attempted to join node2 to Swarm B, but I forgot to demote and remove it from Swarm A first.
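For reference, the order that was skipped in the step above can be sketched as a dry run that only prints the commands. This is a rough sketch, not a verified recovery procedure: `move_manager_plan` is a hypothetical helper, and the token and address arguments are placeholders, not values from this report.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands in order instead of running them.
# move_manager_plan NODE TOKEN MANAGER_ADDR (all three are hypothetical placeholders).
move_manager_plan() {
    node="$1"
    token="$2"
    manager_addr="$3"
    # Run on a remaining manager of the old swarm: drop the manager role.
    echo "docker node demote $node"
    # Run on the node being moved: leave the old swarm.
    echo "docker swarm leave"
    # Run on a remaining manager of the old swarm: remove the stale node entry.
    echo "docker node rm $node"
    # Run on the node being moved: join the new swarm.
    echo "docker swarm join --token $token $manager_addr"
}

move_manager_plan node2 '<join-token>' '<node4-address>:2377'
```

The join token would come from running `docker swarm join-token manager` (or `worker`) on a manager of the target swarm.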
At this point, Swarm B shows two managers, but one is unreachable.
Presumably,
9ev2v9qpbmrkbd5t5vt9acgok
is the failed node2. I tried a few more join commands on node2, restarted it, etc. Nothing changed.
Swarm B is broken because it is missing one of two managers, so I tried to recover by reinitializing the swarm, and that failed.
And now Swarm B is more broken, because it can't even list nodes anymore.
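The loss of quorum described above follows from the majority rule that swarm managers use (Raft consensus): with N managers, floor(N/2) + 1 must be reachable. A small sketch of that arithmetic, where `quorum` and `has_quorum` are illustrative helper names and not Docker commands:

```shell
#!/bin/sh
# quorum N: number of managers that must be reachable out of N (majority).
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL REACHABLE: succeeds if the reachable managers form a majority.
has_quorum() {
    [ "$2" -ge "$(quorum "$1")" ]
}

# Swarm B after the failed join: 2 managers registered, only node4 reachable.
if has_quorum 2 1; then
    echo "2 managers, 1 reachable: quorum held"
else
    echo "2 managers, 1 reachable: quorum lost"
fi
```

With two registered managers the majority is two, so a single unreachable manager is enough to block operations such as listing nodes.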
Output of docker version:
Node 4 (Swarm B)
Node 2 (Swarm A)
Output of docker info:
Node 4 (Swarm B)
Note: swarm still thinks there are two managers. The IP addresses are node4 and node2.
Node 2 (Swarm A)
Additional environment details (AWS, VirtualBox, physical, etc.)
My local machine is a Mac running Docker CE 17.06.0-ce. My remote hosts are EC2 instances running Docker CE 17.03.1-ce.