docker-archive / for-aws


Swarm broken - "context deadline exceeded" #110

Open OmpahDev opened 7 years ago

OmpahDev commented 7 years ago

My swarm is completely broken. It happened randomly, and I can't get it healthy again.

When I first noticed that my swarm wasn't working, it showed that one of my managers was down. From then on, everything I tried (deploying anything, etc.) failed. Googling the problem told me that this was because my swarm had lost quorum, and that the way to fix it was to run docker swarm init --force-new-cluster on my leader. However, this command fails: it hangs for a while and then says Error response from daemon: context deadline exceeded. I get the exact same error when I try to run docker swarm leave --force as well.

How do I fix this and get my swarm back to healthy? I AM NOT going to delete the CloudFormation stack and re-create, that's WAY too much work, so any solution MUST not involve doing this, or replacing any of my AWS infrastructure in any way.

FrenchBen commented 7 years ago

Seems like this is similar to: https://github.com/docker/swarmkit/issues/1340

danieljuhl commented 6 years ago

I had a similar issue (not on AWS, though). From a healthy manager, I demoted the node that was showing as Down and then promoted it again. That solved my issue.
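
The demote-then-promote workaround described above could be sketched as the shell function below. This is a minimal sketch, not a confirmed fix for the original report: the function name cycle_manager_role, the DOCKER override variable, and the node name node-2 are hypothetical. It assumes it is run on a manager that still responds and that the swarm still has quorum, since demoting a node is itself a change the managers must agree on.

```shell
# Hypothetical helper: demote a manager that shows as Down, then promote it again.
# Written for sh/bash. Set DOCKER="echo docker" to print the commands
# instead of running them (a dry run).
cycle_manager_role() {
  node="$1"
  if [ -z "$node" ]; then
    echo "usage: cycle_manager_role NODE" >&2
    return 1
  fi
  # Demote the node to a worker; stop here if the demotion fails.
  ${DOCKER:-docker} node demote "$node" || return 1
  # Promote it back to a manager.
  ${DOCKER:-docker} node promote "$node" || return 1
  # List the nodes so the new status can be checked by eye.
  ${DOCKER:-docker} node ls
}
```

Running cycle_manager_role node-2 on a healthy manager would issue the three commands in order. Running DOCKER="echo docker" cycle_manager_role node-2 only prints them, which may be worth doing first on a swarm that is already in a fragile state.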