[Open] airmnichols opened this issue 4 years ago
Yes! Bitten by this just now.
Kubernetes with pod disruption budgets is the way honestly. After moving from swarm to k8s things have been so much more reliable.
There hasn't been any activity on this issue for a long time.
If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment.
If not, this issue will be closed in 14 days. This helps our maintainers focus on the active issues.
Prevent issues from auto-closing with a /lifecycle frozen comment.
/lifecycle stale
@docker-robot It's not our fault that the maintainers are busy. That doesn't make the issue invalid. I'd like every damn bot (and their masters) to know this. I understand that having these bots helps triage important issues, like a garbage collector, but a human should decide whether it's garbage or not. Not a "timeout".
This is really confusing and reduces flexibility and reliability: now I need to manually configure a label instead of relying on this built-in availability feature. I hope this can be improved.
File: engine/swarm/swarm-tutorial/drain-node.md
States:
"Sometimes, such as planned maintenance times, you need to set a node to DRAIN availability. DRAIN availability prevents a node from receiving new tasks from the swarm manager. It also means the manager stops tasks running on the node and launches replica tasks on a node with ACTIVE availability."
This is misleading: a drain operation has no logic to maintain the configured number of replicas while tasks are being rescheduled. The documentation should state this clearly.
If you have a swarm with two worker nodes and have just performed maintenance on worker node 1, all replicas are now running on worker node 2.
If you then drain worker node 2 for patching, it causes downtime, because swarm does not, for example, stop replica 1 on node 2 and start replica 1 on node 1 before moving on to do the same for replica 2.
The current design causes downtime for applications. Support advised that this is expected behavior, and that a workaround is to reconfigure all running services with a higher replica count, forcing tasks to start on another worker node, before issuing a drain command for the node.
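The workaround described above can be sketched as a CLI sequence. This is a minimal illustration, not an official procedure; the service name `web`, the node names `worker1`/`worker2`, and the replica counts are all placeholder assumptions:

```shell
# Assumption: a service "web" currently running 2 replicas, both on worker2.

# Scale the service up first so new tasks are scheduled on worker1
# while the existing tasks on worker2 keep serving traffic.
docker service scale web=4

# Verify where tasks are running before draining anything.
docker service ps web

# Now drain worker2 for patching; its tasks stop, but worker1
# already holds running replicas, so the service stays up.
docker node update --availability drain worker2

# After maintenance, return the node to the scheduler and scale back.
docker node update --availability active worker2
docker service scale web=2
```

Note that `docker node update --availability active` does not rebalance existing tasks back onto the node; tasks only move there when they are next rescheduled (or if you force an update, e.g. `docker service update --force web`).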