Closed CAFxX closed 6 years ago
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/157141182
The labels on this github issue will be updated when the story is started.
@CAFxX If the drain script can only succeed when the cluster is healthy, that would prevent anyone from bosh stopping an unhealthy cluster.
We can see how this change would be valuable in some cases, but we're concerned it would be harmful in others. We recommend using the proxy dashboard api to check if the cluster is healthy before doing a deploy.
@ctaymor isn't that why bosh deploy
and bosh recreate
both have a --skip-drain
option? If we applied the logic that the operator had to check manually on a job-specific dashboard before running deploy wouldn't cf-deployment become unusable pretty quickly? Also the bosh docs seem to agree that's the whole point of drain scripts (emphasis mine):
This script allows the job to clean up and get into a state where it can be safely stopped.
I think the safer option is to check for obvious signs of the cluster being unhealthy and allow the operator to force the deployment if they think it's the correct thing to do in their case.
@CAFxX Yeah, that's a fair point about --skip-drain
.
We'd be happy to accept a PR.
@ctaymor ok, just a question: is there currently any specific requirement requiring the node to be stopped during drain
(instead of during monit stop
)?
Closing due to open PR to resolve: #209
Sorry @CAFxX, thought we commented on this earlier. Yes, we do want to do the stopping in drain since for larger installations it can go over the monit timeout and cause bosh deploy failures.
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/158265416
The labels on this github issue will be updated when the story is started.
The drain script currently does not check whether the cluster is healthy before stopping a node:
https://github.com/cloudfoundry/cf-mysql-release/blob/f78967ba30660fab51301ce345665df5cf12e40b/jobs/mysql/templates/drain.sh#L1-L6
We think it would be safer to check if all cluster nodes are online and in sync before asking the local node to shutdown.