acquia / moonshot

Moonshot: Because releasing services shouldn't be a moonshot!
Apache License 2.0

Add the capability of temporarily increasing the desired capacity during code deploys #37

Open. jamesiarmes opened this issue 8 years ago.

jamesiarmes commented 8 years ago

Issue by cpliakas Wednesday Nov 18, 2015 at 02:53 GMT
Originally opened as https://github.com/acquia/cloud-moonshot/issues/45


Not sure if this is within the scope of this project; however, the goal of this enhancement is to keep at least the original number of healthy hosts throughout the deployment. An anti-fragile system should get stronger, not weaker, during risky operations like code deploys.

For example, my application has 3 instances. During deployments, the number of healthy hosts temporarily dips to 2, since each instance in turn has to stop the service, deploy the code, and then restart the service. See the graph below:

[Screenshot from 2015-11-17, 9:32:24 PM: graph of healthy host count during a deployment]
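The dip described above can be reproduced with a toy model. It assumes instances are deployed strictly one at a time, that each one comes back healthy, and that any extra ("surge") instances also receive the deploy. `healthy_counts` is a hypothetical helper used only for illustration.

```python
def healthy_counts(instances: int, surge: int = 0) -> list[int]:
    """Healthy-host count at each step of a one-at-a-time in-place deploy.

    Hypothetical model: `surge` extra instances are added before the deploy,
    then every instance in the group is stopped, updated, and restarted in turn.
    """
    total = instances + surge
    counts = [total]              # before any instance is touched
    for _ in range(total):
        counts.append(total - 1)  # one instance is out of service for the deploy
        counts.append(total)      # it comes back healthy
    return counts
```

With 3 instances and no surge, the minimum is 2, as in the report. Adding one surge instance keeps the minimum at the original 3.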

When performing AMI updates, CloudFormation temporarily increases the desired capacity during the rolling upgrade to maintain the number of healthy hosts throughout the process. It would be great to mimic this behavior, either as core functionality of the Moonshot tool or as an opt-in behavior provided via an extension.
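Below is a minimal sketch of the "scale out, deploy, scale back" idea. It is written in Python, while Moonshot itself is a Ruby tool, so it only illustrates the control flow. It assumes a boto3 Auto Scaling client and uses its `describe_auto_scaling_groups` and `set_desired_capacity` calls. `deploy_with_surge`, `count_healthy`, and the `deploy` callback are hypothetical names and are not part of Moonshot or CloudFormation. The sketch also assumes that the group's `MaxSize` leaves room for the extra instance, and that ASG-reported health is an acceptable proxy for load balancer health.

```python
import time
from typing import Any, Callable


def count_healthy(group: dict) -> int:
    """Count instances the ASG reports as both InService and Healthy."""
    return sum(
        1
        for inst in group["Instances"]
        if inst["LifecycleState"] == "InService" and inst["HealthStatus"] == "Healthy"
    )


def deploy_with_surge(
    client: Any,                 # assumed: boto3.client("autoscaling") or a compatible object
    asg_name: str,
    deploy: Callable[[], None],  # hypothetical callback that runs the actual code deploy
    surge: int = 1,
    poll_seconds: float = 15.0,
    timeout_seconds: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Raise desired capacity by `surge`, wait for it, deploy, then restore it."""

    def describe() -> dict:
        resp = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
        return resp["AutoScalingGroups"][0]

    group = describe()
    original = group["DesiredCapacity"]
    target = original + surge
    if target > group["MaxSize"]:
        raise ValueError(f"surge would exceed MaxSize ({group['MaxSize']})")

    client.set_desired_capacity(
        AutoScalingGroupName=asg_name, DesiredCapacity=target, HonorCooldown=False
    )
    try:
        # Do not touch existing instances until the extra capacity is healthy.
        deadline = time.monotonic() + timeout_seconds
        while count_healthy(describe()) < target:
            if time.monotonic() > deadline:
                raise TimeoutError(f"{asg_name} did not reach {target} healthy instances")
            sleep(poll_seconds)
        deploy()
    finally:
        # Scale back even if the wait or the deploy failed.
        client.set_desired_capacity(
            AutoScalingGroupName=asg_name, DesiredCapacity=original, HonorCooldown=False
        )
```

The scale-in happens in a `finally` block so that a failed deploy does not leave the group permanently oversized. The ASG's termination policy, not this code, decides which instance is removed when capacity drops back.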

jamesiarmes commented 8 years ago

Comment by askreet Wednesday Nov 18, 2015 at 02:59 GMT


I wrote a tool called howiroll to do ASG updates because I couldn't find any CloudFormation documentation confirming that it handled ELB connection draining properly. Looking back at that tool, I noticed it didn't add new instances before removing old ones. I'd like to port that functionality into Moonshot and improve on the methodology for rolling out launch configuration updates.
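The ordering point can be shown by comparing the healthy count when new instances are added before old ones are removed ("surge first") against removing old instances first. This is a toy model and does not reflect how howiroll actually works. It assumes each new instance becomes healthy before the matching old one is drained. `rolling_replace_counts` is a hypothetical helper.

```python
def rolling_replace_counts(instances: int, batch: int = 1, surge_first: bool = True) -> list[int]:
    """Healthy-host count at each step of replacing every instance in batches."""
    healthy = instances
    counts = [healthy]
    remaining = instances
    while remaining > 0:
        step = min(batch, remaining)
        if surge_first:
            healthy += step   # launch replacements and wait for them to pass health checks
            counts.append(healthy)
            healthy -= step   # then drain and terminate the old instances
        else:
            healthy -= step   # terminate old instances first
            counts.append(healthy)
            healthy += step   # then launch replacements
        counts.append(healthy)
        remaining -= step
    return counts
```

For 3 instances replaced one at a time, surge-first never drops below 3 healthy hosts, while remove-first dips to 2. The cost of surge-first is that the group briefly exceeds its normal size by up to `batch` instances.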

That being said, you're asking about CodeDeploy. I suppose it would be possible to extend the CodeDeploy DeploymentMechanism so that it adjusts the ASG capacity and waits for the change to take effect, but I think a good first-pass solution is simply to always scale your ASGs for N+1 (or greater) capacity. In critical infrastructure, probably N*2+1. Otherwise, how are you going to sustain an Availability Zone failure?
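The sizing rules of thumb above can be checked with simple arithmetic. This sketch assumes the ASG spreads instances evenly across Availability Zones, so the largest AZ holds `ceil(total / az_count)` instances and losing that AZ is the worst case. The function names are hypothetical, and reading N*2+1 as "survive the loss of one of two AZs" is an interpretation rather than something stated in the thread.

```python
import math


def n_plus_one(n: int) -> int:
    """Capacity that leaves N healthy while one instance is out for a deploy."""
    return n + 1


def critical_capacity(n: int) -> int:
    """The N*2+1 rule of thumb for critical infrastructure."""
    return 2 * n + 1


def healthy_during_deploy(total: int, batch: int = 1) -> int:
    """Instances still serving while `batch` instances are being deployed."""
    return total - batch


def healthy_after_az_loss(total: int, az_count: int) -> int:
    """Instances left after losing the largest AZ, assuming a balanced spread."""
    return total - math.ceil(total / az_count)
```

For N = 3: N+1 = 4 keeps 3 healthy during a one-at-a-time deploy but drops to 2 if one of two AZs fails. N*2+1 = 7 keeps 3 after losing the larger of two AZs (4 instances), and 4 with three AZs.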

I'm not saying I'm against this feature, just that I think it has a low priority because it can be mitigated in production by throwing money at the problem, which is generally cheaper than throwing engineering time at a problem.

jamesiarmes commented 8 years ago

Comment by cpliakas Wednesday Nov 18, 2015 at 03:45 GMT


Would you accept a pull request, or would this be best to handle outside of Moonshot?

jamesiarmes commented 8 years ago

Comment by cpliakas Wednesday Nov 18, 2015 at 03:49 GMT


I totally hear what you are saying, btw, about throwing money at the problem as opposed to investing engineering time.

jamesiarmes commented 8 years ago

Comment by askreet Wednesday Nov 18, 2015 at 10:23 GMT


I'd accept a pull request, definitely. I think it should be a tunable option on the CodeDeploy class's constructor that controls some kind of pre- and post-deployment behavior.
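One way to shape that option is sketched below. The constructor accepts lists of pre- and post-deployment hooks, and the post hooks always run. This is a Python illustration of the design idea only. `CodeDeployMechanism`, `run_deployment`, `pre_deploy_hooks`, and `post_deploy_hooks` are hypothetical names and do not reflect Moonshot's actual Ruby classes. `run_deployment` stands in for the real CodeDeploy call.

```python
from typing import Callable, Sequence

Hook = Callable[[], None]


class CodeDeployMechanism:
    """Hypothetical deployment mechanism with tunable pre/post-deploy hooks."""

    def __init__(
        self,
        app_name: str,
        run_deployment: Hook,
        pre_deploy_hooks: Sequence[Hook] = (),
        post_deploy_hooks: Sequence[Hook] = (),
    ) -> None:
        self.app_name = app_name
        self.run_deployment = run_deployment  # placeholder for the real CodeDeploy call
        self.pre_deploy_hooks = list(pre_deploy_hooks)
        self.post_deploy_hooks = list(post_deploy_hooks)

    def deploy(self) -> None:
        try:
            for hook in self.pre_deploy_hooks:  # e.g. scale the ASG out and wait
                hook()
            self.run_deployment()
        finally:
            # Post hooks run even if a pre hook or the deployment raises,
            # so they must be safe to run unconditionally (e.g. restore capacity).
            for hook in self.post_deploy_hooks:
                hook()
```

Because both hook lists default to empty, existing callers see no change in behavior, so temporarily increasing capacity stays opt-in. A capacity-surge feature would then be one pair of hooks: scale out before the deployment and scale back in afterward.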