ThoughtWorksStudios / eb_deployer

AWS Elastic Beanstalk blue-green deployment automation from ThoughtWorks Mingle Team
MIT License

Destroy Inactive Stack #23

Closed kmanning closed 9 years ago

kmanning commented 10 years ago

What do you guys think about a flag for the destroy command, so that only the inactive stack is destroyed?

The use case: We deploy our application through a pipeline, progressing from lower to higher environments. We were doing in_place updates with phoenix enabled in lower environments, and blue/green deployments in higher environments (business sign-off and production). In either case, a deployment has two steps: 1) tear down the current/inactive stack, 2) create and deploy a new stack.
Our team is sensitive to end-to-end deployment times, and we could possibly cut them down if we did the tear-down earlier or in parallel with other jobs in the pipeline. For example: 1) tear down the inactive test environment stack while we package our application, so that deploying to the test environment takes only the time spent creating the new stack; 2) tear down the inactive business sign-off stack while running regression tests, so that deploying the new business sign-off stack takes only the time to create and deploy it. Alternatively, the inactive stacks could be cleaned up after a deployment, reducing the number of idle stacks left around after a successful deployment, though this use is less important to me than reducing end-to-end pipeline time.

Possible solutions: I thought an --inactive flag might be a simple interface that could be used along with the --destroy command. It's a little odd if you're not using blue/green deployments, but in that case it could print a reasonable message and do nothing. The destroy is also currently asynchronous: it sends the request to Amazon to tear down the stack and returns immediately, before the termination is complete. Ideally, for a pipeline, the tear-down would be synchronous with polling updates, the way the termination of a blue/green stack works. That way I don't have to worry about it colliding with the subsequent deployment. I don't know whether changing the existing asynchronous behavior of --destroy is desirable, or whether yet another flag would be appropriate to control that.
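
To make the idea concrete, here is a rough Ruby sketch of the behavior I have in mind: pick the inactive environment, do nothing (with a message) if there isn't one, then request termination and poll until it is actually gone. Everything in it is hypothetical and is not eb_deployer's real API. Environments are plain hashes, and `terminate` and `fetch_status` are injected callables standing in for whatever AWS client calls the gem would make. The only thing borrowed from AWS is the `Terminating` / `Terminated` environment status values.

```ruby
# Hypothetical sketch: none of these names come from eb_deployer itself.
# An environment is modelled as a Hash with :name, :active and :status keys.

class TeardownTimeout < StandardError; end

# Pick the environment that is not currently serving traffic.
# Returns nil when nothing is inactive (e.g. not a blue/green setup).
def find_inactive(environments)
  environments.find { |env| !env[:active] && env[:status] != 'Terminated' }
end

# Request termination, then poll until the environment reports 'Terminated'.
# `terminate` and `fetch_status` are injected callables, so the sketch stays
# independent of any real AWS client.
def destroy_inactive(environments, terminate:, fetch_status:,
                     interval: 5, timeout: 600, log: $stdout)
  env = find_inactive(environments)
  if env.nil?
    log.puts 'No inactive environment found; nothing to destroy.'
    return nil
  end

  terminate.call(env[:name])
  waited = 0
  loop do
    status = fetch_status.call(env[:name])
    log.puts "#{env[:name]}: #{status}"
    return env[:name] if status == 'Terminated'
    if waited >= timeout
      raise TeardownTimeout, "#{env[:name]} still #{status} after #{waited}s"
    end

    sleep(interval)
    waited += interval
  end
end
```

A pipeline job could call something like `destroy_inactive` while the application is being packaged, and the later deploy job would only start once the call has returned, so the two can't collide.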

Thoughts? I'd be happy to take a stab at a pull request, if the use case is consistent with how you all imagine the gem should work.

betarelease commented 10 years ago

Hi, we originally designed this to leverage AWS EB effectively while being able to do zero-downtime deployments. We also wanted these deployments not to affect the user while we performed continuous delivery.

AWS is really good at deploying stacks of applications via EB. Once EB has built these stacks, they can be scaled up or down depending on the load. In the most general case the inactive stack will automatically be scaled down to its minimum, since there are no requests hitting it. Having an already existing stack helps you deploy a new version without having to provision another stack, which is time-consuming. In essence, we consider inactive stacks harmless but still helpful during blue-green deployments.

So we did not invest much time in destroying inactive environments. For your use case of "We were doing in_place updates with phoenix enabled in lower environments", we would recommend that you keep using blue-green deployments, as in production, with in_place deployments, since that takes only the time needed to redeploy and not the time to create an environment from scratch.

In short, I am trying to say that you should never need to destroy a stack completely, unless something about the structure of the stack is being significantly altered.

If you don't mind, can you please explain how the environments are built and torn down and which parts are slowing you down? If you are able to draw a picture that would be great too.

Hope I am explaining it reasonably.

Thanks, Sudhindra.

xli commented 9 years ago

@kmanning, I think it makes sense to destroy the inactive environment in a separate step so that you can run more things in parallel. Please send a pull request if you'd still like to do it :)