fabfuel / ecs-deploy

Powerful CLI tool to simplify Amazon ECS deployments, rollbacks & scaling

Deploy without timeout or wait #60

Closed erfangc closed 5 years ago

erfangc commented 6 years ago

Hi,

I want to thank you guys for this fantastic utility program to simplify deployment on ECS. Are there plans to support deployments without a timeout option?

For certain deployments (especially with the Fargate launch type), it appears the ECS cluster manager takes a while to shut down/deregister from the ASG and reach a steady state. The default 300s is not really enough, but it is hard to know how far to arbitrarily increase it.

In certain projects, it may be better to not have a timeout at all. Is this functionality being considered?

dmitrye commented 6 years ago

With just 2 tasks running on ECS Fargate, I've had to increase the timeout period past 600 seconds. This is tied to your autoscaling rule settings for intervals before alarms are fired. However, my autoscaling rules are designed for runtime, not deploy time. So I need the ability to deploy new tasks, confirm they are good, then move on, rather than have the build server (which I'm paying for by the minute) wait 5-10 minutes for the old tasks (listed as INACTIVE in the AWS console) to shut down.

For now I'm going with 900 seconds (15 minutes), which seems to cover it. There are some instructions out there on setting up blue/green deployments, which is really the right way to go: deploy to green, confirm all is good, switch traffic to green, then shut down the tasks in blue or leave them running. Both options have cost and configuration implications: you would have to allow for 0 active tasks, or be OK with 2 old versions (if that's your minimum) running indefinitely in the inactive cluster/service.
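For reference, the stop-gap described above is just a larger value for the existing `--timeout` option; the cluster and service names here are placeholders:

```shell
# Wait up to 15 minutes for ECS to report the deployment finished
# (my-cluster / my-service are hypothetical names)
ecs deploy my-cluster my-service --timeout 900
```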

I was hoping there would be a way to either ignore the timeout like @erfangc requests and move on with the build or somehow ignore the removal of old tasks from the timeout period.

fabfuel commented 6 years ago

Hi @erfangc & @dmitrye,

the timeout parameter only instructs ecs-deploy how long it should wait for ECS – which actually means, how long it should keep checking whether the deployment in ECS finished successfully. The type of deployment (e.g. blue/green or not) and the duration of your deployment depend 100% on your ECS cluster setup (or Fargate configuration), your ECS service deployment configuration, your load-balancer configuration (if used), and your task definition.

The only solution I can imagine right now: defining a timeout of 0 would mean that ecs-deploy does not wait and check for the finished deployment at all. That would then work like a "fire and forget" deployment. You will never know if it worked or not – without checking the AWS APIs or the console yourself.

Is that what you meant?

Best Fabian

erfangc commented 6 years ago

@fabfuel thank you for that response, this is precisely what I am asking for. The reason is: we (and many others) have separate infrastructure for monitoring the task count / health metrics of the cluster.

Our deployments, for whatever reason, run way longer than 600s. Extending the timeout is not ideal, because that ends up hogging CI server resources (Bitbucket Pipelines, for example, bills in bundles of minutes).

Hopefully the above illustrates valid use cases for fire-and-forget deployments.

erfangc commented 6 years ago

fyi - my workaround right now is just:

ecs deploy staging my-staging-service --timeout 0 || true

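The `|| true` at the end is what keeps the CI step green: when ecs-deploy gives up waiting it exits non-zero, and `|| true` swallows that exit code. A minimal stand-alone sketch of the idiom, using `sh -c 'exit 1'` as a stand-in for the failing deploy command:

```shell
# Stand-in for a command that exits non-zero (like ecs-deploy hitting
# its timeout); `|| true` forces the overall exit status to 0, so a CI
# runner treats the step as successful.
sh -c 'exit 1' || true
echo "step exit status: $?"   # prints: step exit status: 0
```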
katafractari commented 6 years ago

@erfangc But does your service actually update using this workaround? I get:

Deploying new task definition
Deployment failed due to timeout. Please see: https://github.com/fabfuel/ecs-deploy#timeout

And my container is apparently not updated with the current version of the Docker image.

erfangc commented 6 years ago

@katafractari it does indeed. What do your ECS event logs say?

fabfuel commented 6 years ago

@katafractari please check the task-definition revision of your service after the deployment. ecs-deploy creates a new revision and updates the service accordingly. Independently of how long the deployment takes, the newest revision should be running eventually. If not, something is preventing the containers from running successfully with the newest version (e.g. the image/tag cannot be pulled, or the application inside the container crashes, etc.).
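One way to do that check from a terminal is an AWS CLI call like the following (illustrative only; the cluster and service names are placeholders, and it needs configured AWS credentials):

```shell
# Print the task-definition revision the service is configured with,
# e.g. an ARN ending in "task-definition/my-app:42"
aws ecs describe-services \
  --cluster my-cluster \
  --services my-service \
  --query 'services[0].taskDefinition' \
  --output text
```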

Do you deploy based on reused tag names, for instance Git branch names or "latest"? This can lead to ECS not pulling the actual newest version of the Docker image. The pull behavior can be configured; please have a look at https://github.com/aws/amazon-ecs-agent/issues/413 and https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html#image_pull
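For the EC2 launch type, the pull behavior mentioned above is an ECS agent setting on the container instance; a sketch of the relevant config line (see the agent documentation linked above for the other accepted values):

```
# /etc/ecs/ecs.config — make the agent pull the image on every task start
# instead of reusing a cached copy with the same tag
ECS_IMAGE_PULL_BEHAVIOR=always
```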

katafractari commented 6 years ago

@fabfuel Reusing the latest tag was indeed causing ECS not to pull the newest version of the Docker image. I've solved this by tagging each image with $CODEBUILD_BUILD_ID on AWS CodeBuild and calling ecs-deploy with the --tag $CODEBUILD_BUILD_ID argument.
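A hypothetical buildspec.yml sketch of that setup (the $ECR_REPO variable and the cluster/service names are placeholders, not from the thread). One caveat: CODEBUILD_BUILD_ID has the form `project:uuid`, and Docker tags may not contain a colon, so the sketch keeps only the uuid part:

```yaml
version: 0.2
phases:
  pre_build:
    commands:
      # Docker tags cannot contain ':', so strip the "project:" prefix
      - export IMAGE_TAG="${CODEBUILD_BUILD_ID##*:}"
      - aws ecr get-login-password | docker login --username AWS --password-stdin "$ECR_REPO"
  build:
    commands:
      - docker build -t "$ECR_REPO:$IMAGE_TAG" .
      - docker push "$ECR_REPO:$IMAGE_TAG"
  post_build:
    commands:
      # Point the service at the freshly pushed, uniquely tagged image
      - ecs deploy my-cluster my-service --tag "$IMAGE_TAG"
```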

Appreciate your help guys!

donaldpiret commented 5 years ago

+1 for having the option of getting a fire-and-forget type deployment

cbuto commented 5 years ago

+1

fabfuel commented 5 years ago

Hi guys! I added this feature and will create a new release on PyPI soon.