elsevier-research / wercker-aws-ecs


Use built-in task update mechanism #5

TomFrost commented 9 years ago

Per the ECS docs at http://docs.aws.amazon.com/AmazonECS/latest/developerguide/update-service.html:

If you have updated the Docker image of your application, you can create a new task definition with that image and deploy it to your service, one task at a time. The service scheduler creates a task with the new task definition (provided there is an available container instance to place it on), and after it reaches the RUNNING state, a task that is using the old task definition is drained and stopped. This process continues until all of the desired tasks in your service are using the new task definition.
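In API terms, that built-in mechanism boils down to registering a new task definition revision and handing it to UpdateService. A minimal boto3 sketch, with placeholder cluster/service/image names (none of these come from this repo):

```python
import boto3

ecs = boto3.client("ecs")

# Register a new revision of the task definition pointing at the new image.
# Family, container name, and image are illustrative placeholders.
new_revision = ecs.register_task_definition(
    family="my-app",
    containerDefinitions=[{
        "name": "my-app",
        "image": "my-registry/my-app:new-tag",
        "memory": 256,
        "essential": True,
    }],
)["taskDefinition"]["taskDefinitionArn"]

# Hand the new revision to the service; the ECS scheduler then performs the
# rolling replacement described in the quoted docs, one task at a time.
ecs.update_service(
    cluster="my-cluster",
    service="my-service",
    taskDefinition=new_revision,
)
```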

Why not use this mechanism instead of scaling the tasks down, then back up again? I'm currently using this deploy step for staging, but I'm extremely hesitant to use it on production because if my infrastructure has autoscaled up to Y tasks from my configured minimum of X and I need to deploy, scaling back down to X or fewer would be catastrophic.

Please let me know if I'm misunderstanding how this works.

sebalas commented 9 years ago

Hi TomFrost,

Our first use case involved fixed host/container port bindings to run nginx fronted by an ELB. Unfortunately, ECS/Docker cannot place two tasks on the same machine if they expose the same host port.

When calling UpdateService with a new task definition, the service scheduler always scales up at least one new container before shutting down old ones. Since no host port is available, ECS fails with the error message "unable to place a task because the resources could not be found". To be able to use UpdateService (and its rolling-update capability), we needed to scale the service to N-1, where N is the current running task count for that service. We could also have scaled up the number of EC2 instances.
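To make that workaround concrete, the sequence is roughly the following. This is a hedged boto3 sketch with placeholder names, not the step's actual code:

```python
import boto3

ecs = boto3.client("ecs")

# Look up the service's current desired count (N).
svc = ecs.describe_services(
    cluster="my-cluster", services=["my-service"]
)["services"][0]
n = svc["desiredCount"]

# Step 1: scale to N-1 so a host port is freed on one container instance.
ecs.update_service(cluster="my-cluster", service="my-service", desiredCount=n - 1)

# Step 2: roll out the new task definition and restore the count; the
# scheduler can now place the first replacement task on the freed instance.
ecs.update_service(
    cluster="my-cluster",
    service="my-service",
    taskDefinition="my-app:42",  # placeholder revision
    desiredCount=n,
)
```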

You can find an interesting thread here: https://forums.aws.amazon.com/thread.jspa?threadID=179271

I think we could modify the step to offer three strategies:

  1. do nothing (just call UpdateService and fail if it has to)
  2. scale down the ECS service (current strategy)
  3. scale up the number of EC2 instances (see the sketch below)
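For strategy 3, the rough shape would be something like the sketch below, assuming the cluster's container instances are managed by an Auto Scaling group (the group name is a placeholder):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Assumes the cluster's container instances belong to an Auto Scaling group.
group = autoscaling.describe_auto_scaling_groups(
    AutoScalingGroupNames=["my-ecs-asg"]  # placeholder group name
)["AutoScalingGroups"][0]

# Add one instance so UpdateService has spare capacity to place the new
# task without first scaling the service down.
autoscaling.set_desired_capacity(
    AutoScalingGroupName="my-ecs-asg",
    DesiredCapacity=group["DesiredCapacity"] + 1,
)
```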

TomFrost commented 9 years ago

Thanks for your response! If I may, allow me to break Option 2 into two sub-options:

2a. scale down the ECS service to the number of tasks (minus 1) defined in the wercker.yml or a Wercker environment variable (current strategy)
2b. scale down the ECS service to the number of tasks (minus 1) that are currently running, as reported by the ECS API.

My concern is that controlling the minimum number of tasks from a Wercker-centric configuration ignores the fact that applications may have their own logic to scale their tasks up or down as load changes. If I want to deploy a new image at peak usage times for my application -- without scaling my service down too low to support current traffic -- that requires finding the number of tasks currently running and updating wercker.yml or a Wercker-stored environment variable to that number before triggering the build. If I want to roll back by using Wercker to deploy an earlier build, I have a similar problem.

My thought is that making an API call to find the current running task count immediately before scaling down to that number minus one would ensure minimal impact on my application traffic, without requiring developers to change any Wercker configuration.
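As a rough sketch of option 2b (placeholder names; not a concrete proposal for this repo's code):

```python
import boto3

ecs = boto3.client("ecs")

# Option 2b: read the live task count from ECS immediately before deploying,
# rather than a count stored in wercker.yml or a Wercker environment variable.
svc = ecs.describe_services(
    cluster="my-cluster", services=["my-service"]
)["services"][0]
target = svc["runningCount"] - 1  # scale relative to what is actually running

ecs.update_service(cluster="my-cluster", service="my-service", desiredCount=target)
```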

PepijnK commented 8 years ago

I agree with TomFrost. If wercker-aws-ecs could just update the task definition of a service (in a non-blocking way), I would be very happy.