aws / amazon-ecs-agent

Amazon Elastic Container Service Agent
http://aws.amazon.com/ecs/
Apache License 2.0

Rolling deploys #110

Closed johnae closed 9 years ago

johnae commented 9 years ago

We've been playing around with the service and are somewhat confused about how it works. Basically, what we've tried (after a suggestion from an aws dev) is this:

  1. Create launch configuration
  2. Create autoscaling group
  3. Create ECS cluster
  4. Create service and task, set number of tasks to 100 (i.e. a number we don't think we'll ever reach)

What we're deploying is a webapp. So basically we're trying a 1 to 1 mapping - one task per host. This way we can quite easily scale up the autoscaling group and tasks will be placed on each new ec2 instance and added to ELB. So far it's kind of reasonable.

Now, the problem is when we need to deploy a new version: every instance already runs a task, so there are no free resources in the ECS cluster and no deploy takes place. One possibility here might be:

  1. Find number of ec2 instances running in the autoscaling group, set ecs tasks to that number - 1
  2. Deploy new task revision
  3. Wait/poll or something until all tasks are updated to new revision
  4. Set number of tasks to 100 again
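
The four steps above could be scripted roughly as follows. This is a minimal sketch, not a tested deployment tool: it assumes `ecs` and `autoscaling` are boto3-style clients passed in by the caller (for example from `boto3.client("ecs")` and `boto3.client("autoscaling")`), and the function name `rolling_deploy`, the `parked_count` default of 100, and the polling parameters are hypothetical choices made for illustration.

```python
import time


def rolling_deploy(ecs, autoscaling, cluster, service, asg_name,
                   new_task_definition, parked_count=100,
                   poll_seconds=15, timeout_seconds=600, sleep=time.sleep):
    """Roll a one-task-per-host service onto a new task definition revision.

    `ecs` and `autoscaling` are assumed to behave like boto3 clients.
    Returns the temporary desired count used during the deploy.
    """
    # Step 1: count instances in the group and leave one host free.
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name])["AutoScalingGroups"]
    instance_count = len(groups[0]["Instances"])
    deploy_count = max(instance_count - 1, 0)

    # Step 2: deploy the new revision at the reduced count.
    ecs.update_service(cluster=cluster, service=service,
                       desiredCount=deploy_count,
                       taskDefinition=new_task_definition)

    # Step 3: poll until a single deployment remains and it is fully running.
    waited = 0
    while True:
        described = ecs.describe_services(cluster=cluster, services=[service])
        deployments = described["services"][0]["deployments"]
        if (len(deployments) == 1
                and deployments[0]["runningCount"] == deploy_count):
            break
        if waited >= timeout_seconds:
            raise TimeoutError("deployment did not settle in time")
        sleep(poll_seconds)
        waited += poll_seconds

    # Step 4: restore the oversized count so new instances get a task.
    ecs.update_service(cluster=cluster, service=service,
                       desiredCount=parked_count)
    return deploy_count
```

One gotcha this sketch does not handle: if the autoscaling group changes size while the loop is polling, `deploy_count` is stale and the wait may never settle before the timeout.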

That seemed kind of convoluted and there may be gotchas. Wouldn't this be solved if one could set the number of tasks sort of like you set min/max/desired for an autoscaling group? That way ecs could possibly do this more seamlessly by trying to keep tasks at max but at deploy time scale down to min tasks... or if it just wasn't so hellbent on always prioritizing number of tasks over deploy.

jhspaybar commented 9 years ago

Your 4 step process is probably what you'd need to do, and we agree this should be improved! Your suggestion to have a minimum number of tasks so that the service scheduler could do a rolling deploy on a full cluster is something we've been thinking about. For now, it won't kill tasks below your desired amount which is why you have to scale to N - 1 to get it to roll over your tasks. I've taken your comments into our internal tracking system so we can continue to improve the experience.
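
To see why scaling to N - 1 unblocks the rollout, here is a toy model of the rule described above. It is only an illustration under stated assumptions, not the actual scheduler logic: each host fits exactly one task, the scheduler starts a new-revision task whenever a host is free, and it stops an old task only if the running total stays at or above the desired count. The function name `simulate_rollout` is hypothetical.

```python
def simulate_rollout(hosts, desired):
    """Toy model: one task slot per host, never drop below `desired` running."""
    old = min(hosts, desired)  # tasks on the old revision before the deploy
    new = 0                    # tasks on the new revision
    while old > 0:
        free_hosts = hosts - (old + new)
        if free_hosts > 0 and new < desired:
            # A free host exists, so a new-revision task can be placed.
            new += 1
        elif old + new - 1 >= desired:
            # Stopping one old task keeps the running total at or above desired.
            old -= 1
        else:
            # No free host and no task may be stopped: the deploy is stuck.
            return {"completed": False, "old": old, "new": new}
    return {"completed": True, "old": old, "new": new}


stuck = simulate_rollout(hosts=3, desired=100)   # full cluster, oversized count
rolled = simulate_rollout(hosts=3, desired=2)    # desired set to N - 1
```

In this model, `stuck` reports an incomplete rollout with all three old tasks still running, while `rolled` completes with two new-revision tasks.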

tj commented 9 years ago

+1 for this, it hasn't been a huge deal for us to allocate extra resources so far but it's a little unintuitive, and if you want fast staging etc then you really need a lot more resources

tj commented 9 years ago

Actually I take that back, we have a few cases where we use ports and counts to effectively force one of each container per instance (links are not great for these use-cases), so that does force us into this similar situation where we have to lower/raise count to deploy

mtparet commented 9 years ago

Kubernetes, for example, is already able to do this: https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples/update-demo#step-four-update-the-docker-image

euank commented 9 years ago

We've taken your feedback into account for service scheduler improvements. I think the original question has also been answered (thanks @jhspaybar) and there aren't immediately actionable Agent items here, so I'm closing this issue. More discussion around this is welcome of course.

Thanks for the feedback :smile:

tj commented 9 years ago

Somewhat related: it seems that ECS likes to (purely by chance?) group many tasks of the same service on the same node at times. It would be nice if it preferred spreading across AZs, or just distributing between nodes, by default.

dustMason commented 8 years ago

Are there any updates to share on this front? We are facing this issue too.