degica / barcelona

PaaS built on top of AWS
MIT License

Better instance replacement #318

Closed k2nr closed 7 years ago

k2nr commented 7 years ago

The current rolling update has been working fine, but it has some problems. ECS recently released task placement constraints (http://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_PlacementConstraint.html). By combining placement constraints with Auto Scaling group lifecycle hooks, we can do instance replacement better. Here's my rough idea:

Prerequisites

Replacement Strategy

  1. When ASG replacement is triggered, the LAUNCHING lifecycle hook runs and adds a "draining" label to all existing container instances
  2. After the new ASG and its instances start, the TERMINATING lifecycle hook runs and restarts all ECS services
    1. The new tasks are placed on the new container instances, because the existing instances have the "draining" label and tasks cannot be placed on them
    2. Once all services are deployed, the old instances no longer have any ECS tasks running
  3. After step 2 finishes, the old instances start shutting down
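
To make the flow above concrete, here is a minimal in-memory sketch in Python. It does not call AWS and is not Barcelona's implementation: `Instance`, `placeable`, `mark_draining`, `restart_services`, and `replace_instances` are hypothetical names, and the "draining" label is modeled as a plain string in a set standing in for a container instance attribute checked by a placement constraint.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    attributes: set = field(default_factory=set)
    tasks: list = field(default_factory=list)


def placeable(instance):
    # Stand-in for a task placement constraint that rejects labeled instances.
    return "draining" not in instance.attributes


def mark_draining(instances):
    # Step 1: the LAUNCHING hook labels every existing instance.
    for inst in instances:
        inst.attributes.add("draining")


def restart_services(instances, services):
    # Step 2: the TERMINATING hook restarts each service. The replacement task
    # may only land on an instance that passes the constraint.
    # Simplification: the old task is stopped before the new one is placed.
    candidates = [i for i in instances if placeable(i)]
    if not candidates:
        raise RuntimeError("no instance satisfies the placement constraint")
    for service in services:
        for inst in instances:
            if service in inst.tasks:
                inst.tasks.remove(service)
        # Pick the least-loaded eligible instance.
        target = min(candidates, key=lambda i: len(i.tasks))
        target.tasks.append(service)


def replace_instances(old, new, services):
    mark_draining(old)
    restart_services(old + new, services)
    # Step 3: an old instance is safe to shut down once it runs no tasks.
    return [i.name for i in old if not i.tasks]


old = [Instance("old-1", tasks=["web"]), Instance("old-2", tasks=["worker"])]
new = [Instance("new-1"), Instance("new-2")]
print(replace_instances(old, new, ["web", "worker"]))
```

In this model both old instances end up empty after the restart, which is the condition step 3 waits for before shutting them down.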

Benefits

Compared to the current "rolling update" strategy, this approach has several benefits

k2nr commented 7 years ago

ECS has released the DRAINING state feature (I had been waiting for this feature for a year!), so we no longer need to implement this complex replacement logic. I'll rewrite the description.
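
For reference, a rough Python sketch of putting container instances into the DRAINING state. It assumes a boto3 ECS client is passed in (for example `boto3.client("ecs")`) and uses its `update_container_instances_state` call. The helper name `drain_container_instances` and the batch size of 10 per call are my assumptions, not something taken from Barcelona's code.

```python
def drain_container_instances(ecs_client, cluster, instance_arns, batch_size=10):
    """Set the given container instances to DRAINING, one batch per API call."""
    updated = []
    for start in range(0, len(instance_arns), batch_size):
        batch = instance_arns[start:start + batch_size]
        response = ecs_client.update_container_instances_state(
            cluster=cluster,
            containerInstances=batch,
            status="DRAINING",
        )
        # Collect the ARNs the API reports back as updated.
        updated.extend(
            ci["containerInstanceArn"]
            for ci in response.get("containerInstances", [])
        )
    return updated
```

Once an instance is DRAINING, ECS stops placing new tasks on it and the service scheduler moves service tasks elsewhere, which covers what the "draining" label and service restart were meant to do by hand.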

k2nr commented 7 years ago

This is done