aws-samples / ecs-cid-sample

In this code provided with the blog, we will demonstrate how to use the draining state to update the AMI used by EC2 instances in your cluster by updating the launch configuration of your auto-scaling group. The process also ensures that the EC2 instances get the tasks drained the tasks launch on new container instance before termination occurs.
Apache License 2.0
103 stars 76 forks source link

Various issue with the sample code #20

Open WesleyCharlesBlake opened 6 years ago

WesleyCharlesBlake commented 6 years ago
    • The code can post messages to the wrong SNS topic when retrying. It looks for the first SNS topic in the account that has a lambda function subscribed to it and posts the retry message to that topic. See line # 161 in index.py for the implementation. The SNS message that is received by the Lambda already has the ARN of the SNS topic.
    • The code does not do any kind of pagination against the ECS API when reading the list of EC2 instances. So if it couldn't find the instance ID that was about to be terminated on the first page, then the instance was not set to DRAINING and the end users would see 50X messages when the operation timed out and autoscaling killed the instance.
  1. There is a large amount of unused code and variables in the in the code. The index.py file can be knocked down by half. #19 should resolve the unused variables.

    • The retry logic did not put in any kind of delay in place. The Lambda function would be invoked about 5-10 times a second, and each Lambda function invocation would provably make close to a dozen AWS API calls. This could result in accounts being throttled.
    • The .zip archive code differs to what is in VCS (particular there are changes to code/index.py. which are not included in the bundled archive. This could result in unexpected results for users following the blog article's recommendation
ardelio commented 6 years ago

Checkout my fork. It has a couple of issues resolved. Soon it won’t matter as you can use AWS Fargate (currently only available for US East)

hwatts commented 6 years ago

I'd add to the issues that it discovers the ECS cluster name by parsing userdata and looking for text strings.

This has potential to go wrong when used in environments that may be bootstrapping other things in userdata and given that this example is used in a CFN script that also creates the ECS Cluster (e.g. the name's already known), this should be passed as an environment variable to the lambda function.

danspain-plutoflume commented 6 years ago

I'd add to the issues that it discovers the ECS cluster name by parsing userdata and looking for text strings.

Yes, I've run in to this issue, It fails to find the cluster name if the user data is gzip encoded.

masneyb commented 6 years ago

The original bug report text in this issue is what I submitted to AWS support on 2018-02-26. Here is a pull request that addresses all of these issues: https://github.com/aws-samples/ecs-cid-sample/pull/23/files. The Lambda function in my new version is now small enough that I'm including it inline within the CloudFormation template so that there are no separate ZIP files to upload.

Brian