aws / amazon-ecs-agent

Amazon Elastic Container Service Agent
http://aws.amazon.com/ecs/
Apache License 2.0

Some containers in an ECS cluster have different userdata than the others #2363

Closed ankila1b2 closed 4 years ago

ankila1b2 commented 4 years ago

I am running an ECS cluster for Jenkins on the ECS-optimized AMI. I updated the launch configuration with the latest AMI and assigned it to an Auto Scaling group. When the cluster launched, some container instances came up with only the cluster name in their ECS config, while others had the full set of configs: ECS_CLUSTER="name of cluster" ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m ECS_DISABLE_IMAGE_CLEANUP=false ECS_IMAGE_CLEANUP_INTERVAL=10m ECS_IMAGE_MINIMUM_CLEANUP_AGE=30m

How does this work?

The userdata shows the same script when I copy the launch configuration. Please let me know.

Because these configs are missing, image cleanup is not taking place, which results in build failures with the following errors:

1) npm WARN tar ENOSPC: no space left on device, write

2) cannot write data to tempfile "/home/jenkins/workspace/_buildname_/.git/lfs/incomplete/d94dcde0274b1f5ae31901c09ea3f0492a3280a778": write /home/jenkins/workspace/_buildname_/.git/lfs/incomplete/d94dcde0274b1f5ae31901c09ea3f0492a3280a778: no space left on device

ankila1b2 commented 4 years ago

Also, please let me know the minimum values I can set for each of the parameters below: ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m ECS_IMAGE_CLEANUP_INTERVAL=10m ECS_IMAGE_MINIMUM_CLEANUP_AGE=30m

ankila1b2 commented 4 years ago

ECS Cluster Autoscaling Group Launch Config UserData:

#!/bin/bash
set -ex
yum install -y aws-cfn-bootstrap
/opt/aws/bin/cfn-init -v   --stack _cft-name_  --resource ECSLaunchConfig  --region us-west-2
echo ECS_CLUSTER=_cluster-name_ >> /etc/ecs/ecs.config
mkdir -p /mnt/efs
aws_az="$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)"
aws_region="${aws_az:0:${#aws_az}-1}"
echo "${aws_az}.fs-xxxxxx.efs.${aws_region}.amazonaws.com:/    /mnt/efs   nfs4    defaults" >> /etc/fstab
mount -a
service docker restart
/opt/aws/bin/cfn-signal -e $?   --stack _cft-name_  --resource ECSAutoScalingGroup  --region us-west-2
echo "ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m" >> /etc/ecs/ecs.config
echo "ECS_DISABLE_IMAGE_CLEANUP=false" >> /etc/ecs/ecs.config
echo "ECS_IMAGE_CLEANUP_INTERVAL=10m" >> /etc/ecs/ecs.config
echo "ECS_IMAGE_MINIMUM_CLEANUP_AGE=30m" >> /etc/ecs/ecs.config

When I log into the container instance and check the ECS config, it contains only: ECS_CLUSTER=_cluster-name_

Why weren't the remaining ECS configs applied?

Please help me out on that. Thank you.

petderek commented 4 years ago

If one of the commands in your userdata script fails, the rest of it won't run, because set -ex makes the script exit at the first failing command (that is the -e part; -x only traces each command). I'd look at the following files to validate this:

/var/log/cloud-init.log
/var/log/cloud-init-output.log

If this is the case, one way to immediately fix this would be to move the environment variable declarations to the top of the script (before any of the more complicated pieces happen).
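To illustrate the effect, here is a minimal sketch that relies only on standard bash behavior. It is not the actual userdata: a temp file stands in for /etc/ecs/ecs.config, `false` stands in for whichever step failed, and the cluster name demo-cluster is made up. The first run appends the cleanup setting after the failing step, the second appends it before.

```shell
#!/bin/bash
# Stand-in for /etc/ecs/ecs.config so the demo is safe to run anywhere.
config="$(mktemp)"

# Original ordering: a step fails before the cleanup setting is appended.
# The script runs in a child bash so its early exit does not end the demo.
bash -c '
  set -e
  echo "ECS_CLUSTER=demo-cluster" >> "$1"
  false    # hypothetical failing step (e.g. a mount or a service restart)
  echo "ECS_IMAGE_CLEANUP_INTERVAL=10m" >> "$1"
' _ "$config" || echo "script stopped early with exit status $?"

echo "--- config after failing run ---"
cat "$config"

# Reordered: every setting is written before anything that might fail.
fixed="$(mktemp)"
bash -c '
  set -e
  echo "ECS_CLUSTER=demo-cluster" >> "$1"
  echo "ECS_IMAGE_CLEANUP_INTERVAL=10m" >> "$1"
  false    # the same failure now happens after the config is complete
' _ "$fixed" || true

echo "--- config after reordered run ---"
cat "$fixed"
```

In the first run only the ECS_CLUSTER line reaches the file, which matches what was seen on the affected instances. In the second run both lines are present even though the script still exits with an error, so the failing step itself is still worth tracking down in the cloud-init logs.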

petderek commented 4 years ago

ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m ECS_IMAGE_CLEANUP_INTERVAL=10m ECS_IMAGE_MINIMUM_CLEANUP_AGE=30m

The first two are already the minimum settings. It doesn't look like we enforce a minimum for the cleanup age.

ankila1b2 commented 4 years ago

OK, thanks. The issue was resolved by the following changes:

#!/bin/bash
set -ex
yum install -y aws-cfn-bootstrap
/opt/aws/bin/cfn-init -v   --stack prod-ecs-service-jenkin  --resource ECSLaunchConfig  --region us-west-2
mkdir -p /mnt/efs
aws_az="$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)"
aws_region="${aws_az:0:${#aws_az}-1}"
echo "${aws_az}.fs-xxxxxx.efs.${aws_region}.amazonaws.com:/    /mnt/efs   nfs4    defaults" >> /etc/fstab
mount -a
cat <<'EOF' >> /etc/ecs/ecs.config
ECS_CLUSTER=_cluster-name_
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m
ECS_DISABLE_IMAGE_CLEANUP=false
ECS_IMAGE_CLEANUP_INTERVAL=10m
ECS_IMAGE_MINIMUM_CLEANUP_AGE=11m
EOF
service docker restart
/opt/aws/bin/cfn-signal -e $?   --stack _cft-name_  --resource ECSAutoScalingGroup  --region us-west-2

Hence, closing the ticket.

Thank you.