aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

aws-ecs: Cloud-init script for EC2 fails when using AWS Linux 2023 #28518

Open juinquok opened 7 months ago

juinquok commented 7 months ago

Describe the bug

When using EC2 as the capacity provider in the addAsgCapacityProvider method, the user is expected to specify the machineImageType for the capacity provider. Doing so adds a user data script that injects additional configuration into the /etc/ecs/ecs.config file.

Among the commands that are injected are:

autoScalingGroup.addUserData('sudo iptables --insert FORWARD 1 --in-interface docker+ --destination 169.254.169.254/32 --jump DROP');
autoScalingGroup.addUserData('sudo service iptables save');

On Amazon Linux 2023 running the Linux 6.1.66-91.160.amzn2023.x86_64 kernel, the second command (sudo service iptables save) fails when cloud-init executes the user data script:

+ sudo iptables --insert FORWARD 1 --in-interface docker+ --destination 169.254.169.254/32 --jump DROP
+ sudo service iptables save
The service command supports only basic LSB actions (start, stop, restart, try-restart, reload, reload-or-restart, try-reload-or-restart, force-reload, status, condrestart). For other actions, please try to use systemctl.
2023-12-29 06:11:05,181 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2023-12-29 06:11:05,184 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.9/site-packages/cloudinit/config/cc_scripts_user.py'>) failed

As a result, the line that follows it in the user data,

echo ECS_AWSVPC_BLOCK_IMDS=true >> /etc/ecs/ecs.config

is never run, so the setting is never written to /etc/ecs/ecs.config.

Expected Behavior

It should successfully run the required ECS setup configs when the EC2 instance starts.

Current Behavior

The cloud-init script fails with the error message: "The service command supports only basic LSB actions (start, stop, restart, try-restart, reload, reload-or-restart, try-reload-or-restart, force-reload, status, condrestart). For other actions, please try to use systemctl."

Reproduction Steps

Start an ECS cluster with an EC2 capacity provider whose Auto Scaling group launch template uses the latest Amazon Linux 2023 AMI (ecs.EcsOptimizedImage.amazonLinux2023(AmiHardwareType.STANDARD)). The error occurs when the instance starts up, and the logs can be found in /var/log/cloud-init-output.log.

Possible Solution

Introduce a new machineImageType value for the addAsgCapacityProvider method and name it AMAZON_LINUX_2023. In the configureAutoScalingGroup method in cluster.ts, add a new switch case that renders different user data for the ECS-optimized Amazon Linux 2023 AMI. In particular, sudo service iptables save would be changed to sudo iptables-save > /etc/sysconfig/iptables, which does not throw the error above.
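
To make the proposal concrete, here is a minimal, self-contained sketch of the switch described above. It does not use the real CDK classes: the MachineImageType enum below is a local stand-in, the AMAZON_LINUX_2023 member is the value proposed in this issue (an assumption, not an existing API), and renderImdsBlockUserData is a hypothetical helper name. The command strings are the ones quoted in this issue.

```typescript
// Local stand-in for the CDK enum. AMAZON_LINUX_2023 is the value proposed
// in this issue and is an assumption, not a confirmed API member.
enum MachineImageType {
  AMAZON_LINUX_2 = 'AMAZON_LINUX_2',
  AMAZON_LINUX_2023 = 'AMAZON_LINUX_2023',
}

// Hypothetical helper: returns the user data lines that block container
// access to the instance metadata service for the given image type.
function renderImdsBlockUserData(imageType: MachineImageType): string[] {
  const dropRule =
    'sudo iptables --insert FORWARD 1 --in-interface docker+ --destination 169.254.169.254/32 --jump DROP';
  const blockImds = 'echo ECS_AWSVPC_BLOCK_IMDS=true >> /etc/ecs/ecs.config';

  switch (imageType) {
    case MachineImageType.AMAZON_LINUX_2:
      // Current behaviour, as quoted in the bug description.
      return [dropRule, 'sudo service iptables save', blockImds];
    case MachineImageType.AMAZON_LINUX_2023:
      // Proposed behaviour: persist the rules without the `service` wrapper.
      return [dropRule, 'sudo iptables-save > /etc/sysconfig/iptables', blockImds];
    default:
      throw new Error(`Unsupported machine image type: ${imageType}`);
  }
}

// Example: print the lines that would be rendered for the proposed type.
for (const line of renderImdsBlockUserData(MachineImageType.AMAZON_LINUX_2023)) {
  console.log(line);
}
```

Only the middle command differs between the two cases, so the AMAZON_LINUX_2 output would stay unchanged for existing users.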

Additional Information/Context

No response

CDK CLI Version

2.114.1 (build 02bbb1d)

Framework Version

No response

Node.js Version

v18.17.0

OS

macOS 14.2

Language

TypeScript

Language Version

No response

Other information

No response

pahud commented 6 months ago

Thank you for the possible solution. Yes, this might be an option. Making this a p1 as it's not easy to work around.

juinquok commented 6 months ago

Happy to help raise a PR to implement it if the possible solution is acceptable :)

BwL1289 commented 5 months ago

Also experiencing this. Need an option to specify AMAZON_LINUX_2023.

BwL1289 commented 5 months ago

@juinquok also happy to help with this.

Additionally, we should add a note that if you choose BOTTLEROCKET, the machine image in your ASG must actually be Bottlerocket. I hit a bug today that cost me a bunch of hours because userData was not set, so the ECS agent did not know the name of the cluster.

IwoTens commented 5 months ago

Just to add: we've noticed this issue with AL2 as well. We solved it by installing iptables-services in the user data, so that the command can be run.
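
A rough sketch of that workaround, for illustration only: the helper below puts a package install ahead of the existing user data lines so that sudo service iptables save has what it needs. The exact install command (sudo yum install -y iptables-services) and the helper name withIptablesServices are assumptions; the comment above does not say which command was used.

```typescript
// Assumed install command; the thread only says the iptables-services
// package was installed in the user data, not how.
const INSTALL_IPTABLES_SERVICES = 'sudo yum install -y iptables-services';

// Hypothetical helper: returns a new list with the install command first,
// leaving the input untouched and avoiding a duplicate install line.
function withIptablesServices(userDataLines: string[]): string[] {
  if (userDataLines.includes(INSTALL_IPTABLES_SERVICES)) {
    return [...userDataLines];
  }
  return [INSTALL_IPTABLES_SERVICES, ...userDataLines];
}

// Example with the two commands quoted in the bug description.
const patchedLines = withIptablesServices([
  'sudo iptables --insert FORWARD 1 --in-interface docker+ --destination 169.254.169.254/32 --jump DROP',
  'sudo service iptables save',
]);
for (const line of patchedLines) {
  console.log(line);
}
```

In a CDK app this would presumably mean calling addUserData on the Auto Scaling group with the install command before the group is handed to addAsgCapacityProvider, on the assumption that user data commands run in the order they were added. That ordering is worth checking against the rendered launch template.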

juinquok commented 5 months ago

@pahud Should I raise a PR to implement this change if it's agreeable to the team?

pahud commented 3 months ago

@juinquok Yes, feel free to submit a PR and let's move this forward.