aws / amazon-ecs-init

ecs-init is now part of the amazon-ecs-agent repo https://github.com/aws/amazon-ecs-agent/tree/master/ecs-init
https://github.com/aws/amazon-ecs-agent
Apache License 2.0

Plugin not found after updating ECS instance #327

Closed petderek closed 4 years ago

petderek commented 4 years ago

Summary

Customers using Preview support for ECS can experience issues with the amazon-ecs-volume-plugin not starting up when they perform an agent upgrade.

Description

It looks like this can happen with the following steps:

Suggestions

If you're observing this, here are some things you can try to remedy the situation:

petderek commented 4 years ago

There are two ways we handle EFS tasks:

The agent has logic to fall back to the local driver if the plugin isn't bundled with ecs-init. We may need to update this to also check that the plugin is observable by Docker. In other words:

# current behavior
if plugin_available:
   use_plugin

# proposed behavior
if plugin_available and plugin_registered_with_docker:
   use_plugin

Behavior may be subtly different in upstart vs systemd -- so we'll need to investigate this further.
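The proposed check could look roughly like the following shell sketch. The function name `select_volume_driver` and its two 0/1 arguments are hypothetical stand-ins for however the agent actually detects the plugin; only the driver names `amazon-ecs-volume-plugin` and `local` come from this thread.

```shell
# Hypothetical helper: pick the volume driver for an EFS task.
#   $1 = 1 if the plugin is bundled with ecs-init, 0 otherwise
#   $2 = 1 if Docker can see the plugin, 0 otherwise
select_volume_driver() {
    plugin_available=$1
    plugin_registered_with_docker=$2
    if [ "$plugin_available" -eq 1 ] && [ "$plugin_registered_with_docker" -eq 1 ]; then
        echo "amazon-ecs-volume-plugin"
    else
        # Fall back to Docker's local driver when either check fails.
        echo "local"
    fi
}
```

With this shape, an instance where the plugin is bundled but not running (`select_volume_driver 1 0`) would fall back to `local` instead of failing with "plugin not found".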

fenxiong commented 4 years ago

I think the problem is that the agent checks this env and uses the volume plugin if it contains "efsAuth", which ecs-init always specifies regardless of whether the volume plugin service is enabled to start by default.

Can this be fixed by embedding a config on the AMI and only specifying "efsAuth" when that config is true, similar to what has been done for GPU support https://github.com/aws/amazon-ecs-init/blob/6316c16de28dd7236fb520d307bd6086fc8b64a5/ecs-init/docker/docker.go#L312?
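A minimal shell sketch of that idea, under stated assumptions: `plugin_capabilities` and `uses_volume_plugin` are hypothetical names, the "true"/"false" argument stands in for the AMI-level config, and the JSON-style list format of the value is assumed for illustration. Only the "efsAuth" string comes from this thread.

```shell
# ecs-init side (hypothetical): advertise "efsAuth" only when the
# AMI-level config says the volume plugin is enabled.
#   $1 = "true" or "false"
plugin_capabilities() {
    if [ "$1" = "true" ]; then
        echo '["efsAuth"]'
    else
        echo '[]'
    fi
}

# Agent side (hypothetical): use the plugin only if the value
# handed over by ecs-init contains "efsAuth".
uses_volume_plugin() {
    case "$1" in
        *efsAuth*) return 0 ;;
        *) return 1 ;;
    esac
}
```

The agent-side check stays the same as described above; the change is only that ecs-init stops advertising the capability unconditionally.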

fierlion commented 4 years ago

This issue looks to be isolated to EFS preview customers. The GA version of EFS was released in version 20200319 (v1.38.0) of the ECS-optimized AMIs.

Customers who are using an ECS-optimized AMI earlier than version 20200319 (v1.38.0) should upgrade the AMI version to 20200319 (v1.38.0) or later to take advantage of the configuration required by GA EFS support on ECS, which includes enabling the amazon-ecs-volume-plugin by default. For more information, see Amazon ECS Optimized AMI Versions.
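Because the AMI versions are date stamps, checking whether an instance predates the GA release is a plain integer comparison. A small sketch, where `needs_ami_upgrade` is a hypothetical helper and the date stamp is assumed to be supplied by the caller:

```shell
# Hypothetical helper: succeeds if the given AMI date stamp is older
# than 20200319, the first ECS-optimized AMI with GA EFS support.
#   $1 = AMI date stamp, e.g. 20191212
needs_ami_upgrade() {
    ami_date=$1
    ga_date=20200319
    [ "$ami_date" -lt "$ga_date" ]
}
```

For example, `needs_ami_upgrade 20191212` succeeds, while `needs_ami_upgrade 20200319` does not.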

fierlion commented 4 years ago

I'm testing the upgrade path for customers using AL2 20191212 (v1.35.0) ECS Optimized with EFS preview. This should work for AL2 AMIs 20191212 (v1.35.0) and later.

First we need to enable Docker via extras and update it:

sudo amazon-linux-extras enable docker
sudo yum clean metadata
sudo yum install docker

This will update Docker to v19.03.6-ce. Next we'll do:

sudo yum update

This will install the latest agent, 1.39.0, but we need to restart the agent for it to take effect:

sudo systemctl stop ecs
sudo systemctl start ecs

Next we need to install amazon-efs-utils (note this package is installed by default for AMIs 1.36.0 and later):

sudo yum install amazon-efs-utils

And we need to enable the plugin:

sudo systemctl enable --now amazon-ecs-volume-plugin

Finally, we'll restart Docker:

sudo systemctl restart docker

You can test the install with an EFS filesystem in the same subnet as your upgraded instance:

docker volume create --name <volume_name> -d amazon-ecs-volume-plugin --opt o=tls,ro --opt type=efs --opt device=<efs_filesystem_id>

fierlion commented 4 years ago

This should work for AL1 AMIs 20191212 (v1.35.0) and later.

First we'll do a yum update, which will upgrade Docker to v19.03.6-ce and the agent version to 1.39.0:

sudo yum update

Next we need to install amazon-efs-utils (note this package is installed by default for AMIs 1.36.0 and later):

sudo yum install amazon-efs-utils

Next we'll need to reboot the instance so the plugin can be started up via upstart's init:

sudo shutdown -r now
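The AL2 and AL1 paths differ mainly in how the plugin gets started: systemd on AL2, upstart plus a reboot on AL1. As a rough recap, the sketch below prints the matching commands from this thread; `plugin_start_steps` and its "systemd"/"upstart" argument are hypothetical, and it only echoes the commands rather than running them.

```shell
# Hypothetical helper: print the commands from this thread that start
# the volume plugin, depending on the init system of the AMI.
#   $1 = "systemd" (AL2) or "upstart" (AL1)
plugin_start_steps() {
    case "$1" in
        systemd)
            echo "sudo systemctl enable --now amazon-ecs-volume-plugin"
            echo "sudo systemctl restart docker"
            ;;
        upstart)
            # On AL1 the plugin is started by upstart's init after a reboot.
            echo "sudo shutdown -r now"
            ;;
        *)
            echo "unknown init system: $1" >&2
            return 1
            ;;
    esac
}
```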

After the instance reboots, you can test the install with an EFS filesystem in the same subnet as your upgraded instance:

docker volume create --name <volume_name> -d amazon-ecs-volume-plugin --opt o=tls,ro --opt type=efs --opt device=<efs_filesystem_id>

fierlion commented 4 years ago

The AWS docs have been updated to reflect simplified versions of the above instructions:

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/efs-volumes.html