aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[ECS/Fargate] [request]: ECS Exec : support readonlyRootFilesystem containers #1359

Open sd65 opened 3 years ago

sd65 commented 3 years ago

Tell us about your request

I would like to use the ECS Exec feature with readonlyRootFilesystem enabled containers.

Which service(s) is this request for?

ECS/Fargate

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Currently, containers with readonlyRootFilesystem enabled are not supported; the AWS managed agent crashes soon after launch.

Are you currently working around this issue?

Yes. I've managed to get it working with readonlyRootFilesystem: true by mounting /managed-agents, /var/lib/amazon/ssm, and /var/log/amazon/ssm as writable volumes inside the container.
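For reference, a minimal sketch of the relevant parts of the task definition (the container name "my-app" is illustrative; a fuller example appears later in this thread):

    {
        "containerDefinitions": [
            {
                "name": "my-app",
                "readonlyRootFilesystem": true,
                "mountPoints": [
                    { "sourceVolume": "managed-agents", "containerPath": "/managed-agents", "readOnly": false },
                    { "sourceVolume": "var-lib-amazon-ssm", "containerPath": "/var/lib/amazon/ssm", "readOnly": false },
                    { "sourceVolume": "var-log-amazon-ssm", "containerPath": "/var/log/amazon/ssm", "readOnly": false }
                ]
            }
        ],
        "volumes": [
            { "name": "managed-agents", "host": {} },
            { "name": "var-lib-amazon-ssm", "host": {} },
            { "name": "var-log-amazon-ssm", "host": {} }
        ]
    }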

Additional context

https://github.com/aws-containers/amazon-ecs-exec-checker/issues/21

toricls commented 3 years ago

Wrote a workaround for this limitation - https://toris.io/2021/06/using-ecs-exec-with-readonlyrootfilesystem-enabled-containers/

naomine-biz commented 2 years ago

If you use the EC2-backed ECS agent version 1.57.0, you should not specify the bind mount for /var/log/amazon/ssm, as it will overlap with the mount set by the agent and prevent the container from starting.
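In other words, on the EC2 launch type with agent 1.57.0+, only two of the three mount points should be declared (a sketch, using the same volume names as in the example above):

            "mountPoints": [
                { "sourceVolume": "managed-agents", "containerPath": "/managed-agents", "readOnly": false },
                { "sourceVolume": "var-lib-amazon-ssm", "containerPath": "/var/lib/amazon/ssm", "readOnly": false }
            ]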

dariusz22p commented 1 year ago

Is it the same on EKS?

bmfs commented 1 year ago

> Wrote a workaround for this limitation - https://toris.io/2021/06/using-ecs-exec-with-readonlyrootfilesystem-enabled-containers/

I was unable to replicate this workaround, either by declaring the volumes in the Dockerfile or in the Task Definition. Maybe something has changed in the SSM Agent that now prevents this workaround.
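For anyone else debugging this, the checker script from the issue linked above reports which ECS Exec prerequisite is failing; its documented usage is roughly:

    # https://github.com/aws-containers/amazon-ecs-exec-checker
    ./check-ecs-exec.sh <your-cluster-name> <your-task-id>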

jdoylei commented 6 months ago

Hi @bmfs - I just wanted to note that the workaround works for me on ECS Fargate platform version 1.4.0, using the Task Definition approach. So it might be due to an environment difference rather than a change in the SSM Agent.

Our task-definition has the 3 volumes:

    "volumes": [
        {
            "name": "managed-agents",
            "host": {}
        },
        {
            "name": "var-lib-amazon-ssm",
            "host": {}
        },
        {
            "name": "var-log-amazon-ssm",
            "host": {}
        },

And the 3 mount points in one of the containers:

            "mountPoints": [
                {
                    "sourceVolume": "managed-agents",
                    "containerPath": "/managed-agents",
                    "readOnly": false
                },
                {
                    "sourceVolume": "var-lib-amazon-ssm",
                    "containerPath": "/var/lib/amazon/ssm",
                    "readOnly": false
                },
                {
                    "sourceVolume": "var-log-amazon-ssm",
                    "containerPath": "/var/log/amazon/ssm",
                    "readOnly": false
                },

This container has the agent running:

                    "managedAgents": [
                        {
                            "lastStartedAt": "2024-03-25T12:22:04.019000-04:00",
                            "name": "ExecuteCommandAgent",
                            "lastStatus": "RUNNING"
                        }
                    ],

(Other containers in the same task-definition without the mount points have the agent stopped:)

                    "managedAgents": [
                        {
                            "name": "ExecuteCommandAgent",
                            "lastStatus": "STOPPED"
                        }
                    ],
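(The managedAgents status above comes from describe-tasks:)

    PS C:\Users\u123> aws ecs describe-tasks --profile xyz --cluster xyz --tasks arnxyz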

With this configuration, we're able to use "aws ecs execute-command" on the container with the agent running:

PS C:\Users\u123> aws ecs execute-command --profile xyz --cluster xyz --container xyz --interactive --command "/bin/sh" --task arnxyz

sh-4.4# df -a | grep agents\\\|ssm
/dev/nvme1n1    30787492 13423340  15774904  46% /managed-agents
/dev/nvme1n1    30787492 13423340  15774904  46% /var/lib/amazon/ssm
/dev/nvme1n1    30787492 13423340  15774904  46% /var/log/amazon/ssm
/dev/nvme0n1p1   5082764  2126208   2887764  43% /managed-agents/execute-command

sh-4.4# ps wwax --forest
  PID TTY      STAT   TIME COMMAND
  101 ?        Ssl    0:00 /managed-agents/execute-command/amazon-ssm-agent
  157 ?        Sl     0:00  \_ /managed-agents/execute-command/ssm-agent-worker
25146 ?        Sl     0:00      \_ /managed-agents/execute-command/ssm-session-worker ecs-execute-command-c9d0acd90ca90
25165 pts/0    Ss     0:00          \_ /bin/sh
25942 pts/0    R+     0:00              \_ ps wwax --forest

@sd65 and @toricls - thanks so much for documenting this workaround for other ECS users. AWS ought to at least note this workaround in its documentation, if only with the caveat that the user is taking responsibility for it continuing to work.

gmuslia commented 1 week ago

Find below the error displayed in the CLI when this issue occurs (attaching it here so it's easier to find via an internet search):

An error occurred (InvalidParameterException) when calling the ExecuteCommand operation: The execute command failed because execute command was not enabled when the task was run or the execute command agent isn’t running. Wait and try again or run a new task with execute command enabled and try again.
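
A quick way to check whether the task was started with execute command enabled and whether the agent is actually running is describe-tasks, for example (the execEnabled/agents labels are just local names in the query):

    aws ecs describe-tasks --cluster <cluster> --tasks <task-arn> \
        --query 'tasks[].{execEnabled: enableExecuteCommand, agents: containers[].managedAgents}'

If the container has readonlyRootFilesystem enabled without the writable mounts described above, the ExecuteCommandAgent will show as STOPPED.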