Open sd65 opened 3 years ago
Wrote a small article on working around this limitation - https://toris.io/2021/06/using-ecs-exec-with-readonlyrootfilesystem-enabled-containers/
If you use EC2-backed ECS with agent version 1.57.0, you should not specify the bind mount /var/log/amazon/ssm,
as it will overlap with the mount set by the agent and prevent the container from starting.
Is it the same on EKS?
I was unable to replicate this workaround, either by declaring the volumes in the Dockerfile or in the Task Definition. Maybe something changed in the SSM Agent that now prevents it.
Hi @bmfs - I just wanted to note that the workaround works for me in ECS Fargate 1.4.0, using the Task Definition approach. So it might be due to an environment difference rather than changes in SSM Agent.
Our task-definition has the 3 volumes:
    "volumes": [
        {
            "name": "managed-agents",
            "host": {}
        },
        {
            "name": "var-lib-amazon-ssm",
            "host": {}
        },
        {
            "name": "var-log-amazon-ssm",
            "host": {}
        },
And the 3 mount points in one of the containers:
    "mountPoints": [
        {
            "sourceVolume": "managed-agents",
            "containerPath": "/managed-agents",
            "readOnly": false
        },
        {
            "sourceVolume": "var-lib-amazon-ssm",
            "containerPath": "/var/lib/amazon/ssm",
            "readOnly": false
        },
        {
            "sourceVolume": "var-log-amazon-ssm",
            "containerPath": "/var/log/amazon/ssm",
            "readOnly": false
        },
This container has the agent running:
    "managedAgents": [
        {
            "lastStartedAt": "2024-03-25T12:22:04.019000-04:00",
            "name": "ExecuteCommandAgent",
            "lastStatus": "RUNNING"
        }
    ],
(Other containers in the same task-definition without the mount points have the agent stopped:)
    "managedAgents": [
        {
            "name": "ExecuteCommandAgent",
            "lastStatus": "STOPPED"
        }
    ],
With this configuration, we're able to use "aws ecs execute-command" on the container with the agent running:
PS C:\Users\u123> aws ecs execute-command --profile xyz --cluster xyz --container xyz --interactive --command "/bin/sh" --task arnxyz
sh-4.4# df -a | grep agents\\\|ssm
/dev/nvme1n1     30787492 13423340 15774904  46% /managed-agents
/dev/nvme1n1     30787492 13423340 15774904  46% /var/lib/amazon/ssm
/dev/nvme1n1     30787492 13423340 15774904  46% /var/log/amazon/ssm
/dev/nvme0n1p1    5082764  2126208  2887764  43% /managed-agents/execute-command
sh-4.4# ps wwax --forest
  PID TTY      STAT   TIME COMMAND
  101 ?        Ssl    0:00 /managed-agents/execute-command/amazon-ssm-agent
  157 ?        Sl     0:00  \_ /managed-agents/execute-command/ssm-agent-worker
25146 ?        Sl     0:00      \_ /managed-agents/execute-command/ssm-session-worker ecs-execute-command-c9d0acd90ca90
25165 pts/0    Ss     0:00          \_ /bin/sh
25942 pts/0    R+     0:00              \_ ps wwax --forest
@sd65 and @toricls - thanks so much for documenting this workaround for other ECS users. AWS ought to at least note this workaround in its documentation, if only with the caveat that the user is taking responsibility for it continuing to work.
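For anyone who generates task definitions programmatically, the volume and mount point configuration described above can be sketched roughly as follows. This is only an illustration, not an official AWS helper: the function name `add_exec_mounts` and its `skip` parameter are made up for this example, and it operates on a plain dict shaped like a task definition rather than calling any AWS API. The `skip` option is there because of the earlier comment that /var/log/amazon/ssm should not be bind-mounted on EC2-backed ECS with agent 1.57.0.

```python
import copy

# Volume names and container paths taken from the task definition quoted above.
SSM_PATHS = {
    "managed-agents": "/managed-agents",
    "var-lib-amazon-ssm": "/var/lib/amazon/ssm",
    "var-log-amazon-ssm": "/var/log/amazon/ssm",
}


def add_exec_mounts(task_def, container_name, skip=()):
    """Return a copy of task_def with writable SSM volumes added.

    task_def is a plain dict shaped like an ECS task definition.
    skip lists volume names to leave out (hypothetical convenience flag),
    e.g. "var-log-amazon-ssm" where that mount reportedly conflicts.
    """
    result = copy.deepcopy(task_def)
    volumes = result.setdefault("volumes", [])
    existing = {v["name"] for v in volumes}

    # Raises StopIteration if the container is not in the task definition.
    container = next(
        c for c in result["containerDefinitions"] if c["name"] == container_name
    )
    container["readonlyRootFilesystem"] = True
    mounts = container.setdefault("mountPoints", [])
    mounted = {m["sourceVolume"] for m in mounts}

    for name, path in SSM_PATHS.items():
        if name in skip:
            continue
        if name not in existing:
            volumes.append({"name": name, "host": {}})
        if name not in mounted:
            mounts.append(
                {"sourceVolume": name, "containerPath": path, "readOnly": False}
            )
    return result
```

Calling `add_exec_mounts(task_def, "my-service")` leaves the input untouched and returns a new dict, so the result can be compared against the original before registering anything.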
Below is the error displayed in the CLI when this issue occurs (posting it here so it is easier to find via search):
An error occurred (InvalidParameterException) when calling the ExecuteCommand operation: The execute command failed because execute command was not enabled when the task was run or the execute command agent isn’t running. Wait and try again or run a new task with execute command enabled and try again.
Thanks for this workaround. What is the purpose of including the /managed-agents volume? I successfully implemented this workaround on ECS Fargate with only /var/lib/amazon/ssm and /var/log/amazon/ssm. Note that I used 'aws ssm start-session' rather than 'aws ecs execute-command'.
It seems like the /managed-agents directory contains the agent binaries and I'm not sure that data will be written there while the agent is running.
What is the purpose of including the /managed-agents volume?
@mselcik - I checked my notes from earlier this year, and I think I included /managed-agents from the start based on the tips from @sd65 at the beginning of this issue. I don't think it was that I ran into an issue and was required to add it. I notice now, there was a filesystem/volume created automatically for /managed-agents/execute-command where the binaries are, which must be distinct from the /managed-agents volume I specified in the task definition. Maybe the automatically-created /managed-agents/execute-command is all that's necessary, rather than /managed-agents.
Thanks for your response. I had a look and also observed that a filesystem at /managed-agents/execute-command was automatically created. Below is partial output from the "mount" command:
/dev/nvme1n1 on /tmp type ext4 (rw,relatime)
/dev/nvme1n1 on /var/lib/amazon/ssm type ext4 (rw,relatime)
/dev/nvme1n1 on /var/log/amazon/ssm type ext4 (rw,relatime)
/dev/nvme1n1 on /etc/hosts type ext4 (rw,relatime)
/dev/nvme1n1 on /etc/resolv.conf type ext4 (rw,relatime)
/dev/nvme1n1 on /etc/hostname type ext4 (rw,relatime)
/dev/nvme0n1p1 on /managed-agents/execute-command type ext4 (ro,noatime)
The first three filesystems are created due to three ECS volumes being specified in the task definition. However, the /managed-agents/execute-command filesystem is automatically created and read-only, so my conclusion is that a /managed-agents volume does not need to be created as part of the ECS task definition in order to enable "execute-command".
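If you want to check this kind of "mount" output without reading it by eye, a small parser along these lines may help. It is a rough sketch that assumes the usual Linux "device on path type fstype (options)" line format shown above; the name `parse_mounts` is made up for this example.

```python
import re

# Matches lines such as:
#   /dev/nvme1n1 on /var/lib/amazon/ssm type ext4 (rw,relatime)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")


def parse_mounts(mount_output):
    """Return {mount_point: "rw" or "ro"} from `mount` command output."""
    modes = {}
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a recognisable mount line
        options = match.group(4).split(",")
        modes[match.group(2)] = "ro" if "ro" in options else "rw"
    return modes
```

Fed the partial output above, this should report /var/lib/amazon/ssm and /var/log/amazon/ssm as "rw" and /managed-agents/execute-command as "ro", which is the distinction the conclusion rests on.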
Workaround failed for me with and without /managed-agents
mount:
An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.
@h0rv - If you describe your task, can you see what the managedAgents block says? I think that gives an indication, even before you try ecs exec, whether the agent has been started OK:
    "managedAgents": [
        {
            "lastStartedAt": "2024-03-25T12:22:04.019000-04:00",
            "name": "ExecuteCommandAgent",
            "lastStatus": "RUNNING"
        }
    ],
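A rough way to pull that status out for every container at once is sketched below. It assumes the input is the parsed JSON of a describe-tasks call (for example the output of "aws ecs describe-tasks", or the dict returned by boto3's `describe_tasks`), and the function name `exec_agent_status` is made up for this example.

```python
def exec_agent_status(describe_tasks_response):
    """Map container name -> lastStatus of its ExecuteCommandAgent.

    Containers with no ExecuteCommandAgent entry map to None.
    """
    statuses = {}
    for task in describe_tasks_response.get("tasks", []):
        for container in task.get("containers", []):
            status = None
            for agent in container.get("managedAgents", []):
                if agent.get("name") == "ExecuteCommandAgent":
                    status = agent.get("lastStatus")
            statuses[container["name"]] = status
    return statuses
```

Anything other than "RUNNING" for the container you are targeting suggests the agent did not start there, which is worth ruling out before looking at the exec call itself.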
aws ecs exec checker gives me the output below, indicating the agent is running fine on all containers:
...
ecs:ExecuteCommand: allowed
ssm:StartSession denied?: allowed
Task Status | RUNNING
Launch Type | Fargate
Platform Version | 1.4.0
Exec Enabled for Task | OK
Container-Level Checks |
----------
Managed Agent Status
----------
1. RUNNING for "datadog-agent"
2. RUNNING for "logging-router"
3. RUNNING for "my-service"
...
aws ecs execute-command \
    --cluster <cluster_arn> \
    --task <task_arn> \
    --container my-service \
    --command "/bin/bash" \
    --interactive
Results in the error:
An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.
Also, each container has its own volumes for each bind mount: https://repost.aws/knowledge-center/ecs-error-execute-command#COBQ6pGrzfSSaDtECWcDVqBw.
@h0rv - Wish I had a better idea, but, dumb question: is /bin/bash really available and executable in that my-service container? I went back through my notes and didn't see much different about what you pasted, I only noticed that in my container I was using /bin/sh, which made me wonder.
@jdoylei - Yes it is available in my container and I was able to run this command before turning on readonly.
@h0rv - I see, sorry I couldn't be more help. It's tricky when the main tool you have for debugging - ecs exec - doesn't work itself. I think when I was troubleshooting ecs exec I had to resort to embedding commands in my containers at startup to dump the filesystem list, dump the process tree, etc., to see what was going on.
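As one possible shape for that kind of startup diagnostic, here is a minimal sketch that could be run from a container entrypoint before the main process starts. It only assumes Python is available in the image; the function name `writable_report` is made up for this example, and the paths are the ones discussed in this thread.

```python
import tempfile


def writable_report(paths):
    """Return {path: bool} by trying to create a temp file in each path."""
    report = {}
    for path in paths:
        try:
            # The file is removed automatically when the block exits.
            with tempfile.TemporaryFile(dir=path):
                report[path] = True
        except OSError:
            # Covers a read-only filesystem as well as a missing directory.
            report[path] = False
    return report


if __name__ == "__main__":
    for path, ok in writable_report(
        ["/managed-agents", "/var/lib/amazon/ssm", "/var/log/amazon/ssm"]
    ).items():
        print(f"{path}: {'writable' if ok else 'NOT writable'}")
    try:
        # Dump the filesystem list so it ends up in the container logs.
        with open("/proc/mounts") as fh:
            print(fh.read())
    except OSError:
        pass
```

The output goes to stdout, so it should show up wherever the container's logs are shipped, which gives you something to look at even when ecs exec itself is unavailable.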
Community Note
Tell us about your request
I would like to use the ECS Exec feature with readonlyRootFilesystem enabled containers.
Which service(s) is this request for?
ECS/Fargate
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Currently, readonlyRootFilesystem-enabled containers are not supported; the AWS managed agent crashes soon after launch.
Are you currently working around this issue?
Yes. I've managed to get it working with readonlyRootFilesystem: true by mounting /managed-agents, /var/lib/amazon/ssm and /var/log/amazon/ssm as writable volumes inside.
Additional context
https://github.com/aws-containers/amazon-ecs-exec-checker/issues/21