Closed: sunds closed this issue 1 year ago.
It is worth noting that direct access to /run/systemd/private happens only if the dbus daemon cannot be contacted:
// NewWithContext establishes a connection to any available bus and authenticates.
// Callers should call Close() when done with the connection.
func NewWithContext(ctx context.Context) (*Conn, error) {
	conn, err := NewSystemConnectionContext(ctx)
	if err != nil && os.Geteuid() == 0 {
		return NewSystemdConnectionContext(ctx)
	}
	return conn, err
}
The problem was apparmor on this system blocking the call to DBUS.
apparmor_status
apparmor module is loaded.
38 profiles are loaded.
37 profiles are in enforce mode.
...
docker-default
Log:
May 27 03:13:12 garage kernel: [15540.770327] audit: type=1107 audit(1653621192.007:94): pid=759 uid=103 auid=4294967295 ses=4294967295 subj=? msg='apparmor="DENIED" operation="dbus_method_call" bus="system" path="/org/freedesktop/DBus" interface="org.freedesktop.DBus" member="Hello" mask="send" name="org.freedesktop.DBus" pid=5440 label="docker-default" peer_label="unconfined"
Adding --security-opt apparmor:unconfined to the docker run command resolved this issue. However, this is not the default when the agent is installed from https://amazon-ecs-agent.s3.amazonaws.com/ecs-anywhere-install-latest.sh
Perhaps this issue should be moved to https://github.com/aws/amazon-ecs-init ?
Working command:
docker run \
--name "/ecs-agent" \
--runtime "runc" \
--volume "/var/run:/var/run" \
--volume "/var/log/ecs:/log" \
--volume "/var/lib/ecs/data:/data" \
--volume "/etc/ecs:/etc/ecs" \
--volume "/var/cache/ecs:/var/cache/ecs" \
--volume "/sys/fs/cgroup:/sys/fs/cgroup" \
--volume "/var/lib/ecs:/var/lib/ecs" \
--volume "/var/log/ecs/exec:/log/exec" \
--volume "/etc/ssl:/etc/ssl:ro" \
--volume "/root/.aws:/rotatingcreds:ro" \
--volume "/run/docker/plugins:/run/docker/plugins:ro" \
--volume "/etc/docker/plugins:/etc/docker/plugins:ro" \
--volume "/usr/lib/docker/plugins:/usr/lib/docker/plugins:ro" \
--volume "/var/lib/ecs/deps/execute-command/bin:/managed-agents/execute-command/bin:ro" \
--volume "/var/lib/ecs/deps/execute-command/config:/managed-agents/execute-command/config" \
--volume "/var/lib/ecs/deps/execute-command/certs:/managed-agents/execute-command/certs:ro" \
--volume "/proc:/host/proc:ro" \
--volume "/usr/lib:/usr/lib:ro" \
--volume "/lib:/lib:ro" \
--volume "/usr/lib64:/usr/lib64:ro" \
--volume "/lib64:/lib64:ro" \
--volume "/sbin:/host/sbin:ro" \
--volume "/etc/alternatives:/etc/alternatives:ro" \
--volume "/usr/sbin:/usr/sbin:ro" \
--log-driver "json-file" \
--log-opt max-file="4" \
--log-opt max-size="16m" \
--restart "" \
--network "host" \
--hostname "garage" \
--expose "51678/tcp" \
--expose "51679/tcp" \
--env "ECS_DATADIR=/data" \
--env "ECS_ENABLE_TASK_IAM_ROLE=true" \
--env "ECS_UPDATE_DOWNLOAD_DIR=/var/cache/ecs" \
--env "ECS_EXTERNAL=true" \
--env "ECS_CLUSTER=dsunds-test-1" \
--env "ECS_LOGFILE=/log/ecs-agent.log" \
--env "ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true" \
--env "ECS_VOLUME_PLUGIN_CAPABILITIES=[\"efsAuth\"]" \
--env "ECS_UPDATES_ENABLED=true" \
--env "ECS_AVAILABLE_LOGGING_DRIVERS=[\"json-file\",\"syslog\",\"awslogs\",\"fluentd\",\"none\"]" \
--env "ECS_AGENT_LABELS=" \
--env "ECS_AGENT_CONFIG_FILE_PATH=/etc/ecs/ecs.config.json" \
--env "SSL_CERT_DIR=/etc/ssl/certs" \
--env "ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true" \
--env "AWS_DEFAULT_REGION=us-east-1" \
--env "ECS_ENABLE_TASK_ENI=false" \
--env "PATH=/host/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
--detach \
--entrypoint "/agent" \
--security-opt apparmor:unconfined \
"amazon/amazon-ecs-agent:latest"
Thanks for reporting! Currently Ubuntu 22 is not an officially supported platform (ref). This is tracked internally and we will post updates there.
I ran into the same issue and fixed it by adding a custom apparmor profile that allows access to dbus as such:
#include <tunables/global>

profile docker-ecs-agent flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>

  network,
  capability,
  file,
  umount,

  # Host (privileged) processes may send signals to container processes.
  signal (receive) peer=unconfined,
  # dockerd may send signals to container processes (for "docker kill").
  signal (receive) peer=unconfined,
  # Container processes may send signals amongst themselves.
  signal (send,receive) peer=docker-ecs-agent,

  deny @{PROC}/* w,  # deny write for all files directly in /proc (not in a subdir)
  # deny write to files not in /proc/<number>/** or /proc/sys/**
  deny @{PROC}/{[^1-9],[^1-9][^0-9],[^1-9s][^0-9y][^0-9s],[^1-9][^0-9][^0-9][^0-9]*}/** w,
  deny @{PROC}/sys/[^k]** w,  # deny /proc/sys except /proc/sys/k* (effectively /proc/sys/kernel)
  deny @{PROC}/sys/kernel/{?,??,[^s][^h][^m]**} w,  # deny everything except shm* in /proc/sys/kernel/
  deny @{PROC}/sysrq-trigger rwklx,
  deny @{PROC}/kcore rwklx,

  deny mount,

  deny /sys/[^f]*/** wklx,
  deny /sys/f[^s]*/** wklx,
  deny /sys/fs/[^c]*/** wklx,
  deny /sys/fs/c[^g]*/** wklx,
  deny /sys/fs/cg[^r]*/** wklx,
  deny /sys/firmware/** rwklx,
  deny /sys/kernel/security/** rwklx,

  # suppress ptrace denials when using 'docker ps' or using 'ps' inside a container
  ptrace (trace,read,tracedby,readby) peer=docker-ecs-agent,
  # suppress ptrace denials when agent and process-agent are accessing /proc
  ptrace (read),

  # The ECS agent needs access to dbus in order to launch tasks
  dbus (send, receive, bind),
}
Then run systemctl reload apparmor to pick up the new profile, and finally run the ECS agent container with --security-opt apparmor=docker-ecs-agent to use it.
After talking to Canonical support about this, just to get everything straight in my head, I believe the issue is:
- Ubuntu 22.04 now uses cgroup v2, which is a change, so the agent calls a function that attempts to call org.freedesktop.DBus.Hello as part of the connection process; if that fails, it will try to use the /run/systemd/private socket directly, as mentioned above.
- Ubuntu 22.04 allows the docker-default apparmor profile to contact dbus, but not to call org.freedesktop.DBus.Hello (only peer-to-peer connections).
- ecs-init doesn't currently mount the /run/systemd/private socket into the ecs-agent container.
If you have the ability to tweak the apparmor profile, then the above post may work for now. We are on Ubuntu Core 22 without that ability and have already had to patch ecs-init to make it start, so we will probably have to add the extra container mount point to our local patch.
Thanks for the additional detail.
I recommend you either run the agent with --security-opt apparmor:unconfined or load a new apparmor profile for Docker that allows the dbus call. Running the agent with unconfined should not increase risk as it already has broad permissions and host networking.
If you want to use a modified profile, the one posted by @shanet is good. If you want to double check, start with the Docker default profile at https://github.com/moby/moby/tree/master/profiles/apparmor and add the extra dbus directive. You can scope it a bit more tightly:
# ECS agent requires DBUS send
dbus (send)
bus=system,
Here is my complete profile as of several weeks ago:
#include <tunables/global>

profile docker-default flags=(attach_disconnected, mediate_deleted) {
  #include <abstractions/base>

  network,
  capability,
  file,
  umount,

  # Host (privileged) processes may send signals to container processes.
  signal (receive) peer=unconfined,
  # dockerd may send signals to container processes (for "docker kill").
  signal (receive) peer=unconfined,
  # Container processes may send signals amongst themselves.
  signal (send,receive) peer=docker-default,

  # ECS agent requires DBUS send
  dbus (send)
       bus=system,

  deny @{PROC}/* w,  # deny write for all files directly in /proc (not in a subdir)
  # deny write to files not in /proc/<number>/** or /proc/sys/**
  deny @{PROC}/{[^1-9],[^1-9][^0-9],[^1-9s][^0-9y][^0-9s],[^1-9][^0-9][^0-9][^0-9/]*}/** w,
  deny @{PROC}/sys/[^k]** w,  # deny /proc/sys except /proc/sys/k* (effectively /proc/sys/kernel)
  deny @{PROC}/sys/kernel/{?,??,[^s][^h][^m]**} w,  # deny everything except shm* in /proc/sys/kernel/
  deny @{PROC}/sysrq-trigger rwklx,
  deny @{PROC}/kcore rwklx,

  deny mount,

  deny /sys/[^f]*/** wklx,
  deny /sys/f[^s]*/** wklx,
  deny /sys/fs/[^c]*/** wklx,
  deny /sys/fs/c[^g]*/** wklx,
  deny /sys/fs/cg[^r]*/** wklx,
  deny /sys/firmware/** rwklx,
  deny /sys/kernel/security/** rwklx,

  # suppress ptrace denials when using 'docker ps' or using 'ps' inside a container
  ptrace (trace,read,tracedby,readby) peer=docker-default,
}
Write this file to /etc/apparmor.d/docker-default.
You can install Docker and then overwrite the default profile with this command:
apparmor_parser -r /etc/apparmor.d/docker-default
If this works for your case then a modified ecs-init should not be necessary.
Alternatively, if you are modifying ecs-init, you can run just the agent with the modified profile or unconfined:
--security-opt apparmor=your_agent_profile
or
--security-opt apparmor:unconfined
Thanks to @sunds and @shanet, today I could run some tasks in our ECS cluster with an external on-prem Docker instance running Ubuntu 22.04. Thanks again, keep it going!
Thanks @sunds and @shanet very much for bringing up this issue and sharing a workaround with us. I was able to reproduce the issue and use the custom AppArmor profile as a workaround.
Repro setup
$ curl -s 127.0.0.1:51678/v1/metadata | python2 -mjson.tool
{
"Cluster": "default",
"ContainerInstanceArn": "xxx",
"Version": "Amazon ECS Agent - v1.69.0 (*b32ab075)"
}
As Ubuntu 22.04 is not officially supported by ECS Anywhere, and workarounds are available, this issue will be closed. Please feel free to open new issues, and track the latest supported operating systems and system architectures via the public documentation.
Thanks.
Hi everyone, this is now supported in agent/init version 1.80.0: https://github.com/aws/amazon-ecs-agent/releases.
Support was added via this PR: https://github.com/aws/amazon-ecs-agent/pull/4062
Working on updating the docs now.
Summary
OS: Ubuntu 22.04 (LTS)
ECS agent version="1.61.1" commit="8dc9fdeb"
Containers will not start.
Description
err=cgroupv2 create: unable to create v2 manager: dial unix /run/systemd/private: connect: no such file or directory
The problem is that the ECS agent runs in Docker and /run/systemd/private is not mounted into the container. Editing the container config to add that bind mount worked around the problem.
Expected Behavior
Container runs
Observed Behavior
Launch fails due to missing bind mount
Environment Details
curl http://localhost:51678/v1/metadata
{"Cluster":"dsunds-test-1","ContainerInstanceArn":"arn:aws:ecs:us-east-1:585275055393:container-instance/dsunds-test-1/17da2f096e234930a8ea495d5cb6b575","Version":"Amazon ECS Agent - v1.61.1 (8dc9fdeb)"}
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04 LTS
Release:        22.04
Codename:       jammy
Deployed onto bare metal server
Supporting Log Snippets
Error from ECS agent log:
cgroup: unable to create cgroup taskARN=arn:aws:ecs:us-east-1:585275055393:task/dsunds-test-1/383621ce97f643749b2c06061d345884 cgroupPath=ecstasks-383621ce97f643749b2c06061d345884.slice cgroupV2=true err=cgroupv2 create: unable to create v2 manager: dial unix /run/systemd/private: connect: no such file or directory
The relevant part is that last error. Digging into the source, the agent is trying to make a connection to the private systemd DBus socket.
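The failing precondition can be checked with a small Go sketch. This is a hypothetical diagnostic, not agent code; it only verifies that the socket the agent dials is actually present inside the container, which is why bind-mounting /run/systemd/private works around the error:

```go
package main

import (
	"fmt"
	"os"
)

// socketPresent reports whether path exists and is a Unix socket.
func socketPresent(path string) bool {
	fi, err := os.Stat(path)
	if err != nil {
		return false
	}
	return fi.Mode()&os.ModeSocket != 0
}

func main() {
	const sock = "/run/systemd/private"
	if socketPresent(sock) {
		fmt.Println(sock, "is mounted; the cgroup v2 manager can connect")
	} else {
		// Matches the agent's failure mode:
		// dial unix /run/systemd/private: connect: no such file or directory
		fmt.Println(sock, "is missing from this mount namespace")
	}
}
```

Running this inside the unmodified ecs-agent container would report the socket as missing, matching the log line above.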