aws / amazon-ecs-agent

Amazon Elastic Container Service Agent
http://aws.amazon.com/ecs/
Apache License 2.0

Cannot start containers after upgrade to 1.17.0 on Ubuntu 14.04 #1252

Closed oleborup closed 6 years ago

oleborup commented 6 years ago

Summary

After upgrading from 1.16.2 to 1.17.0, the ECS Agent cannot start containers.

Description

Due to security compliance and historical reasons we are running Ubuntu 14.04 as host machines. After upgrading from 1.16.2 to 1.17.0, the agent cannot start containers.

Observed Behavior

Agent cannot start containers. Agent log digest:

STOPPED, Reason CannotStartContainerError: API error (500): OCI runtime create failed:
container_linux.go:296: starting container process caused "process_linux.go:398: 
container init caused \"process_linux.go:365: setting cgroup config for procHooks process caused
\\\"failed to write 0 to memory.swappiness: write /sys/fs/cgroup/memory/ecs/fbccf823-3e8c-4fea-963a-e076313b3db7/4b5f0e9ef2c4c9fe8af9137aec68a9fc1a007de6021c1429cdb57e12a5b60feb/memory.swappiness: invalid argument\\\"\"": unknown

Environment Details

# uname -a
Linux ip-10-10-2-236 3.13.0-141-generic #190-Ubuntu SMP Fri Jan 19 12:52:38 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:    14.04
Codename:   trusty

# docker info
Containers: 7
 Running: 1
 Paused: 0
 Stopped: 6
Images: 6
Server Version: 17.12.0-ce
Storage Driver: devicemapper
 Pool Name: docker-202:1-135436-pool
 Pool Blocksize: 65.54kB
 Base Device Size: 10.74GB
 Backing Filesystem: ext4
 Udev Sync Supported: true
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 815.2MB
 Data Space Total: 107.4GB
 Data Space Available: 5.445GB
 Metadata Space Used: 1.991MB
 Metadata Space Total: 2.147GB
 Metadata Space Available: 2.145GB
 Thin Pool Minimum Free Space: 10.74GB
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.77 (2012-10-15)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 3.13.0-141-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.953GiB
Name: ip-10-0-1-106
ID: AY3M:GNS3:OB4U:PAHI:AJX2:EKXA:CEL7:6A5E:QBSO:UZE3:VIZH:YCLM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: true

WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
         Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.

# curl http://localhost:51678/v1/metadata
{"Cluster":"Utilities","ContainerInstanceArn":"arn:aws:ecs:eu-west-1:820506593263:container-instance/d02b5382-eb54-4de3-8464-119e7f305c39","Version":"Amazon ECS Agent - v1.17.0 (761937f7)"}

# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            996M   12K  996M   1% /dev
tmpfs           201M  384K  200M   1% /run
/dev/xvda1      7.8G  2.7G  4.7G  37% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none           1001M     0 1001M   0% /run/shm
none            100M     0  100M   0% /run/user
/dev/dm-1       9.8G   50M  9.2G   1% /var/lib/docker/devicemapper/mnt/0af5ae54ed71281a689298f440faaf70eef763377b98fbf76b188880784fadac
shm              64M     0   64M   0% /var/lib/docker/containers/eb9a52a0a6e593fd6d4a8c62a28a7209f1797b524c02e0c011b954c0918fb782/shm

adnxn commented 6 years ago

> Due to security compliance and historical reasons we are running Ubuntu 14.04 as host machines.

@oleborup, this is a known cgroups limitation in older kernels, and you will run into it on Ubuntu 14.04 LTS. See this runc issue for more information.
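
To see what the kernel reports for the affected cgroup, the cgroup v1 memory controller files can be read directly. The following is a minimal sketch, assuming cgroup v1 is mounted at /sys/fs/cgroup/memory as in the error message above. The helper name `show_memcg` is hypothetical, and `memory.use_hierarchy` is printed only as extra context; this thread does not establish it as the cause.

```shell
#!/bin/sh
# show_memcg DIR  (hypothetical helper, not part of Docker or the ECS agent)
# Prints the kernel release and selected cgroup v1 memory files found under DIR.
show_memcg() {
    dir=$1
    echo "kernel: $(uname -r)"
    for name in memory.swappiness memory.use_hierarchy; do
        if [ -r "$dir/$name" ]; then
            echo "$name: $(cat "$dir/$name")"
        else
            echo "$name: not readable"
        fi
    done
}

# Example (the ecs cgroup path is taken from the error message above):
# show_memcg /sys/fs/cgroup/memory/ecs
```

Running it against both the parent `ecs` directory and a task directory shows the values each level currently holds, which may be worth attaching to a bug report.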

> After upgrading from 1.16.2 to 1.17.0, the agent cannot start containers.

In 1.17.0 we introduced code to enforce task level memory constraints as described here.

If task level memory and cpu limits are not required for your use case, we recommend explicitly disabling them in ecs.config by setting the ECS_ENABLE_TASK_CPU_MEM_LIMIT option to false. This should unblock you and let the agent start containers.
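
As a rough illustration of that change, the sketch below sets the option in an env-style config file without duplicating the line on reruns. The helper name `set_ecs_option` is hypothetical, and the path `/etc/ecs/ecs.config` in the example is an assumption; on a hand-installed Ubuntu host the agent may read its environment from a different file. The agent most likely needs a restart before the new value takes effect.

```shell
#!/bin/sh
# set_ecs_option FILE KEY VALUE  (hypothetical helper, not part of the ECS agent)
# Sets KEY=VALUE in FILE, replacing any earlier KEY line so reruns are idempotent.
set_ecs_option() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Copy every existing line except a previous assignment of the same key.
    if [ -f "$file" ]; then
        grep -v "^${key}=" "$file" > "$tmp" || true
    fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (path is an assumption; run as root, then restart the agent):
# set_ecs_option /etc/ecs/ecs.config ECS_ENABLE_TASK_CPU_MEM_LIMIT false
```

After the call, the file contains exactly one `ECS_ENABLE_TASK_CPU_MEM_LIMIT=false` line, and other settings are left untouched.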

Let me know if you have other questions, thanks 😄

oleborup commented 6 years ago

Thanks!

Ralf-Te commented 6 years ago

Hello, I ran into the same problem. Thanks @adnxn for the fast reply to @oleborup's issue. However, I am a little confused. From the referenced issue, I can see that memory limit enforcement was the goal. But your answer ("If task level memory and cpu limits are not required for your use case") makes it sound as if setting ECS_ENABLE_TASK_CPU_MEM_LIMIT to false disables not only memory limit enforcement but also CPU limit enforcement.

There is no documentation on the variable, so can you please clear this up here?