aws / amazon-ecs-agent

Amazon Elastic Container Service Agent
http://aws.amazon.com/ecs/
Apache License 2.0

MemoryUtilization includes kernel caches #3594

Closed luhn closed 1 year ago

luhn commented 1 year ago

Summary

CloudWatch is reporting what looks like a memory leak in my ECS task: MemoryUtilization has been rising continually since the last deployment and currently sits at 330% with no sign of stopping.

[Screenshot: CloudWatch MemoryUtilization graph climbing steadily]

Container Insights corroborates this, reporting that my app container is using 990MB.

However, memory usage on the entire host is only 441MB and has been stable. So the number ECS is reporting cannot be accurate.

[ec2-user@ip-10-0-0-108 ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1954         441         247          10        1264        1329
Swap:             0           0           0

What's happening is that MemoryUtilization includes kernel slabs, notably dentry. Every time a file is created, an entry is added to the dentry cache, but it is not cleared when the file is deleted. So for applications like mine that create many short-lived files, the dentry cache can inflate to a massive size.
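A rough way to reproduce the pattern (a sketch, not my actual workload; it assumes a cgroup v1 host like the one above and is run inside the container) is to churn through many short-lived files and watch kernel memory climb while RSS stays flat:

# create and delete a large number of unique files
for i in $(seq 1 200000); do
  touch "/tmp/churn-$i" && rm -f "/tmp/churn-$i"
done
# the container's kernel memory (mostly dentry slab) keeps growing
cat /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes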

This unfortunately makes MemoryUtilization meaningless and leaves me with no insight into the memory usage of my containers.

Description

As mentioned above, Container Insights reports 990MB.

@timestamp: 1677516240000
ClusterName: pos
ContainerInstanceId: 6963edae9ae74236a5127d57bba779ad
ContainerKnownStatus: RUNNING
ContainerName: app
CpuReserved: 0.0
CpuUtilized: 39.30253030341968
EC2InstanceId: i-0be30281ecc6ad325
Image: [REDACTED]
MemoryReserved: 256
MemoryUtilized: 990
NetworkRxBytes: 67874
NetworkRxDropped: 0
NetworkRxErrors: 0
NetworkRxPackets: 89263702
NetworkTxBytes: 55667
NetworkTxDropped: 0
NetworkTxErrors: 0
NetworkTxPackets: 81569942
ServiceName: exterminator
StorageReadBytes: 3691008
StorageWriteBytes: 90112
TaskDefinitionFamily: exterminator
TaskDefinitionRevision: 29
TaskId: 5077b121b4754591a8665be324f83e6c
Timestamp: 1677516240000
Type: Container

docker stats also reports this. (It shows 1008MiB because it was run a few hours later.)

[ec2-user@ip-10-0-0-108 ~]$ docker stats --no-stream
CONTAINER ID   NAME                                                 CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
28d2df0aa8e0   ecs-exterminator-29-proxy-f6ddeed6fffdaca97200       0.64%     18.99MiB / 1.908GiB   0.97%     63.3GB / 66.6GB   456kB / 0B        2
f25307353a3b   ecs-exterminator-29-app-acd3b998add1b1a88801         4.50%     1008MiB / 1.908GiB    51.56%    60.4GB / 51.2GB   4.53MB / 90.1kB   16
e42429bee55a   ecs-exterminator-29-forwarder-82fcccd1ea82f1c95a00   0.81%     57.24MiB / 1.908GiB   2.93%     3.13GB / 2.18GB   545kB / 0B        12
09f811287d6e   ecs-agent                                            0.17%     17.32MiB / 1.908GiB   0.89%     0B / 0B           118MB / 6.36MB    12

However, host memory use is only 441MB.

[ec2-user@ip-10-0-0-108 ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1954         441         247          10        1264        1329
Swap:             0           0           0

If we look into the container's memory.stat, we can see RSS is 158MB (about what I would expect), with cache, inactive_file, and the other entries showing modest amounts that would not account for the discrepancy.

[ec2-user@ip-10-0-0-108 ~]$ docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.stat

cache 2899968
rss 158834688
rss_huge 0
shmem 0
mapped_file 135168
dirty 135168
writeback 0
swap 0
pgpgin 7420380
pgpgout 7380871
pgfault 77616
pgmajfault 33
inactive_anon 0
active_anon 158777344
inactive_file 2658304
active_file 286720
unevictable 0
hierarchical_memory_limit 9223372036854771712
hierarchical_memsw_limit 9223372036854771712
total_cache 2899968
total_rss 158834688
total_rss_huge 0
total_shmem 0
total_mapped_file 135168
total_dirty 135168
total_writeback 0
total_swap 0
total_pgpgin 7420380
total_pgpgout 7380871
total_pgfault 77616
total_pgmajfault 33
total_inactive_anon 0
total_active_anon 158777344
total_inactive_file 2658304
total_active_file 286720
total_unevictable 0

memory.usage_in_bytes shows a very large value. I believe ECS takes usage_in_bytes - cache, so that's where our inflated value is coming from.

[ec2-user@ip-10-0-0-108 ~]$ docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.usage_in_bytes
1059020800
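Doing that subtraction by hand (just shell arithmetic on the numbers above) lands almost exactly on the 1008MiB that docker stats reported:

# usage_in_bytes minus cache, converted to MiB
echo $(( (1059020800 - 2899968) / 1048576 ))    # => 1007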

If we look at kmem use, we can see that it's extremely high, which I believe accounts for the discrepancy.

[ec2-user@ip-10-0-0-108 ~]$ docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
897204224
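Subtracting the kernel memory from the total leaves roughly rss + cache from memory.stat, so kmem accounts for essentially all of the inflation (again just arithmetic on the values above):

# usage_in_bytes minus kmem.usage_in_bytes, in MiB
echo $(( (1059020800 - 897204224) / 1048576 ))   # => 154
# rss + cache from memory.stat, in MiB
echo $(( (158834688 + 2899968) / 1048576 ))      # => 154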

And if we break that down we can see that dentry is absolutely massive.

[ec2-user@ip-10-0-0-108 ~]$ docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.kmem.slabinfo
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-96            42     42     96   42    1 : tunables    0    0    0 : slabdata      1      1      0
radix_tree_node      151    168    584   14    2 : tunables    0    0    0 : slabdata     12     12      0
xfs_inode            259    306    960   17    4 : tunables    0    0    0 : slabdata     18     18      0
kmalloc-64           320    320     64   64    1 : tunables    0    0    0 : slabdata      5      5      0
kmalloc-8           1024   1024      8  512    1 : tunables    0    0    0 : slabdata      2      2      0
ovl_inode            532    805    696   23    4 : tunables    0    0    0 : slabdata     35     35      0
kmalloc-1024          32     32   1024   16    4 : tunables    0    0    0 : slabdata      2      2      0
kmalloc-192           42     42    192   21    1 : tunables    0    0    0 : slabdata      2      2      0
inode_cache           26     26    616   13    2 : tunables    0    0    0 : slabdata      2      2      0
mqueue_inode_cache      0      0    960   17    4 : tunables    0    0    0 : slabdata      0      0      0
pid                   64     64    128   32    1 : tunables    0    0    0 : slabdata      2      2      0
signal_cache          32     32   1024   16    4 : tunables    0    0    0 : slabdata      2      2      0
sighand_cache         30     30   2112   15    8 : tunables    0    0    0 : slabdata      2      2      0
files_cache           46     46    704   23    4 : tunables    0    0    0 : slabdata      2      2      0
task_struct           23     23  11520    1    4 : tunables    0    0    0 : slabdata     23     23      0
sock_inode_cache      69     69    704   23    4 : tunables    0    0    0 : slabdata      3      3      0
kmalloc-512           32     32    512   16    2 : tunables    0    0    0 : slabdata      2      2      0
kmalloc-256           32     32    256   16    1 : tunables    0    0    0 : slabdata      2      2      0
mm_struct             32     32   2048   16    8 : tunables    0    0    0 : slabdata      2      2      0
shmem_inode_cache     66     66    728   22    4 : tunables    0    0    0 : slabdata      3      3      0
proc_inode_cache      92     92    688   23    4 : tunables    0    0    0 : slabdata      4      4      0
dentry            4582515 4582515    192   21    1 : tunables    0    0    0 : slabdata 218215 218215      0
vm_area_struct      1860   1860    200   20    1 : tunables    0    0    0 : slabdata     93     93      0
cred_jar             210    210    192   21    1 : tunables    0    0    0 : slabdata     10     10      0
anon_vma             780    780    104   39    1 : tunables    0    0    0 : slabdata     20     20      0
anon_vma_chain      1536   1536     64   64    1 : tunables    0    0    0 : slabdata     24     24      0
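Multiplying the object count by the object size (a back-of-the-envelope check using the slabinfo line above) shows that dentry alone accounts for the bulk of the ~856MiB of kernel memory:

# 4,582,515 dentry objects at 192 bytes each, in MiB
echo $(( 4582515 * 192 / 1048576 ))   # => 839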

And finally, if we clear the caches (echo 3 | sudo tee /proc/sys/vm/drop_caches), memory usage drops from several hundred percent to about 70%, proving that it is indeed a kernel cache that is inflating MemoryUtilization.
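For anyone who wants to verify this on their own instance, the before/after can be read straight from the container's cgroup (same commands as used above, nothing new here):

docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
echo 3 | sudo tee /proc/sys/vm/drop_caches
docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes   # drops sharply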

Environment Details

t3.small running Amazon Linux 2 (amzn2-ami-ecs-hvm-2.0.20230214-x86_64-ebs ami-0ae546d2dd33d2039), ECS Agent 1.68.2

(This was initially observed on Fargate but I switched to EC2 to facilitate debugging.)

docker info output:

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 4
  Running: 4
  Paused: 0
  Stopped: 0
 Images: 6
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: 5fd4c4d144137e991c4acebb2146ab1483a97925
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.14.301-224.520.amzn2.x86_64
 Operating System: Amazon Linux 2
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 1.908GiB
 Name: ip-10-0-0-108.us-west-2.compute.internal
 ID: 6L62:MSTB:SIPL:L65U:PCMR:2FYH:JZR7:MRMS:IP53:56ZU:NQUH:4FNK
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Prior art

danehlim commented 1 year ago

Hi, thank you for reporting this! I am looking into it now.

danehlim commented 1 year ago

As you mentioned, the ECS Agent currently makes use of docker stats in its calculation of the value it sends to CloudWatch to report as MemoryUtilization, and docker stats reports an inflated memory usage (in bytes) value. Since this enhancement is already being tracked as an issue in the docker cli repo, I will close this issue in favor of that one to avoid duplicated and potentially divergent efforts. Please feel free to reach out should you have any additional concerns or information.
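For reference, the raw fields Docker exposes can be inspected directly over the engine API; the exact field the CLI subtracts varies by Docker version, so treat this as illustrative rather than the agent's exact logic:

# one-shot stats for the app container over the local Docker socket
curl -s --unix-socket /var/run/docker.sock \
  "http://localhost/containers/ecs-exterminator-29-app-acd3b998add1b1a88801/stats?stream=false" \
  | jq '.memory_stats | {usage, cache: .stats.cache, inactive_file: .stats.total_inactive_file}'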

luhn commented 1 year ago

I've found that setting up /tmp as a bind mount prevents dentry from inflating, both on EC2 and Fargate.

seanlinsley commented 11 months ago

@luhn could you share what exactly you mean by "setting up /tmp as a bind mount"? The default with an Ubuntu image seems to treat /tmp as a standard directory on the root filesystem. Was that the case for you previously, and what are you doing differently now?


df /tmp/

Filesystem     1K-blocks     Used Available Use% Mounted on
overlay         30787492 11544624  17653620  40% /
luhn commented 11 months ago

I added a volume to the task and set a mountpoint with containerPath: "/tmp" on my container. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bind-mounts.html#specify-volume-config
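Roughly, the relevant task-definition fragments look like this (trimmed to just the volume and mount point; container name matches the one above):

{
  "volumes": [
    { "name": "tmp" }
  ],
  "containerDefinitions": [
    {
      "name": "app",
      "mountPoints": [
        { "sourceVolume": "tmp", "containerPath": "/tmp", "readOnly": false }
      ]
    }
  ]
}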

seanlinsley commented 10 months ago

Our workload makes heavy use of tempfiles, like @luhn's. Happily, the /tmp mount hack worked!

But I wonder if AWS should research better default settings for vm.vfs_cache_pressure, or at least make it configurable for Fargate tasks.

// CDK config for a Fargate task
task.addVolume({ name: 'tmp' });
task.defaultContainer!.addMountPoints({ sourceVolume: 'tmp', containerPath: '/tmp', readOnly: false });

# Dockerfile changes to fix the 0755 permissions
RUN mkdir -p /tmp && chmod 1777 /tmp
VOLUME ["/tmp"]
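And for the vm.vfs_cache_pressure idea, on EC2 container instances you can at least experiment with it by hand (not currently possible on Fargate, as far as I know); higher values make the kernel reclaim dentries and inodes more aggressively:

# raise the reclaim pressure for the dentry/inode caches (default is 100)
sudo sysctl -w vm.vfs_cache_pressure=200
# persist across reboots
echo 'vm.vfs_cache_pressure = 200' | sudo tee /etc/sysctl.d/99-vfs-cache.conf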