luhn closed this issue 1 year ago
Hi, thank you for reporting this! I am looking into it now.
As you mentioned, the ECS Agent currently uses docker stats to calculate the value it sends to CloudWatch as MemoryUtilization, and docker stats reports an inflated memory usage (in bytes) value. Since this enhancement is already being tracked as an issue in the docker cli repo, I will close this issue in favor of that one to avoid duplicated and potentially divergent efforts. Please feel free to reach out should you have any additional concerns or information.
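For anyone following along, the mismatch is easy to see on a container instance by putting the docker stats figure next to overall host usage (a rough illustration only; the "app" container name is an assumption):

# What docker stats (and hence MemoryUtilization) sees for the container
docker stats --no-stream --format 'table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}' app

# What the host as a whole is actually using
free -m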
I've found that setting up /tmp as a bind mount prevents dentry from inflating, both on EC2 and Fargate.
@luhn could you share what exactly you mean by "setting up /tmp as a bind mount"? The default with an Ubuntu image seems to treat /tmp as a standard directory on the root filesystem. Was that the case for you previously, and what are you doing differently now?
df /tmp/
Filesystem 1K-blocks Used Available Use% Mounted on
overlay 30787492 11544624 17653620 40% /
I added a volume to the task and set a mountpoint with containerPath: "/tmp" on my container. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bind-mounts.html#specify-volume-config
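For reference, the corresponding task-definition excerpt looks roughly like this (a sketch; the volume name "tmp" and container name "app" are just examples):

{
  "volumes": [
    { "name": "tmp" }
  ],
  "containerDefinitions": [
    {
      "name": "app",
      "mountPoints": [
        { "sourceVolume": "tmp", "containerPath": "/tmp", "readOnly": false }
      ]
    }
  ]
}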
Our workload makes heavy use of tempfiles, like @luhn's. Happily, the /tmp mount hack worked!
But I wonder if AWS should research better default settings for vm.vfs_cache_pressure, or at least make it configurable for Fargate tasks (it is already tunable on EC2; see the sysctl sketch after the snippets below).
// CDK config for a Fargate task
task.addVolume({ name: 'tmp' });
task.defaultContainer!.addMountPoints({ sourceVolume: 'tmp', containerPath: '/tmp', readOnly: false });
# Dockerfile changes to fix the 0755 permissions
RUN mkdir -p /tmp && chmod 1777 /tmp
VOLUME ["/tmp"]
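On an EC2 container instance, vm.vfs_cache_pressure can at least be tuned today (not on Fargate, where the kernel is managed for you). A minimal sketch, with 200 chosen purely as an example value:

# Check the current value (kernel default is 100)
sysctl vm.vfs_cache_pressure

# Reclaim dentries/inodes more aggressively (200 is an arbitrary example value)
sudo sysctl -w vm.vfs_cache_pressure=200

# Persist across reboots
echo 'vm.vfs_cache_pressure = 200' | sudo tee /etc/sysctl.d/99-vfs-cache-pressure.conf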
Summary
Cloudwatch is reporting what looks to be a memory leak in my ECS task. MemoryUtilization has been rising continually since the last deployment and currently sits at 330% with no sign of stopping.
Container Insights corroborates this, reporting that my app container is using 990MB. However, memory usage on the entire host is only 441MB and has been stable. So the number ECS is reporting cannot be accurate.
What's happening is that MemoryUtilization includes kernel slabs, notably dentry. Every time a file is created, information is saved in the dentry cache, but it is not cleared when the file is deleted. So for applications like mine that create many short-lived files, dentry can inflate to a massive size.
This unfortunately makes MemoryUtilization meaningless and leaves me with no insight into the memory usage of my containers.
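A simple way to reproduce the effect is to churn through short-lived files and watch the dentry slab climb even though every file has been deleted (illustrative only; the file count is arbitrary and reading /proc/slabinfo needs root):

# Create and immediately delete a large number of files
for i in $(seq 1 200000); do
  touch "/tmp/scratch-$i" && rm "/tmp/scratch-$i"
done

# The dentry slab keeps growing even though the files are gone
sudo grep '^dentry ' /proc/slabinfo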
Description
As mentioned above, Container Insights reports 990MB.
Docker stats also reports this. (It shows 1108MB because it was run a few hours later.)
However, host memory use is only 440MB.
If we look into the container's memory.stat, we can see RSS is 158M (about what I would expect), with cache, inactive_file, and others showing modest amounts that would not account for the discrepancy.
memory.usage_in_bytes shows a very large value. I believe ECS takes usage_in_bytes - cache, so that's where our inflated value is coming from.
If we look at kmem use, we can see that it's extremely high, which I believe accounts for the discrepancy.
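For anyone who wants to reproduce these numbers, the relevant cgroup files can be located from the container's init PID and read directly (a sketch; assumes cgroup v1 as on Amazon Linux 2, and the "app" container name is an example):

# Locate the container's memory cgroup via its init PID
CID=$(docker ps -q --filter name=app)              # "app" is an example name
PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
CG=/sys/fs/cgroup/memory$(grep ':memory:' "/proc/$PID/cgroup" | cut -d: -f3)

cat "$CG/memory.usage_in_bytes"                    # the inflated total
grep -E '^(rss|cache|inactive_file) ' "$CG/memory.stat"
cat "$CG/memory.kmem.usage_in_bytes"               # kernel memory, where dentry lives
# Reported usage is roughly usage_in_bytes - cache, which still includes kmem/dentry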
And if we break that down, we can see that dentry is absolutely massive.
And finally, if we clear the caches (echo 3 | sudo tee /proc/sys/vm/drop_caches), memory usage drops from several hundred percent to about 70%, proving that it is indeed a kernel cache that is inflating MemoryUtilization.
Environment Details
t3.small running Amazon Linux 2 (amzn2-ami-ecs-hvm-2.0.20230214-x86_64-ebs, ami-0ae546d2dd33d2039), ECS Agent 1.68.2.
(This was initially observed on Fargate, but I switched to EC2 to facilitate debugging.)
docker info output:
Prior art
#280 reported unusual memory usage, fixed in #582 by subtracting memory.stat.cache.