We have a similar set-up, and to be honest, we observed a similar pattern in terms of memory usage. It wasn't always like this. In fact, with older versions of Docker and the ECS agent, it would sit at a stable level.
I tried investigating this a while ago and delved into the container to see which process was responsible, but I couldn't quite identify what was taking up all the memory. The good news is it has not caused any OOM errors or problems.
On that note, we are quite behind in terms of our version of Electron. We could upgrade it to see whether that is the cause. But I'm somewhat convinced it is either a problem with the microservice itself or Docker misreporting the usage.
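For what it's worth, the per-process check was roughly the following (the container name is a placeholder, and it assumes ps from procps is available in the image):
docker exec <container-name> ps -eo pid,rss,comm --sort=-rss | head
which lists the container's processes sorted by resident memory (RSS, in kB); nothing obviously stood out.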
@MrSaints
"It wasn't always like this. In fact, in the older versions..."
I was not completely sure until now, but I also think the memory usage was lower with older versions (unfortunately I cannot downgrade right now). As v2 was frozen, I updated all my containers to v2.10.0.
"The good news is it has not caused any OOM errors or problems."
Same here so far, but I could use that memory for something else.
@scream314 You can define hard memory limits on your ECS task definition.
@MrSaints I know, but if the container reaches the hard limit it gets killed ("If your container attempts to exceed the memory specified here, the container is killed." at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definitions), and that is something I want to avoid. I will take a look at the memory usage inside the container and play around with max workers. It is set to 10 right now, but based on my workload ~5 (or even fewer) would be sufficient.
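Based on the page linked above, the alternative would be a soft limit via memoryReservation, since the container is only killed when it exceeds the hard memory limit. A rough sketch with placeholder values and a guessed image name:
cat > athenapdf-taskdef.json <<'EOF'
{
  "family": "athenapdf",
  "containerDefinitions": [
    {
      "name": "athenapdf",
      "image": "arachnysdocker/athenapdf-service",
      "memoryReservation": 512,
      "memory": 2048,
      "portMappings": [{ "containerPort": 8080 }]
    }
  ]
}
EOF
aws ecs register-task-definition --cli-input-json file://athenapdf-taskdef.json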
It looks like docker stats is reporting wrong values for memory usage (also taking buffer/cache into account), so the actual usage is somewhat lower.
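A quick way to check this from inside the container (assuming cgroup v1) is to compare the cgroup's total usage with its page cache:
USAGE=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
CACHE=$(awk '/^total_cache/ {print $2}' /sys/fs/cgroup/memory/memory.stat)
echo "usage: ${USAGE} B, cache: ${CACHE} B, usage minus cache: $((USAGE - CACHE)) B"
Depending on the Docker version, docker stats may or may not subtract that cache figure from what it reports.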
Changed max workers to 5 and will update Docker on the weekend and see if it makes any difference.
I am convinced it is "cached" memory being reported as "used". Inside the container, if I run
PIDS="$(cat /sys/fs/cgroup/memory/cgroup.procs)"; for p in ${PIDS}; do cat /proc/$p/status | grep 'VmRSS'; done
it gives
4 kB + 132044 kB + 22808 kB + 3240 kB + 1928 kB + 1944 kB + 1476 kB = 163444 kB
which looks fine, realistic and nice.
At the same time, docker stats says the container consumes 1.566 GiB, which is a lot.
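For completeness, both numbers can be pulled in one go from the host (the container name is a placeholder):
docker exec <container-name> sh -c 'for p in $(cat /sys/fs/cgroup/memory/cgroup.procs); do grep VmRSS /proc/$p/status; done' | awk '{sum += $2} END {printf "VmRSS total: %d kB\n", sum}'
docker stats --no-stream --format '{{.Name}}: {{.MemUsage}}' <container-name>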
My conclusion at this point is that the high reported usage is caused by caching, and the number is essentially a lie. I'll close this issue, as it has not caused any real problems so far and seems to be normal behavior.
"if the container reaches the hard limit it gets killed"
Sorry for the lack of response @scream314. We use soft limits on memory consumption, and we noticed the microservice never really goes above it (it edges close to it).
"It looks like docker stats is reporting wrong values for memory usage (also taking buffer/cache into account), so the actual usage is somewhat lower."
Yes, our prior investigations revealed the same. It was related to an upgrade of both the ECS agent and the underlying Docker daemon (as highlighted previously).
@MrSaints Thank you for the response.
Could you please help me a little more and tell me your ECS task soft limit, the memory consumption of the microservice container as reported by docker stats, and the result of the command
PIDS="$(cat /sys/fs/cgroup/memory/cgroup.procs)"; for p in ${PIDS}; do cat /proc/$p/status | grep 'VmRSS'; done
executed inside the container (only the values are interesting to me, I do not need the actual PIDs or process names).
That would help a lot.
I am using athenapdf on AWS ECS for generating my PDFs. There are 2 round-robin balanced athenapdf instances with the host's /dev/shm mounted. This setup is working very well, but after a while I noticed that the athenapdf containers are using ~1.6-1.7 GB RAM each. If I restart them, they start with much lower memory usage, which increases over time until almost all free memory on the ECS host is consumed. The athenapdf instances are mostly idle; I generate about 1 PDF/min, and most of the PDFs are generated in 1-2 seconds.
My question is:
Is this the normal behavior? The memory usage feels a bit too high.
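For reference, each instance is started along these lines (the image name, port, and options are illustrative of a typical athenapdf weaver setup rather than an exact copy of my task definition):
docker run -d -p 8080:8080 -v /dev/shm:/dev/shm arachnysdocker/athenapdf-service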