aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[ECS] Add support --oom-kill-disable docker arg #753

Open tushart91 opened 4 years ago

tushart91 commented 4 years ago

Summary

ECS should support the `--oom-kill-disable` Docker flag.

Description

We have an application that uses as much memory as it can, since its performance is proportional to available memory, so we set a hard limit on the containers in ECS. However, the OOM killer kills the application as soon as it hits that limit, even though the application is equipped to handle OOM errors appropriately. Is there any way to disable the OOM killer for ECS containers, similar to the `--oom-kill-disable` flag in Docker, or can this feature be added?
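For reference, plain Docker exposes this behavior directly (a sketch; `my-app` is a placeholder image name):

```shell
# Set a hard memory limit, but tell the kernel not to OOM-kill this container.
# Docker requires (and warns without) a memory limit alongside
# --oom-kill-disable, since otherwise the host itself can be starved.
docker run --memory 512m --oom-kill-disable my-app
```

The feature request is for an equivalent switch in the ECS container definition.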

brunocascio commented 3 years ago

Any update here?

spyoungtech commented 3 years ago

If you configure containers to swap, that should prevent OOM kills. You might also consider using task limits instead of container limits.

The OOM-kill behavior depends on the Linux kernel in use, and the kernel's response to memory pressure does not necessarily involve Docker or ECS.

From the AWS blog:

> If containers try to consume memory between these two values (or between the soft limit and the host capacity if a hard limit is not set), they may compete with each other. In this case, what happens depends on the heuristics used by the Linux kernel’s OOM (Out of Memory) killer. ECS and Docker are both uninvolved here; it’s the Linux kernel reacting to memory pressure. If something is above its soft limit, it’s more likely to be killed than something below its soft limit, but figuring out which process gets killed requires knowing all the other processes on the system and what they are doing with their memory as well. Again the new memory feature we announced can come to rescue here. While the OOM behavior isn’t changing, now containers can be configured to swap out to disk in a memory pressure scenario. This can potentially alleviate the need for the OOM killer to kick in (if containers are configured to swap).

See references:
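The soft limit, hard limit, and swap settings discussed above map to the `memoryReservation`, `memory`, and `linuxParameters` fields of an ECS container definition (a minimal sketch; names and values are illustrative):

```json
{
  "name": "my-app",
  "image": "my-app:latest",
  "memoryReservation": 512,
  "memory": 1024,
  "linuxParameters": {
    "maxSwap": 1024,
    "swappiness": 60
  }
}
```

`memoryReservation` is the soft limit and `memory` the hard limit (both in MiB), while `maxSwap` and `swappiness` opt the container into swap. The swap parameters are only supported on the EC2 launch type.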

brunocascio commented 3 years ago

> If you configure containers to swap, that should prevent OOM kill. You might also consider using task limits, instead of container limits.
>
> The OOM kill behavior is dependent on aspects of the Linux kernel used, and the kernel's response to memory pressure does not necessarily involve Docker or ECS.

Does this apply to Fargate tasks as well?

mreferre commented 3 years ago

@brunocascio no, it's not:

> The swap space container definition parameters are only supported for task definitions using the EC2 launch type

brunocascio commented 3 years ago

> The swap space container definition parameters are only supported for task definitions using the EC2 launch type

So, for Fargate, could this problem be avoided by setting a hard limit?

mreferre commented 3 years ago

> So, for Fargate this problem could be avoided by setting a hard limit?

This would need to be tested, but I would guess that if you only set a hard limit, the process will be killed.

spyoungtech commented 3 years ago

In Fargate, you're required to set the task size (the hard limit), and you cannot overcommit resources (the container will be killed by the kernel).

As stated:

> The deployment of ECS tasks on top of Fargate provides slightly less flexibility because:
>
> • you are deploying to a task of a given size (which maps 1:1 an EC2 instance of that capacity)
> • you cannot over-commit resources across different tasks because they have a specific size set and run on a dedicated Linux kernel (however you can still over-commit resources inside the task among containers)
> • you cannot create tasks that are smaller than 1/4th of a vCPU and 512 MB of memory
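In a Fargate task definition, that task size is set at the task level, and container-level soft limits can still subdivide it (a minimal sketch; names are illustrative):

```json
{
  "family": "my-app",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "my-app",
      "image": "my-app:latest",
      "memoryReservation": 256,
      "essential": true
    }
  ]
}
```

Here 256 CPU units (1/4 vCPU) with 512 MiB is the smallest valid Fargate task size; anything the containers use beyond the task's `memory` is subject to the kernel's OOM killer.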
brunocascio commented 3 years ago

> In Fargate, you're required to set the task size (the hard limit) and you cannot overcommit resources (the container will be killed by the kernel)

Hey @spyoungtech, thanks for your answer.

So, is there no way to avoid OOM in Fargate? That's bad news for me, at least. In my scenario, since `npm build` injects variables at runtime, I need to run it as an entrypoint when the container starts. `npm build` (webpack under the hood) consumes a lot of resources, and I'm trying to avoid setting 4 GB of memory just to avoid OOM. In this case, is it recommended to go back to EC2 instances?

TL;DR:

In k8s there is the concept of an InitContainer, which is useful for this kind of situation. It would be great to have something similar in ECS, at least with Fargate.

Thanks!
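As a side note, ECS container dependencies can approximate the init-container pattern: a non-essential container runs first, and the main container starts only after it exits successfully. A sketch (container and image names are illustrative; whether this fits the build-at-startup case depends on sharing the build output, e.g. via a task volume):

```json
{
  "containerDefinitions": [
    {
      "name": "build",
      "image": "my-app-builder:latest",
      "essential": false,
      "command": ["npm", "run", "build"]
    },
    {
      "name": "web",
      "image": "my-app:latest",
      "essential": true,
      "dependsOn": [
        { "containerName": "build", "condition": "SUCCESS" }
      ]
    }
  ]
}
```

The `SUCCESS` dependency condition makes `web` wait for `build` to exit with code 0; marking `build` as non-essential keeps its exit from stopping the task. Note this ordering does not change the task-level memory sizing on Fargate, so the task must still be sized for the build step's peak usage.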

NotWaste commented 6 months ago


Hello, I'd like to ask if this issue has been resolved. I've recently encountered a similar problem, and it's been quite bothersome.