TeliaSoneraNorge / telia-terraform-modules

Terraform modules for creating cloud infrastructure
MIT License

Fix ECS memory reservation. #78

Open itsdalmo opened 6 years ago

itsdalmo commented 6 years ago

While moving a copy of the ECS modules to a new repository, I discovered that the container definitions used for ecs/service specify memory rather than memoryReservation. I believe this was a mistake from mixing up the documentation for a task definition and a container definition (which is a part of a task definition):

Currently, task_definition_ram sets memory in the container definition, which imposes a hard limit on the amount of memory a running container can consume; if the limit is exceeded, the container is killed. I don't think this is the intended behaviour for any users of this module, and this PR aims to fix that.
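A minimal sketch of the change being described, assuming a jsonencode-style container definition (the resource name, family, and image are hypothetical, not the module's actual source):

```hcl
resource "aws_ecs_task_definition" "main" {
  family = "example"

  container_definitions = jsonencode([
    {
      name  = "main"
      image = "example/app:latest"
      # Before (hard limit: the container is killed if it exceeds this):
      # memory = var.task_definition_ram
      # After (soft limit: used by ECS for placement, not enforced as a kill threshold):
      memoryReservation = var.task_definition_ram
    }
  ])
}
```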

Having read the documentation, I believe what we want is the following:

By not specifying memory on the container definition, we run the risk of leaky applications eating up all the memory. Personally, I think it's better to deal with the underlying problem (fix the memory leak) than to have a hard limit on memory that restarts the application whenever it is reached. I'm not sure if there are any other uses for this parameter?

PS: This should be tested before it is merged, and I was hoping that perhaps @neugeeug could test it on the NEO dev/staging environment?

itsdalmo commented 6 years ago

On the other hand, if the use case is something that runs nicely in parallel (parallel build workers etc.), then being able to specify a memory limit that is a whole fraction of the total instance gives you the best chance of maximising the use of your resources.

But the memory parameter in the container definition will outright kill the container if it exceeds the limit? If we use that parameter on all our tasks, we'll have to make sure that the memory for everything deployed to the cluster adds up to the total memory of the cluster, or we'll be enforcing under-utilisation?

Relevant quote from the documentation:

The hard limit (in MiB) of memory to present to the container. If your container attempts to exceed the memory specified here, the container is killed. This parameter maps to Memory in the Create a container section of the Docker Remote API and the --memory option to docker run.

Edit: Another relevant quote:

For containers that will be part of a task using the EC2 launch type, you must specify a non-zero integer for one or both of memory or memoryReservation in container definitions. If you specify both, memory must be greater than memoryReservation. If you specify memoryReservation, then that value is subtracted from the available memory resources for the container instance on which the container is placed; otherwise, the value of memory is used.

With the last quote in mind, I think we can skip the task-level cpu and memory reservations. The cpu and memoryReservation of the container definition (of which there is only one) will be enough for ECS to schedule containers correctly, and both are soft limits that also try to balance usage between multiple containers on the same instance.
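The soft-limits-only approach described above might look like this (illustrative values; per the documentation quote, if memory were also set it would have to be greater than memoryReservation):

```hcl
container_definitions = jsonencode([
  {
    name  = "main"
    image = "example/app:latest"
    cpu   = 256 # CPU units; a soft limit on EC2 launch type
    # Soft limit (MiB): subtracted from the instance's available memory
    # for placement, but the container may use more if it is free.
    memoryReservation = 512
    # memory is intentionally omitted, so no hard kill threshold applies.
  }
])
```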

Last edit: Or we could do it the other way around and only use task-level limits, since we only have a single container definition. I think those parameters equal cpu and memoryReservation in the container definition.
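A sketch of that task-level alternative (illustrative values; whether the task-level parameters really behave like the container-level soft limits is the open question in this comment):

```hcl
resource "aws_ecs_task_definition" "main" {
  family = "example"
  cpu    = "256" # task-level CPU units
  memory = "512" # task-level memory (MiB)

  container_definitions = jsonencode([
    {
      name  = "main"
      image = "example/app:latest"
      # No per-container cpu/memory set; the task-level values apply.
    }
  ])
}
```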

colincoleman commented 6 years ago

For Java programs you can control the total memory either in the startup parameters or in the manifest file. When set, Java regulates its garbage collection to stay under the limit you choose. (Of course, if you set this too low the program performs terribly!)
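One way to combine that with the soft limit discussed above is to cap the heap in the container's startup command (image name, flag values, and headroom are illustrative assumptions, not from this module):

```hcl
container_definitions = jsonencode([
  {
    name  = "app"
    image = "example/java-app:latest"
    # -Xmx caps the JVM heap; the garbage collector works to stay under it.
    command = ["java", "-Xmx512m", "-jar", "app.jar"]
    # Soft limit with some headroom above the heap for non-heap JVM
    # memory (metaspace, threads, native buffers).
    memoryReservation = 640
  }
])
```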