[ECS] [request]: Support ephemeral storage limits for EC2 launch type

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request

Allow limiting the ephemeral storage used by a task through the task definition, similar to how you can limit the amount of memory available to a task.

Which service(s) is this request for?

ECS (EC2)

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

My team operates large multi-tenant clusters. Because there is no limit on the amount of ephemeral storage an individual task can consume, a single task sometimes fills up the disk, for example by writing excessive crash dumps or logs. Because we have no insight into how much disk space an individual task has used, we have no choice in these cases but to terminate the entire container instance rather than evict only the offending task. This affects all the other workloads running on the same instance, but it is better than running out of disk space and having every workload stop working.

If we could instead impose a limit on the amount of ephemeral storage a task uses, and stop the task when it exceeds that limit, we would not have to impact other "innocent" workloads. We would also not need to maintain automation to drain container instances that are starting to run out of free disk space (which is itself not simple, since EBS doesn't provide metrics for this).

Are you currently working around this issue?

We have to collect disk utilization metrics and publish them to our observability platform, then create monitors that fire when the available space on the disk falls to a certain threshold. Those monitors trigger a container instance draining process that relocates all the running tasks and then terminates the container instance.
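For readers unfamiliar with this kind of workaround, the threshold check and drain trigger could look roughly like the Python sketch below. This is not our actual automation: the function names, the 85% threshold, and the `drain` callback are all hypothetical, and the publishing of metrics to an observability platform is left out.

```python
import shutil
from typing import Callable


def used_fraction(path: str = "/") -> float:
    """Return the fraction of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_drain(fraction_used: float, threshold: float = 0.85) -> bool:
    """Decide whether the instance is close enough to full to be drained.

    The 0.85 default is an arbitrary example value, not a recommendation.
    """
    return fraction_used >= threshold


def check_and_drain(
    instance_arn: str,
    drain: Callable[[str], None],
    path: str = "/",
    threshold: float = 0.85,
) -> bool:
    """Call `drain(instance_arn)` if disk usage has crossed the threshold.

    `drain` is a hypothetical callback. With boto3 it might wrap
    ecs.update_container_instances_state(..., status="DRAINING"),
    followed by terminating the instance once its tasks have moved.
    """
    if should_drain(used_fraction(path), threshold):
        drain(instance_arn)
        return True
    return False
```

Note that the check works on the whole disk, not per task, which is exactly the limitation described above: the only available action is to drain the entire instance.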

Additional context

The equivalent functionality in Kubernetes is ephemeral storage limits and requests. I am personally most interested in the limits rather than the requests, although both would of course be useful: requests would help avoid placing a new task on a host that doesn't have enough available ephemeral space.
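For comparison, here is roughly how the Kubernetes side looks, written as a Python dict that serializes to a JSON pod manifest. The pod name, container name, image, and sizes are made-up example values; only the `ephemeral-storage` keys under `resources.requests` and `resources.limits` are the Kubernetes feature being referred to.

```python
import json

# Container spec with ephemeral storage settings (example values).
container = {
    "name": "app",
    "image": "example/app:latest",
    "resources": {
        # Used by the scheduler to pick a node with enough local storage.
        "requests": {"ephemeral-storage": "2Gi"},
        # Exceeding this makes the pod a candidate for eviction.
        "limits": {"ephemeral-storage": "4Gi"},
    },
}

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "ephemeral-demo"},
    "spec": {"containers": [container]},
}

print(json.dumps(pod_manifest, indent=2))
```

An ECS equivalent would presumably be a similar per-container or per-task setting in the task definition, but no such field exists today for the EC2 launch type, which is the point of this request.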

aws / containers-roadmap

[ECS] [request]: Support ephemeral storage limits for EC2 launch type #2442