canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

`limits.memory` are not set correctly in cgroup2 #13147

Closed: mszywala closed this 7 months ago

mszywala commented 8 months ago


Issue description

We are running multiple system containers and recently experienced whole containers freezing when they reached or exceeded memory.high. After reading through https://docs.kernel.org/admin-guide/cgroup-v2.html, I noticed the following:

memory.high A read-write single value file which exists on non-root cgroups. The default is "max".

Memory usage throttle limit. If a cgroup's usage goes over the high boundary, the processes of the cgroup are throttled and put under heavy reclaim pressure.

Going over the high limit never invokes the OOM killer and under extreme conditions the limit may be breached. The high limit should be used in scenarios where an external process monitors the limited cgroup to alleviate heavy reclaim pressure.

Because the OOM killer is never invoked, containers start to freeze.
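The kernel documentation quoted above says memory.high is meant for setups where an external process watches the cgroup. As a minimal sketch of such a check, the hypothetical helper below (`highBelowMax` is not part of LXD) flags the configuration described in this report, where a finite memory.high sits below memory.max and the container gets throttled before the OOM killer can ever fire:

```go
package main

import (
	"fmt"
	"strconv"
)

// highBelowMax reports whether memory.high is a finite limit set below
// memory.max. Both arguments are the raw single-line contents of the
// cgroup-v2 files: either "max" or a byte count.
func highBelowMax(high, max string) bool {
	if high == "max" {
		return false // no throttle boundary; memory.max alone governs via OOM
	}
	h, err := strconv.ParseInt(high, 10, 64)
	if err != nil {
		return false
	}
	if max == "max" {
		return true // finite high, unlimited max: throttling only
	}
	m, err := strconv.ParseInt(max, 10, 64)
	if err != nil {
		return false
	}
	return h < m
}

func main() {
	// The values reported below for lxc.payload.s-c-1-srv-a.
	fmt.Println(highBelowMax("1932734464", "2147483648")) // true: throttled before OOM
	fmt.Println(highBelowMax("max", "2147483648"))        // false: hard limit only
}
```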

There was an issue that led to a change from using memory.low to using memory.high: https://github.com/canonical/lxd/issues/11239 The switch to memory.high (https://github.com/canonical/lxd/commit/d573ae013cdad459204eb60314be7b67b8f62bcd) does not seem to be a great default. As far as I understand the issue above, it should be implemented differently than it currently is; maybe another approach could also solve that issue.

# cat /sys/fs/cgroup/lxc.payload.s-c-1-srv-a/memory.max
2147483648
# cat /sys/fs/cgroup/lxc.payload.s-c-1-srv-a/memory.high
1932734464

As I understand, the issue above (https://github.com/canonical/lxd/issues/11239) wanted to achieve the following for limits.memory.enforce=hard (the default):

# cat /sys/fs/cgroup/lxc.payload.s-c-1-srv-a/memory.low
1932734464
# cat /sys/fs/cgroup/lxc.payload.s-c-1-srv-a/memory.high
max
# cat /sys/fs/cgroup/lxc.payload.s-c-1-srv-a/memory.max
2147483648

And for limits.memory.enforce=soft this:

# cat /sys/fs/cgroup/lxc.payload.s-c-1-srv-a/memory.low
2147483648
# cat /sys/fs/cgroup/lxc.payload.s-c-1-srv-a/memory.high
max
# cat /sys/fs/cgroup/lxc.payload.s-c-1-srv-a/memory.max
max

Steps to reproduce

  1. Create an Ubuntu 22.04 Container
  2. Run an application that consumes so much RAM that it exceeds the limit of memory.high (For example, a Java application)
  3. The container becomes slow/unresponsive and freezes after a few seconds or minutes, depending on the workload.
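The steps above can be sketched with the LXD CLI (this assumes a host running LXD; the container name `c1` and the 2 GiB limit are arbitrary illustrative choices):

```shell
# Illustrative reproduction sketch, assuming a working LXD install.
lxc launch ubuntu:22.04 c1
lxc config set c1 limits.memory=2GiB   # limits.memory.enforce defaults to hard

# Inspect the knobs the container ended up with, on the host's cgroup2 tree:
cat /sys/fs/cgroup/lxc.payload.c1/memory.max
cat /sys/fs/cgroup/lxc.payload.c1/memory.high

# Allocate past memory.high inside the container, e.g. with a simple memory hog:
lxc exec c1 -- python3 -c "b = bytearray(3 * 1024**3); input()"
```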


tomponline commented 8 months ago

@mihalicyn please can you investigate this and confirm if it's an issue in the latest/edge and 5.0/edge channels? Thanks

mihalicyn commented 7 months ago

Hi @mszywala

Thanks for your report!

Do I understand correctly that you are using limits.memory.enforce=soft for your containers?

Because the OOM killer is never invoked, containers start to freeze.

If you want the OOM killer to be invoked, you need to use limits.memory.enforce=hard, which will set the memory.max limit.

Speaking about memory.low, we just ignore this knob completely in LXD, and it is not about soft limits at all. These soft/hard limit semantics come from the cgroup-v1 era, where we had explicit soft and hard limit knobs (memory.soft_limit_in_bytes (https://github.com/torvalds/linux/blob/a4145ce1e7bc247fd6f2846e8699473448717b37/mm/memcontrol.c#L4087) and memory.limit_in_bytes). In cgroup-v2 we don't have an explicit analog of the soft limit at all. So it looks reasonable to use memory.high as a soft limit: your process won't be killed by the OOM killer, but it will be forced to reclaim every time a pagefault occurs on allocation. At the same time, for the hard limit we just use memory.max (https://github.com/torvalds/linux/blob/a4145ce1e7bc247fd6f2846e8699473448717b37/mm/memcontrol.c#L6822), which is a clear analog of memory.limit_in_bytes (https://github.com/torvalds/linux/blob/a4145ce1e7bc247fd6f2846e8699473448717b37/mm/memcontrol.c#L3699).
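The mapping described here can be sketched as follows. `memoryKnobs` is a hypothetical helper, not LXD's actual implementation: hard enforcement maps limits.memory to memory.max, soft enforcement maps it to memory.high, and the unused knob stays at the cgroup-v2 default "max":

```go
package main

import "fmt"

// memoryKnobs returns the cgroup-v2 memory files a manager would write
// for a given limits.memory value (in bytes) under each enforcement mode.
func memoryKnobs(limitBytes int64, enforce string) map[string]string {
	// Start from the cgroup-v2 defaults: both knobs unlimited.
	knobs := map[string]string{"memory.high": "max", "memory.max": "max"}
	switch enforce {
	case "hard":
		// Hard limit: breaching memory.max invokes the OOM killer.
		knobs["memory.max"] = fmt.Sprintf("%d", limitBytes)
	case "soft":
		// Soft limit: memory.high throttles and reclaims, never OOM-kills.
		knobs["memory.high"] = fmt.Sprintf("%d", limitBytes)
	}
	return knobs
}

func main() {
	fmt.Println(memoryKnobs(2147483648, "hard")) // memory.max set, memory.high stays "max"
	fmt.Println(memoryKnobs(2147483648, "soft")) // memory.high set, memory.max stays "max"
}
```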

mszywala commented 7 months ago

Hi @mihalicyn,

Thanks for your detailed response.

Actually, we do not set limits.memory.enforce, so the default (https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#resource-limits) should be used, which is limits.memory.enforce=hard

I am expecting the OOM killer to be triggered when the container reaches limits.memory. But instead, the container begins to freeze: it gets throttled because memory.high is also set, which causes the container to freeze.

I tried both configurations. limits.memory.enforce=hard seems to set memory.high=<value below memory.max> and memory.max=<limits.memory of container>.

limits.memory.enforce=soft, as I understand it, seems to work as intended and sets memory.high=<limits.memory of container> and memory.max=max.

If I understand you correctly, limits.memory.enforce=hard (the default) should only set memory.max, but this is not the case. I would expect memory.high to be set to the cgroup2 default: max

tomponline commented 7 months ago

@mihalicyn do you think this is a bug? Thanks

mihalicyn commented 7 months ago

Actually, we do not set limits.memory.enforce, so the default (https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#resource-limits) should be used, which is limits.memory.enforce=hard.

yes, it means that you effectively have limits.memory.enforce=hard

I am expecting that the OOM-Killer will be triggered when the container reaches limits.memory.

when hard is set it should invoke the OOM killer, that's right

I would expect that memory.high is set to the cgroup2 default: max

You are right. I'll fix it.

Upd: comment was edited by me.

mszywala commented 7 months ago

Thank you @mihalicyn. In which LXD LTS version will the bug be fixed? In the new 5.21.0 LTS Release?

tomponline commented 7 months ago

Yes, I'll include this when we move 5.21 into stable.