canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Jammy's kernel (5.15) doesn't play well with `cgroup1` swap #13075

Open simondeziel opened 1 year ago

simondeziel commented 1 year ago

The tests/cgroup script fails when Jammy's kernel (5.15) is used together with cgroup1. @mihalicyn would you mind taking a look, please? Results for each combination:

# Works with Jammy's kernel and cgroup2
./bin/openstack-run jammy default tests/cgroup

...
==> Testing memory limits
+ echo ==> Testing memory limits
+ lxc config set c1 limits.memory=2GiB
+ lxc exec c1 -- grep ^MemTotal /proc/meminfo
+ [ MemTotal:        2097152 kB = MemTotal:        2097152 kB ]
+ [ -e /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes ]
+ [ -e /sys/fs/cgroup/memory.swap.max ]
+ lxc exec c1 -- grep ^SwapTotal /proc/meminfo
==> Testing process limits
+ [ SwapTotal:             0 kB = SwapTotal:             0 kB ]

# Works with Focal's kernel and cgroup1
./bin/openstack-run focal default tests/cgroup

...
==> Testing memory limits
+ echo ==> Testing memory limits
+ lxc config set c1 limits.memory=2GiB
+ lxc exec c1 -- grep ^MemTotal /proc/meminfo
+ [ MemTotal:        2097152 kB = MemTotal:        2097152 kB ]
+ [ -e /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes ]
+ [ -e /sys/fs/cgroup/memory.swap.max ]
+ lxc exec c1 -- grep ^SwapTotal /proc/meminfo
+ [ SwapTotal:             0 kB = SwapTotal:             0 kB ]

# Fails with Jammy's kernel and cgroup1
./bin/openstack-run jammy cgroup1 tests/cgroup

...
==> Testing memory limits
+ echo ==> Testing memory limits
+ lxc config set c1 limits.memory=2GiB
+ lxc exec c1 -- grep ^MemTotal /proc/meminfo
+ [ MemTotal:        2097152 kB = MemTotal:        2097152 kB ]
+ [ -e /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes ]
+ lxc exec c1 -- grep ^SwapTotal /proc/meminfo

Test failed
+ [ SwapTotal:             0 kB = SwapTotal:       2097152 kB ]

# Fails with Jammy's kernel (Focal with virtual-hwe) and cgroup1
./bin/openstack-run focal virtual-hwe tests/cgroup

...
==> Testing memory limits
+ echo ==> Testing memory limits
+ lxc config set c1 limits.memory=2GiB
+ lxc exec c1 -- grep ^MemTotal /proc/meminfo
+ [ MemTotal:        2097152 kB = MemTotal:        2097152 kB ]
+ [ -e /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes ]
+ lxc exec c1 -- grep ^SwapTotal /proc/meminfo

Test failed
+ [ SwapTotal:             0 kB = SwapTotal:       2097152 kB ]

simondeziel commented 10 months ago

I get the same error with focal with swapaccount:

> ++ lxc exec c1 -- grep '^SwapTotal' /proc/meminfo
> + '[' 'SwapTotal:             0 kB' = 'SwapTotal:       2097152 kB' ']'

@mihalicyn should we stop testing with swapaccount=1 on recent kernels? The option is reported as deprecated there:

# uname -a
Linux v2 6.2.0-37-generic #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov  2 18:01:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
# journalctl -kb0 --grep swapaccount=.*deprecated
Dec 11 20:19:29 v2 kernel: The swapaccount= commandline option is deprecated. Please report your usecase to linux-mm@kvack.org if you depend on this functionality.

Note: with Jammy GA kernel (5.15), there is no such deprecation. As such, should I update the test script to refuse swapaccount=1 if the kernel is newer than 5.15?
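
A minimal sketch of what such a guard could look like, assuming the script reads the kernel release from `uname -r` and the boot parameters from `/proc/cmdline`. The function names `kernel_newer_than`, `has_swapaccount` and `check_swapaccount` are hypothetical and not part of the existing test script, and the version comparison assumes a `sort` that supports `-V`:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from tests/cgroup.

# kernel_newer_than RELEASE BASE
# Succeeds if the major.minor part of RELEASE (e.g. "6.2.0-37-generic")
# is strictly greater than BASE (e.g. "5.15").
kernel_newer_than() {
    rel="$(printf '%s\n' "$1" | cut -d. -f1,2)"
    [ "${rel}" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "${rel}" "$2" | sort -V | tail -n1)" = "${rel}" ]
}

# has_swapaccount CMDLINE
# Succeeds if the kernel command line contains swapaccount=1.
has_swapaccount() {
    case " $1 " in
        *" swapaccount=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_swapaccount RELEASE CMDLINE
# Returns 1 (refuse) when swapaccount=1 is set on a kernel newer than 5.15.
check_swapaccount() {
    if has_swapaccount "$2" && kernel_newer_than "$1" "5.15"; then
        echo "swapaccount=1 is deprecated on kernel $1, refusing to run" >&2
        return 1
    fi
    return 0
}
```

In the test script this could be called as `check_swapaccount "$(uname -r)" "$(cat /proc/cmdline)" || exit 1`. Comparing only major.minor keeps the check independent of the Ubuntu ABI suffix in the release string.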

simondeziel commented 8 months ago

I just reran the check on Focal with swapaccount=1 and it fails with both LXD 5.0.2 and 5.0.3:

$ TEST_IMG=ubuntu-daily:22.04 ./bin/openstack-run focal swapaccount tests/cgroup 5.0/stable
...
lxd (5.0/stable) 5.0.2-d4d8da9 from Canonical** installed
Name  Version        Rev    Tracking    Publisher    Notes
lxd   5.0.2-d4d8da9  26741  5.0/stable  canonical**  -
Linux lxd-ci-cgroup-focal-5-0-stable 5.4.0-170-generic #188-Ubuntu SMP Wed Jan 10 09:51:01 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
BOOT_IMAGE=/boot/vmlinuz-5.4.0-170-generic root=UUID=15d8fa64-c6b4-4764-a034-4e56274f3a43 ro swapaccount=1 mitigations=off console=tty1 console=ttyS0

...

+ '[' -e /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes ']'
++ lxc exec c1 -- grep '^SwapTotal' /proc/meminfo

Test failed
+ '[' 'SwapTotal:             0 kB' = 'SwapTotal:       2097152 kB' ']'
+ cleanup
+ set +e
+ echo ''
+ '[' 1 = 1 ']'
+ echo 'Test failed'
+ exit 1
+ cleanup
+ set +e
+ openstack server delete lxd-ci-cgroup-focal-5-0-stable
+ rm -f /tmp/tmp.tWFt8GwAY5
+ [ 1 = 0 ]
+ echo 

+ echo ==> Test failed (cgroup)
==> Test failed (cgroup)
+ exit 1

simondeziel commented 2 months ago

cgroup1 systems are getting really old by now, and Focal/20.04 is the last supported Ubuntu release that defaults to this cgroup version.

This Ubuntu version shipped with LXD 4.0, which seems to behave just fine on cgroup1, so maybe we can close this bug and focus on cgroup2 systems going forward?

tomponline commented 2 months ago

I think we should at least try to understand why it doesn't work first, especially as 5.0 is affected too. Then, once we know the problem, we can make a call as to whether it's worth fixing.

@mihalicyn is this something you can help us with?
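
As a possible starting point for comparing the passing and failing hosts, here is a minimal sketch of a helper that reports which memory/swap control layout a host exposes. It checks the `memory.memsw.limit_in_bytes` path that the test trace already probes, plus `cgroup.controllers`, which is present at the root of a cgroup2 mount. The function name `cgroup_swap_layout` is hypothetical and not part of LXD or its test suite. Hybrid layouts, which are not covered in this thread, would be reported by their cgroup1 side:

```shell
#!/bin/sh
# Hypothetical diagnostic helper for illustration; not part of the LXD tests.

# cgroup_swap_layout [ROOT]
# Prints which layout is visible under ROOT (default: /sys/fs/cgroup).
cgroup_swap_layout() {
    root="${1:-/sys/fs/cgroup}"
    if [ -e "${root}/cgroup.controllers" ]; then
        # Unified hierarchy mounted at ROOT.
        echo "cgroup2"
    elif [ -e "${root}/memory/memory.memsw.limit_in_bytes" ]; then
        # Legacy memory controller with memory+swap accounting enabled.
        echo "cgroup1 with swap accounting"
    elif [ -d "${root}/memory" ]; then
        # Legacy memory controller without the memsw files.
        echo "cgroup1 without swap accounting"
    else
        echo "unknown"
    fi
}
```

Running `cgroup_swap_layout` on each test instance, next to `uname -r` and `cat /proc/cmdline`, would show whether the failing runs all land in the "cgroup1 with swap accounting" case.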