canonical / lxd


Offline/unplugged CPUs are showing in container metrics when using `cgroup1` #13324

Open simondeziel opened 7 months ago

simondeziel commented 7 months ago

In a cgroup1 VM with a single CPU (the implied default, limits.cpu=1), the metrics for its guest instances apparently include the other CPU cores that are "hotpluggable" in the VM:

sdeziel@sdeziel-lemur:~$ nproc
12
$ lxc exec v1 -- nproc
1
root@v1:~# nproc
1
root@v1:~# lxc query /1.0/metrics | grep ^lxd_cpu_seconds
lxd_cpu_seconds_total{cpu="0",mode="system",name="a1",project="default",type="container"} 0.150751482
lxd_cpu_seconds_total{cpu="0",mode="user",name="a1",project="default",type="container"} 0.043823309
lxd_cpu_seconds_total{cpu="2",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="2",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="3",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="3",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="4",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="4",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="7",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="7",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="10",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="10",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="1",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="1",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="5",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="5",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="6",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="6",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="8",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="8",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="9",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="9",mode="user",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="11",mode="system",name="a1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="11",mode="user",name="a1",project="default",type="container"} 0

Here's how to reproduce:

lxc launch ubuntu-daily:22.04 --vm v1
lxc exec v1 -- sed -i 's/console=ttyS0"/console=ttyS0 systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub.d/50-cloudimg-settings.cfg
lxc exec v1 -- update-grub
lxc restart v1
lxc exec v1 -- lxd init --auto
lxc exec v1 -- lxc launch ubuntu-minimal:22.04 c1
lxc exec v1 -- lxc query /1.0/metrics | grep ^lxd_cpu_seconds

The metrics query should only report cpu="0", but it also reports 0 for other CPU cores that are not online/plugged:

$ lxc exec v1 -- lxc query /1.0/metrics | grep ^lxd_cpu_seconds
lxd_cpu_seconds_total{cpu="7",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="7",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="8",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="8",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="9",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="9",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="11",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="11",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="2",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="2",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="4",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="4",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="6",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="6",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="5",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="5",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="10",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="10",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="0",mode="system",name="c1",project="default",type="container"} 1.510519089
lxd_cpu_seconds_total{cpu="0",mode="user",name="c1",project="default",type="container"} 3.571449891
lxd_cpu_seconds_total{cpu="1",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="1",mode="user",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="3",mode="system",name="c1",project="default",type="container"} 0
lxd_cpu_seconds_total{cpu="3",mode="user",name="c1",project="default",type="container"} 0

$ nproc
12
$ lxc exec v1 -- nproc
1
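
To make the mismatch easier to spot, the cpu labels in the metrics output can be compared against the CPUs the kernel reports as online. This is only a minimal sketch, assuming the `lxd_cpu_seconds_total` line format shown above and the standard `/sys/devices/system/cpu/online` file; the helper names `expand_cpu_list`, `metric_cpus` and `unexpected_cpus` are hypothetical and not part of LXD:

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; none of these exist in LXD.

# Expand a kernel CPU list such as "0-2,5" into one CPU ID per line.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r start end; do
        [ -n "$start" ] || continue
        # A single ID like "5" has no end, so treat it as the range 5-5.
        seq "$start" "${end:-$start}"
    done
}

# Extract the distinct cpu="N" labels from lxd_cpu_seconds_total lines on stdin.
metric_cpus() {
    sed -n 's/^lxd_cpu_seconds_total{cpu="\([0-9]*\)".*/\1/p' | sort -n -u
}

# Print CPU IDs that appear in the metrics (stdin) but not in the online list ($1).
unexpected_cpus() {
    online=$(expand_cpu_list "$1")
    metric_cpus | while read -r cpu; do
        echo "$online" | grep -qx "$cpu" || echo "$cpu"
    done
}
```

Inside the VM, something like `lxc query /1.0/metrics | unexpected_cpus "$(cat /sys/devices/system/cpu/online)"` would then print one line per CPU that is listed in the metrics without being online. With the output above that should be CPUs 1 through 11.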

$ lxc exec v1 -- snap list lxd
Name  Version        Rev    Tracking      Publisher   Notes
lxd   5.0.3-babaaf8  27948  5.0/stable/…  canonical✓  -

FYI, this is reproducible with 5.0/stable, 5.21/stable and latest/edge.

tomponline commented 7 months ago

@mihalicyn is this expected?

@simondeziel how is the behavior different in cgroupv2?

simondeziel commented 7 months ago

> @simondeziel how is the behavior different in cgroupv2?

With cgroup2 (the default with 22.04, maybe 20.04 too?), only cpu="0" is reported, which seems to be expected (see https://github.com/canonical/lxd/blob/main/lxd/cgroup/abstraction.go#L340-L341), and cpu="0" is always online.
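
When comparing the two cases, it may help to confirm which cgroup layout a given VM actually booted with. A minimal sketch, assuming GNU `stat` is available in the guest; the function name `cgroup_layout` is hypothetical, and the mapping relies on `/sys/fs/cgroup` being a `cgroup2fs` mount under the unified hierarchy and a `tmpfs` mount under the legacy or hybrid (cgroup1) layouts:

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a layout name.
cgroup_layout() {
    case "$1" in
        cgroup2fs) echo "cgroup2" ;;   # unified hierarchy
        tmpfs)     echo "cgroup1" ;;   # legacy or hybrid hierarchy
        *)         echo "unknown" ;;
    esac
}

# Usage inside the VM:
#   cgroup_layout "$(stat -fc %T /sys/fs/cgroup)"
```

Running it in `v1` before and after the `systemd.unified_cgroup_hierarchy=0` change should show whether the reboot switched the layout as intended.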

tomponline commented 1 month ago

@simondeziel @mihalicyn please can you chat about this and figure out if we need to do anything here?

mihalicyn commented 1 month ago

https://lore.kernel.org/all/20241017102138.92504-1-aleksandr.mikhalitsyn@canonical.com