OpenNebula / one

The open source Cloud & Edge Computing Platform bringing real freedom to your Enterprise Cloud 🚀
http://opennebula.io
Apache License 2.0

libvirtd restarts in cycles each 10 minutes with error message in system logs #6463

Open mkutouski opened 10 months ago

mkutouski commented 10 months ago

Description: libvirtd on the hypervisor restarts every 10 minutes under the user 'oneadmin', while a libvirtd process is already running as root.

root 8590 0.6 0.0 1632864 47432 ? Ssl 10:22 0:09 /usr/sbin/libvirtd
oneadmin 63387 0.0 0.0 1547460 38376 ? Sl 10:46 0:00 /usr/sbin/libvirtd --timeout=120

When the process is started as 'oneadmin', the following messages also appear in the syslog:

Jan 4 10:46:57 cloud-testbed001 libvirtd[63387]: libvirt version: 8.0.0, package: 1ubuntu7.7 (Michal Maloszewski <michal.maloszewski@canonical.com> Fri, 04 Aug 2023 10:42:25 +0200)
Jan 4 10:46:57 cloud-testbed001 libvirtd[63387]: hostname: cloud-testbed001
Jan 4 10:46:57 cloud-testbed001 libvirtd[63387]: Failed to open file '/sys/kernel/security/apparmor/profiles': Permission denied
Jan 4 10:46:57 cloud-testbed001 libvirtd[63387]: Failed to read AppArmor profiles list '/sys/kernel/security/apparmor/profiles': Permission denied
Jan 4 10:46:57 cloud-testbed001 libvirtd[63387]: Failed to open file '/sys/kernel/security/apparmor/profiles': Permission denied
Jan 4 10:46:57 cloud-testbed001 libvirtd[63387]: Failed to read AppArmor profiles list '/sys/kernel/security/apparmor/profiles': Permission denied
Jan 4 10:46:57 cloud-testbed001 libvirtd[63387]: Failed to open a VPD file '/sys/bus/pci/devices/0000:45:00.0/vpd': Operation not permitted
Jan 4 10:46:57 cloud-testbed001 libvirtd[63387]: Failed to open a VPD file '/sys/bus/pci/devices/0000:45:00.1/vpd': Operation not permitted
Jan 4 10:46:57 cloud-testbed001 libvirtd[63387]: Failed to open a VPD file '/sys/bus/pci/devices/0000:45:00.2/vpd': Operation not permitted
Jan 4 10:46:57 cloud-testbed001 libvirtd[63387]: Failed to open a VPD file '/sys/bus/pci/devices/0000:45:00.3/vpd': Operation not permitted

To Reproduce: Install the latest OpenNebula 6.8.x release (e.g. via minione) and check the system logs on the hypervisor node for error messages like the ones above.

Expected behavior: No error messages like the ones listed in this issue should appear.

Additional context: https://forum.opennebula.io/t/libvirtd-starts-in-cycles-of-10-minutes/11733

juulsp commented 3 months ago

I would like to add a few things I noticed.

We started seeing this behaviour (multiple libvirtd processes) after upgrading OpenNebula from 6.6.x to 6.8.x. The behaviour described here seems to be the correct out-of-the-box experience when using libvirtd with systemd on Ubuntu 22.04 and 24.04 and accessing libvirt as oneadmin, or as any other user for that matter.

If you access libvirt as oneadmin, or as any other user, a libvirtd process is spawned for that user. Below is an example of what you see when userX runs a simple virsh list:

host4:~# ps uax|grep libvirtd
root     4007562  5.8  0.0 6575520 46688 ?       Ssl  jun11 3818:44 /usr/sbin/libvirtd

host4:~# sudo -u userX virsh list --all
 Id   Name   State
--------------------
root@host4:~# ps uax|grep libvirtd 
userX    2697238 49.0  0.0 1547440 27648 ?       Sl   10:47   0:00 /usr/sbin/libvirtd --timeout=120
root     4007562  5.8  0.0 6575520 46688 ?       Ssl  jun11 3818:44 /usr/sbin/libvirtd

Though I have not dug into libvirt on Ubuntu to see whether this is the intended behaviour, it is what you get out of the box on 22.04 and 24.04; I have not checked older versions. A small part of the config file /etc/default/libvirtd, which is installed by the package libvirt-daemon-system, seems to indicate that it is expected behaviour:

# The default upstream behavior is for libvirtd.service to
# start on boot, perform VM autostart and shutdown again if
# nothing was started; later on, systemd socket activation
# is used to start it again when some client app connects.

To figure out why we are seeing 'restarts' at a 10-minute interval, it is more useful to ask who or what is accessing libvirt every 10 minutes as user oneadmin. That quickly leads to the SYSTEM_HOST interval for the monitor probes. When I lower the SYSTEM_HOST interval from 600 to less than 120 in the monitord config, the oneadmin process keeps running, because the default idle timeout of 120 seconds is never reached.
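The relation between the two numbers can be written down as a tiny model. This is only a sketch of the reasoning above, not OpenNebula or libvirt code: probe_effect is a hypothetical helper, the value 90 is an arbitrary example, and the case where both values are equal is treated as a respawn by assumption.

```shell
#!/bin/sh
# Simplified model (an assumption, not OpenNebula code): the per-user libvirtd
# started with --timeout=TIMEOUT exits after TIMEOUT idle seconds, and the
# probe connects once every INTERVAL seconds.
# Prints "respawn" if the daemon has already exited when the next probe runs,
# otherwise "reuse".
probe_effect() {
    interval=$1
    timeout=$2
    if [ "$interval" -ge "$timeout" ]; then
        echo respawn
    else
        echo reuse
    fi
}

probe_effect 600 120   # SYSTEM_HOST interval of 600 vs --timeout=120 -> respawn
probe_effect 90 120    # hypothetical shorter interval                -> reuse
```

With the values from this thread (600 and 120) the daemon is gone long before the next probe, which matches a fresh oneadmin libvirtd appearing every 10 minutes.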

I then guessed that before 6.8.x sudo was always used in the probes, and that since 6.8.x a command is run somewhere without sudo.

Looking into the relation between the timeout and the SYSTEM_HOST interval, I suspected the cause to be in 'im/kvm-probes.d/host/system/cpu_features.sh', which contains:

FEATURES=$(virsh capabilities | grep '<feature name' | sed -e "s/^.*='//;s/'\/>$//" | xargs | tr ' ' ',')
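To see what that filter chain produces without connecting to libvirt at all, it can be fed a hand-written fragment. The sample lines below are made up for illustration (they are not real host output), and extract_features is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: the same grep/sed/xargs/tr chain as in the probe, but
# reading from stdin instead of calling virsh, so no libvirtd is spawned.
extract_features() {
    grep '<feature name' | sed -e "s/^.*='//;s/'\/>$//" | xargs | tr ' ' ','
}

# Illustrative capabilities-style lines, invented for this sketch.
sample="      <arch>x86_64</arch>
      <feature name='vme'/>
      <feature name='ss'/>
      <feature name='vmx'/>"

printf '%s\n' "$sample" | extract_features   # prints: vme,ss,vmx
```

The only part of the probe line that talks to libvirt is the virsh capabilities call at the front; the rest is plain text processing.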

The cpu_features.sh script is not present in 6.6. If anyone wants to get rid of the multiple libvirtd processes, you can add a sudo entry for the command on the hypervisor and update the command in the probe to make use of it. In the end it doesn't really seem to be an issue, just a change in behaviour between 6.6 and 6.8 that got noticed because the oneadmin user now runs a virsh command. As soon as you run anything as userX or userY you will get extra libvirtd processes and see the same thing.
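A rough sketch of that workaround, under stated assumptions: the sudoers line, the /usr/bin/virsh path, the VIRSH_CMD variable and the get_features helper are all illustrative and not taken from the OpenNebula sources, so check them against your own hypervisor before using anything like this.

```shell
#!/bin/sh
# Assumed sudoers entry (file name and virsh path are guesses for your system),
# e.g. in a file under /etc/sudoers.d/:
#   oneadmin ALL=(root) NOPASSWD: /usr/bin/virsh capabilities

# VIRSH_CMD is a hypothetical override so the pipeline can be tried with a
# stand-in command. "sudo -n" fails instead of prompting if no rule matches.
VIRSH_CMD=${VIRSH_CMD:-"sudo -n virsh"}

# Same filter chain as in cpu_features.sh, with virsh called through sudo.
# Run as root, virsh talks to the system libvirtd that is already running,
# so no per-user libvirtd should be spawned for oneadmin.
get_features() {
    $VIRSH_CMD capabilities | grep '<feature name' | sed -e "s/^.*='//;s/'\/>$//" | xargs | tr ' ' ','
}
```

In the probe itself this would amount to something like FEATURES=$(get_features), or simply prefixing the existing virsh call with sudo.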

Franco-Sparrow commented 3 months ago

Hi team

We had the same issue with libvirtd 8.0.0 (8.0.0-1ubuntu7.6). A newer build of the package is available in the Ubuntu 22.04 repos (8.0.0-1ubuntu7.10), and apparently its release notes mention a fix for this issue. We will watch the cluster where we had to upgrade libvirtd and share any information if the issue comes back.

dcarracedo commented 4 days ago

Add a note to the known issues