OpenNebula / one

The open source Cloud & Edge Computing Platform bringing real freedom to your Enterprise Cloud 🚀
http://opennebula.io
Apache License 2.0

Virtual machines in S3/S4 power mode are detected as SUSPENDED #5793

Closed baby-gnu closed 1 year ago

baby-gnu commented 2 years ago

Description

When a VM has power management tools installed and is asked to suspend, it appears as pmsuspended in the virsh list output.

Unfortunately, it can't be resumed by OpenNebula:

```
Mon Apr 4 10:52:23 2022 [Z0][LCM][I]: Restoring VM
Mon Apr 4 10:52:23 2022 [Z0][VM][I]: New state is ACTIVE
Mon Apr 4 10:52:23 2022 [Z0][VM][I]: New LCM state is BOOT_SUSPENDED
Mon Apr 4 10:52:23 2022 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: ExitCode: 0
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: Command execution fail (exit code: 1): cat << EOT | /var/tmp/one/vmm/kvm/restore '/var/lib/one//datastores/0/802559/checkpoint' 'nebula80' '262a55de-3e36-4507-b928-7aeb7d2611c3' 802559 nebula80
Mon Apr 4 10:52:24 2022 [Z0][VMM][E]: restore: Command "set -e -o pipefail
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]:
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: # extract the xml from the checkpoint
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]:
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: virsh --connect qemu+tls://localhost/system save-image-dumpxml /var/lib/one//datastores/0/802559/checkpoint > /var/lib/one//datastores/0/802559/checkpoint.xml
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]:
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: # Eeplace all occurrences of the DS_LOCATION// with the specific
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: # DS_ID where the checkpoint is placed. This is done in case there was a
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: # system DS migration
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]:
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: sed -i "s%/var/lib/one//datastores/[0-9]\+/802559/%/var/lib/one//datastores/0/802559/%g" /var/lib/one//datastores/0/802559/checkpoint.xml
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: sed -i "s%/var/lib/one/datastores/[0-9]\+/802559/%/var/lib/one//datastores/0/802559/%g" /var/lib/one//datastores/0/802559/checkpoint.xml" failed: error: Failed to open file '/var/lib/one//datastores/0/802559/checkpoint': Aucun fichier ou dossier de ce type
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: Could not recalculate paths in /var/lib/one//datastores/0/802559/checkpoint.xml
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: ExitCode: 1
Mon Apr 4 10:52:24 2022 [Z0][VMM][I]: Failed to execute virtualization driver operation: restore.
Mon Apr 4 10:52:24 2022 [Z0][VMM][E]: RESTORE: ERROR: restore: Command "set -e -o pipefail
# extract the xml from the checkpoint
virsh --connect qemu+tls://localhost/system save-image-dumpxml /var/lib/one//datastores/0/802559/checkpoint > /var/lib/one//datastores/0/802559/checkpoint.xml
# Eeplace all occurrences of the DS_LOCATION// with the specific
# DS_ID where the checkpoint is placed. This is done in case there was a
# system DS migration
sed -i "s%/var/lib/one//datastores/[0-9]\+/802559/%/var/lib/one//datastores/0/802559/%g" /var/lib/one//datastores/0/802559/checkpoint.xml
sed -i "s%/var/lib/one/datastores/[0-9]\+/802559/%/var/lib/one//datastores/0/802559/%g" /var/lib/one//datastores/0/802559/checkpoint.xml" failed: error: Failed to open file '/var/lib/one//datastores/0/802559/checkpoint': Aucun fichier ou dossier de ce type
Could not recalculate paths in /var/lib/one//datastores/0/802559/checkpoint.xml
ExitCode: 1
Mon Apr 4 10:52:24 2022 [Z0][VM][I]: New state is SUSPENDED
Mon Apr 4 10:52:24 2022 [Z0][VM][I]: New LCM state is LCM_INIT
```

(The French error message "Aucun fichier ou dossier de ce type" means "No such file or directory".)

To Reproduce

  1. start a virtual machine
  2. execute virsh dompmsuspend --target mem one-XXXX
  3. wait for OpenNebula to detect the new VM state
  4. try to resume the virtual machine with onevm resume XXXX

Expected behavior

The virtual machine should be resumed with virsh dompmwakeup one-XXXX
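
The expected behavior could be sketched roughly as follows. This is only an illustration, not OpenNebula's driver code: the function names `resume_subcommand` and `resume_domain` and the `VIRSH` override variable are hypothetical. It assumes `virsh domstate` reports the state strings `pmsuspended`, `paused` and `running`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map a libvirt domain state (as printed by
# `virsh domstate`) to the virsh subcommand that brings the guest back.
resume_subcommand() {
    case "$1" in
        pmsuspended) echo "dompmwakeup" ;;  # guest suspended through power management
        paused)      echo "resume" ;;       # vCPUs stopped, e.g. by `virsh suspend`
        running)     echo "none" ;;         # nothing to do
        *)           return 1 ;;            # shut off, crashed, unknown, ...
    esac
}

# Hypothetical wrapper: query the state, then run the matching subcommand.
# VIRSH can be set to another command (or function) for a dry run.
resume_domain() {
    local domain=$1 state sub
    state=$(${VIRSH:-virsh} domstate "$domain") || return 1
    sub=$(resume_subcommand "$state") || {
        echo "cannot resume $domain from state: $state" >&2
        return 1
    }
    [ "$sub" = none ] || ${VIRSH:-virsh} "$sub" "$domain"
}
```

With this sketch, `resume_domain one-XXXX` would call `virsh dompmwakeup one-XXXX` for a pmsuspended guest and `virsh resume one-XXXX` for a paused one.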

defekkt commented 1 year ago

Same issue on 6.6

brodriguez-opennebula commented 1 year ago

OK, I checked some virsh internals. Whether the VM can be suspended to the S3 or S4 states depends on the default settings of the following options (OpenNebula doesn't set them):

  <pm>
    <suspend-to-disk enabled='yes'/>
    <suspend-to-mem enabled='yes'/>
  </pm>

If they are enabled, a VM suspended through power management shows up as pmsuspended:

$ virsh list
 Id   Name     State
----------------------------
 8    one-57   pmsuspended

OpenNebula normally suspends VMs with virsh suspend one-$VMID (state paused) and resumes them with virsh resume one-$VMID. That doesn't rely on the power management of the underlying domain (so it could work with other hypervisors). In that case we have

$ virsh list
 Id   Name     State
-----------------------
 8    one-57   paused
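
The two listings above differ only in the State column, so a monitoring script could tell the cases apart by filtering on it. A minimal sketch, assuming the `virsh list` layout shown above (two header lines, state in the last column); the function name `pmsuspended_domains` is hypothetical:

```shell
#!/usr/bin/env bash

# Print the names of domains that `virsh list` reports as pmsuspended.
# Reads `virsh list` output on stdin and skips the two header lines.
pmsuspended_domains() {
    awk 'NR > 2 && $NF == "pmsuspended" { print $2 }'
}

# Usage (on the hypervisor host):
#   virsh list --all | pmsuspended_domains
```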

onenhansen commented 1 year ago

Suspending the VM on the host:

[root@alma8-kvm-qcow2-6-7-05dhw-1 qemu]# virsh dompmsuspend --domain one-6 --target mem
Domain 'one-6' successfully suspended
[root@alma8-kvm-qcow2-6-7-05dhw-1 qemu]# virsh list
 Id   Name    State
---------------------------
 10   one-6   pmsuspended

Checking status on frontend and resuming with updates:

[oneadmin@alma8-kvm-qcow2-6-7-05dhw-0 root]$ onevm list --no-expand
  ID USER     GROUP    NAME            STAT  CPU     MEM HOST             TIME
   6 oneadmin oneadmin test            susp    1    768M alma8-kvm-   0d 00h11
[oneadmin@alma8-kvm-qcow2-6-7-05dhw-0 root]$ onevm show 6 | grep STATE
STATE               : SUSPENDED           
LCM_STATE           : LCM_INIT            
[oneadmin@alma8-kvm-qcow2-6-7-05dhw-0 root]$ onevm resume 6
[oneadmin@alma8-kvm-qcow2-6-7-05dhw-0 root]$ onevm show 6 | grep STATE
STATE               : ACTIVE              
LCM_STATE           : RUNNING
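
The same check can be scripted instead of read off the `grep` output. A minimal sketch, assuming the `KEY : VALUE` layout of `onevm show` seen above; the helper name `onevm_field` is hypothetical:

```shell
#!/usr/bin/env bash

# Print the value of one field from `onevm show` output read on stdin.
# The key must match exactly, so STATE does not also match LCM_STATE.
onevm_field() {
    awk -F' *: *' -v key="$1" '$1 == key { sub(/ +$/, "", $2); print $2 }'
}

# Usage (on the frontend), after `onevm resume 6`:
#   [ "$(onevm show 6 | onevm_field STATE)" = ACTIVE ] && echo "VM 6 resumed"
```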