canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Creating a VM snapshot with insufficient LVM storage resources makes deleting VM instance impossible #11431

Closed gabrielmougard closed 1 year ago

gabrielmougard commented 1 year ago


# Issue description

You have a VM instance with an LVM storage backend. When you try to snapshot the instance and the pool does not have enough free storage, the snapshot operation fails, which is expected. The error should be similar to `Error: Create instance snapshot: Error creating LVM logical volume snapshot: Failed to run: lvcreate -n virtual-machines_<vm_name>-<snap_name> -s /dev/vmpool127383/virtual-machines_<vm_name> --setactivationskip y -l 100%ORIGIN -pr: exit status 5 (Insufficient suitable allocatable extents found for logical volume virtual-machines_<vm_name>-<snap_name>.)`. However, deleting the instance after this failed snapshot with `lxc delete <instance>` (even with the `-f` flag) fails too, which is not expected behavior. As a side note, this `delete` operation took an unusually long time on my machine, but it eventually finished with this error:

Error: Error deleting storage volume: Error removing LVM logical volume: Failed to run: lvremove -f /dev/vmpool127383/virtual-machines_.block: exit status 5 (Logical volume vmpool127383/virtual-machines_.block in use.)


The workaround is to delete the volume manually through the storage API (you might also want to kill the VM's QEMU process if it is still alive): `lxc storage volume delete <pool> virtual-machine/<vm_name>`. After that, `lxc delete <vm_name>` worked on my side.
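The workaround can be wrapped in a small helper. This is only a sketch: the function names `run` and `cleanup_stuck_vm` and the `DRY_RUN` switch are made up for illustration, and only the two `lxc` commands come from the report above. Stopping a leftover QEMU process is not handled here.

```shell
#!/bin/sh
# Sketch of the workaround: remove the VM's storage volume first,
# then delete the instance. Helper names are illustrative only.

# run CMD...: execute the command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# cleanup_stuck_vm POOL VM_NAME
cleanup_stuck_vm() {
    pool=$1
    vm=$2
    if [ -z "$pool" ] || [ -z "$vm" ]; then
        echo "usage: cleanup_stuck_vm POOL VM_NAME" >&2
        return 2
    fi
    # Delete the VM's volume directly through the storage API.
    run lxc storage volume delete "$pool" "virtual-machine/$vm" || return 1
    # With the volume gone, the regular instance delete should go through.
    run lxc delete "$vm"
}
```

Running `DRY_RUN=1 cleanup_stuck_vm pool1 v1` prints the two commands without touching anything, which is a reasonable first check before running it for real.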

# Steps to reproduce

1) Create an LVM storage pool with a fairly small size

lxc storage create pool1 lvm size=12GiB lvm.use_thinpool=false

2) Launch a VM

lxc init images:ubuntu/jammy v1 --vm -s pool1
lxc config device set v1 root size.state=4GiB
lxc start v1

3) Create the snapshot. If you don't get the error here, try adjusting the size of the storage pool or the size of the VM; you can use the `lvs` and `vgs` commands to see how much storage the LVM partition consumes.

lxc snapshot v1 snap0
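To tune the sizes in step 3, it can help to compare the free space in the volume group with the size of the logical volume being snapshotted. With `lvm.use_thinpool=false`, the `-l 100%ORIGIN` snapshot in the error above needs roughly as much free space as the origin volume. The helper below is a sketch: the name `snapshot_fits` is made up, and it assumes the volume group is named after the pool (`pool1`), which may not match your setup (the report above shows `vmpool127383`).

```shell
#!/bin/sh
# snapshot_fits VG LV
# Returns 0 when the volume group has at least as much free space as the
# given logical volume occupies, 1 when it does not, 2 if either value
# could not be read. Sizes are compared in bytes.
snapshot_fits() {
    vg=$1
    lv=$2
    # --units b --nosuffix makes LVM print plain byte counts.
    free=$(vgs --noheadings --nosuffix --units b -o vg_free "$vg" | tr -d '[:space:]')
    size=$(lvs --noheadings --nosuffix --units b -o lv_size "$vg/$lv" | tr -d '[:space:]')
    if [ -z "$free" ] || [ -z "$size" ]; then
        echo "could not read sizes for $vg/$lv" >&2
        return 2
    fi
    echo "free in $vg: $free bytes; size of $lv: $size bytes"
    [ "$free" -ge "$size" ]
}
```

For example, `snapshot_fits pool1 virtual-machines_v1.block` (run as root) should return non-zero when the pool is small enough to trigger the error; the LV name follows the pattern seen in the error messages above.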



# Proposed idea to solve the issue

When a snapshot fails, LXD should be able to revert the operation (a `reverter` might be missing at that stage) and free up the corrupted resources in the storage pool, so that a subsequent `lxc delete <vm_instance>` works as expected.
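To illustrate what is meant by a reverter, here is a generic sketch of the pattern in shell: each completed step registers an undo command, and a failure runs the undo commands newest-first. This is not LXD's code and says nothing about where the missing revert actually is; all function names are hypothetical.

```shell
#!/bin/sh
# Minimal "reverter" sketch: a stack of undo commands, run newest-first
# when a later step fails. Illustration only, not LXD's implementation.

REVERT_STACK=""

# revert_add CMD: register a one-line undo command (evaluated with eval).
revert_add() {
    REVERT_STACK="$1
$REVERT_STACK"
}

# revert_fail: run every registered undo command, newest first, then clear.
revert_fail() {
    while IFS= read -r undo; do
        if [ -n "$undo" ]; then
            eval "$undo" </dev/null || echo "undo failed: $undo" >&2
        fi
    done <<EOF
$REVERT_STACK
EOF
    REVERT_STACK=""
}

# revert_success: the operation completed, so drop the undo commands.
revert_success() {
    REVERT_STACK=""
}

# run_steps DO1 UNDO1 [DO2 UNDO2 ...]
# Runs each DO command in order; on the first failure, undoes the steps
# that already succeeded and returns 1.
run_steps() {
    REVERT_STACK=""
    while [ "$#" -ge 2 ]; do
        if ! eval "$1"; then
            revert_fail
            return 1
        fi
        revert_add "$2"
        shift 2
    done
    revert_success
}
```

For a snapshot, the "do" commands would be whatever volume creation steps the operation performs and the "undo" commands their matching removals, so a failure partway through would leave nothing behind for `lxc delete` to trip over.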

tomponline commented 1 year ago

Thanks for the report, please can you look into how to fix this once you've finished your current roadmap item?

gabrielmougard commented 1 year ago

sure

gabrielmougard commented 1 year ago

I'm closing this for now as the described scenario now seems to work as expected.