canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

When unpacking an image fails, image and instance aren't cleaned up #11646

Closed · ru-fu closed this issue 1 year ago

ru-fu commented 1 year ago

When launching an instance on a host VM with a small disk (4GiB), there isn't enough space to fully unpack the image. The process hangs at "Unpack: 100%". However, both the instance and the image are created and can be listed. After killing the qemu-img convert process, no cleanup happens: the image and instance remain in the list.

Creating a VM on a host VM with too little disk space:

root@vm:~# lxc launch images:ubuntu/23.04/desktop --vm
Creating the instance
Retrieving image: Unpack: 100% (1.42GB/s)   

Log from lxc monitor (followed by a long period without output):

DEBUG  [2023-05-04T08:01:48Z] New operation                                 class=task description="Updating images" operation=16656177-22d8-48cb-a0e6-fb6e44247712 project=
INFO   [2023-05-04T08:01:48Z] ID: 16656177-22d8-48cb-a0e6-fb6e44247712, Class: task, Description: Updating images  CreatedAt="2023-05-04 08:01:48.937222092 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[]" Status=Pending StatusCode=Pending UpdatedAt="2023-05-04 08:01:48.937222092 +0000 UTC"
DEBUG  [2023-05-04T08:01:48Z] Acquiring image task lock                    
DEBUG  [2023-05-04T08:01:48Z] Acquired image task lock                     
INFO   [2023-05-04T08:01:48Z] Updating images                              
INFO   [2023-05-04T08:01:48Z] ID: 16656177-22d8-48cb-a0e6-fb6e44247712, Class: task, Description: Updating images  CreatedAt="2023-05-04 08:01:48.937222092 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[]" Status=Success StatusCode=Success UpdatedAt="2023-05-04 08:01:48.937222092 +0000 UTC"
INFO   [2023-05-04T08:01:48Z] Done updating images                         
DEBUG  [2023-05-04T08:01:48Z] Success for operation                         class=task description="Updating images" operation=16656177-22d8-48cb-a0e6-fb6e44247712 project=

The image is already listed, but it's still busy converting:

root@vm:~# lxc image list
+-------+--------------+--------+-------------------------------------+--------------+-----------------+-----------+-----------------------------+
| ALIAS | FINGERPRINT  | PUBLIC |             DESCRIPTION             | ARCHITECTURE |      TYPE       |   SIZE    |         UPLOAD DATE         |
+-------+--------------+--------+-------------------------------------+--------------+-----------------+-----------+-----------------------------+
|       | 215974d90318 | no     | Ubuntu lunar amd64 (20230503_07:42) | x86_64       | VIRTUAL-MACHINE | 1117.81MB | May 4, 2023 at 7:18am (UTC) |
+-------+--------------+--------+-------------------------------------+--------------+-----------------+-----------+-----------------------------+
root@vm:~# ps aux | grep lxd
root         260  0.0  1.9 723864 18784 ?        Ssl  06:52   0:02 /run/lxd_agent/lxd-agent
root        2225  0.0  0.0   2888   144 ?        Ss   07:01   0:00 /bin/sh /snap/lxd/24817/commands/daemon.start
root        2377  0.0  0.0 153160   140 ?        Sl   07:01   0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root        2390  0.7  5.5 5615352 53860 ?       Sl   07:01   0:45 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
lxd         3232  0.0  0.1  10080  1572 ?        Ss   07:01   0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --pid-file= --no-ping --interface=lxdbr0 --dhcp-rapid-commit --no-negcache --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.107.53.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.107.53.2,10.107.53.254,1h --listen-address=fd42:d402:c6c1:c3ee::1 --enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd --interface-name _gateway.lxd,lxdbr0 -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd -g lxd
root        3277  0.0  1.5 1097944 14724 pts/0   Sl+  07:02   0:00 /snap/lxd/24817/bin/lxc launch images:ubuntu/23.04/desktop --vm
root        3316  0.0  1.4 1023700 14332 pts/1   Sl+  07:02   0:00 /snap/lxd/24817/bin/lxc monitor --pretty
root        3566  0.1  5.2 769308 51068 ?        SNsl 07:18   0:09 qemu-img convert -f qcow2 -O raw -T none -t none -W /var/snap/lxd/common/lxd/images/215974d90318b9cc725ca06d5f4a88914baab5c82799cad28ea5311c5bdd06e9.rootfs /dev/zd0
root       12386  0.0  0.2   9208  2332 pts/2    S+   08:48   0:00 grep --color=auto lxd

Killing the process does not trigger cleanup:

root@vm:~# kill -s 9 3566
root@vm:~# ps aux | grep lxd
root         260  0.0  1.8 723864 18156 ?        Ssl  06:52   0:02 /run/lxd_agent/lxd-agent
root        2225  0.0  0.0   2888   144 ?        Ss   07:01   0:00 /bin/sh /snap/lxd/24817/commands/daemon.start
root        2377  0.0  0.0 153160   140 ?        Sl   07:01   0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root        2390  0.2  5.8 5623548 56576 ?       Sl   07:01   0:46 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
lxd         3232  0.0  0.1  10080  1572 ?        Ss   07:01   0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --pid-file= --no-ping --interface=lxdbr0 --dhcp-rapid-commit --no-negcache --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.107.53.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.107.53.2,10.107.53.254,1h --listen-address=fd42:d402:c6c1:c3ee::1 --enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd --interface-name _gateway.lxd,lxdbr0 -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd -g lxd
root        3277  0.0  1.5 1097944 14724 pts/0   Sl+  07:02   0:00 /snap/lxd/24817/bin/lxc launch images:ubuntu/23.04/desktop --vm
root        3316  0.0  1.4 1023700 14332 pts/1   Sl+  07:02   0:00 /snap/lxd/24817/bin/lxc monitor --pretty
root       12529  0.0  0.2   9208  2320 pts/2    S+   12:10   0:00 grep --color=auto lxd
root@vm:~# lxc ls
+------------+---------+------+------+-----------------+-----------+
|    NAME    |  STATE  | IPV4 | IPV6 |      TYPE       | SNAPSHOTS |
+------------+---------+------+------+-----------------+-----------+
| rich-sloth | STOPPED |      |      | VIRTUAL-MACHINE | 0         |
+------------+---------+------+------+-----------------+-----------+
root@vm:~# lxc image ls
+-------+--------------+--------+-------------------------------------+--------------+-----------------+-----------+-----------------------------+
| ALIAS | FINGERPRINT  | PUBLIC |             DESCRIPTION             | ARCHITECTURE |      TYPE       |   SIZE    |         UPLOAD DATE         |
+-------+--------------+--------+-------------------------------------+--------------+-----------------+-----------+-----------------------------+
|       | 215974d90318 | no     | Ubuntu lunar amd64 (20230503_07:42) | x86_64       | VIRTUAL-MACHINE | 1117.81MB | May 4, 2023 at 7:18am (UTC) |
+-------+--------------+--------+-------------------------------------+--------------+-----------------+-----------+-----------------------------+
root@vm:~# lxc operation list
+--------------------------------------+------+-------------------+---------+------------+----------------------+
|                  ID                  | TYPE |    DESCRIPTION    | STATUS  | CANCELABLE |       CREATED        |
+--------------------------------------+------+-------------------+---------+------------+----------------------+
| d20c3aff-aff0-4e41-8592-32e703965c45 | TASK | Creating instance | RUNNING | NO         | 2023/05/04 07:02 UTC |
+--------------------------------------+------+-------------------+---------+------------+----------------------+

INFO   [2023-05-04T12:03:23Z] Done updating images                         
DEBUG  [2023-05-04T12:11:26Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0 username=root
DEBUG  [2023-05-04T12:11:26Z] GetInstanceUsage started                      instance=rich-sloth project=default
DEBUG  [2023-05-04T12:11:26Z] Handling API request                          ip=@ method=GET protocol=unix url="/1.0/instances?filter=&recursion=2" username=root
DEBUG  [2023-05-04T12:11:26Z] GetInstanceUsage finished                     instance=rich-sloth project=default
WARNING[2023-05-04T12:11:26Z] Error getting disk usage                      err="Failed to run: zfs get -H -p -o value used default/virtual-machines/rich-sloth.block: exit status 1 (cannot open 'default/virtual-machines/rich-sloth.block': pool I/O is currently suspended)" instance=rich-sloth instanceType=virtual-machine project=default
DEBUG  [2023-05-04T12:11:32Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0 username=root
DEBUG  [2023-05-04T12:11:32Z] Handling API request                          ip=@ method=GET protocol=unix url="/1.0/images?filter=&recursion=1" username=root
DEBUG  [2023-05-04T12:16:31Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0 username=root
DEBUG  [2023-05-04T12:16:31Z] Handling API request                          ip=@ method=GET protocol=unix url="/1.0/operations?recursion=1" username=root

(Kill at 12:11)

ru-fu commented 1 year ago

Follow-up bug: After increasing the disk size and rebooting both the host machine and the host VM, trying to launch another VM from the same image fails:

root@vm:~# lxc image list
+-------+--------------+--------+-------------------------------------+--------------+-----------------+-----------+-----------------------------+
| ALIAS | FINGERPRINT  | PUBLIC |             DESCRIPTION             | ARCHITECTURE |      TYPE       |   SIZE    |         UPLOAD DATE         |
+-------+--------------+--------+-------------------------------------+--------------+-----------------+-----------+-----------------------------+
|       | 215974d90318 | no     | Ubuntu lunar amd64 (20230503_07:42) | x86_64       | VIRTUAL-MACHINE | 1117.81MB | May 4, 2023 at 7:18am (UTC) |
+-------+--------------+--------+-------------------------------------+--------------+-----------------+-----------+-----------------------------+
root@vm:~# lxc launch images:ubuntu/23.04/desktop --vm
Creating the instance
Error: Failed instance creation: Failed creating instance from image: Failed to run: zfs clone -o volmode=none default/images/215974d90318b9cc725ca06d5f4a88914baab5c82799cad28ea5311c5bdd06e9.block@readonly default/virtual-machines/wondrous-ladybird.block: exit status 1 (cannot open 'default/images/215974d90318b9cc725ca06d5f4a88914baab5c82799cad28ea5311c5bdd06e9.block@readonly': dataset does not exist)

tomponline commented 1 year ago

I reproduced this issue, but I don't think there is much we can do about it. You've effectively emulated a defective hard disk.

My understanding of the setup: you launched a VM with a 4GiB root disk, installed LXD inside it using a ZFS storage pool on a loop file inside the VM (which LXD sets to a minimum size of 5GiB), and then tried to create a large nested VM (using images:ubuntu/23.04/desktop) on that pool.

This would have caused the outer VM's root disk filesystem to fill up, which then presented as a disk I/O problem to the ZFS pool running on the loop file (the loop file has a fixed nominal size, but that full size cannot actually be backed by the filesystem), causing the warning:

[  192.622010] WARNING: Pool 'default' has encountered an uncorrectable I/O failure and has been suspended.
/dev/sda2       3.8G  3.7G     0 100% /

Then the qemu-img process hangs, and even when it's killed it goes into a defunct state, presumably because of blocked I/O:

root        3366  0.0  1.1 1097708 11552 pts/0   Sl+  13:27   0:00 /snap/lxd/24871/bin/lxc launch images:ubuntu/23.04/desktop --vm
root        3567  1.2  0.0      0     0 ?        ZNsl 13:27   0:14 [qemu-img] <defunct>

I've looked at the goroutine list in LXD when that happens, and found this:

goroutine 682 [syscall, 7 minutes]:
syscall.Syscall6(0x7fd94c3b9678?, 0x7fd975314108?, 0x10?, 0xc000374c00?, 0xc000bde620?, 0x0?, 0x686690?)
    /snap/go/current/src/syscall/syscall_linux.go:91 +0x36
os.(*Process).blockUntilWaitable(0xc000af3830)
    /snap/go/current/src/os/wait_waitid.go:32 +0x87
os.(*Process).wait(0xc000af3830)
    /snap/go/current/src/os/exec_unix.go:22 +0x28
os.(*Process).Wait(...)
    /snap/go/current/src/os/exec.go:132
github.com/lxc/lxd/shared/subprocess.(*Process).start.func3()
    /build/lxd/parts/lxd/src/shared/subprocess/proc.go:176 +0x6c
created by github.com/lxc/lxd/shared/subprocess.(*Process).start
    /build/lxd/parts/lxd/src/shared/subprocess/proc.go:173 +0x6f8

This is Go waiting on the qemu-img process to finish, which, because the process is in a defunct state, will never happen. Go is actually waiting on the kernel to return from the waitid syscall (https://linux.die.net/man/3/waitid).
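
For illustration, here's a minimal Go sketch of the pattern behind that trace (not LXD's actual code; the command arguments and paths are placeholders). cmd.Wait() parks in the waitid syscall until the child is fully reaped, so a child that never becomes reapable blocks the goroutine indefinitely:

package main

import (
    "fmt"
    "os/exec"
)

func main() {
    // Placeholder arguments; the real invocation comes from LXD's image unpack path.
    cmd := exec.Command("qemu-img", "convert", "-f", "qcow2", "-O", "raw", "src.qcow2", "/dev/zd0")
    if err := cmd.Start(); err != nil {
        fmt.Println("start failed:", err)
        return
    }

    // Wait() ends up in waitid(2); if the child is stuck, this goroutine
    // stays in the "[syscall]" state forever, as in the trace above.
    err := cmd.Wait()
    fmt.Println("qemu-img finished:", err)
}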

The problem is that by this point you have to stop the VM forcefully (crashing LXD) in order to grow the underlying filesystem. At that point, the revert logic inside LXD that would have deleted the image storage volume record never runs, because LXD has been killed.

This leaves an image storage volume record in the DB while the actual ZFS volume doesn't exist, hence on the next init/launch you get the error:

Error: Failed instance creation: Failed creating instance from image: Failed to run: zfs clone -o volmode=none default/images/215974d90318b9cc725ca06d5f4a88914baab5c82799cad28ea5311c5bdd06e9.block@readonly default/virtual-machines/wondrous-ladybird.block: exit status 1 (cannot open 'default/images/215974d90318b9cc725ca06d5f4a88914baab5c82799cad28ea5311c5bdd06e9.block@readonly': dataset does not exist)
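
To illustrate the revert pattern described above, here's a hedged sketch (not LXD's actual implementation; the helper functions are hypothetical stand-ins for LXD's DB and storage layers). The deferred cleanup removes the DB record on failure, but only if the daemon lives long enough to run it; killing the daemon strands the record:

package main

import (
    "errors"
    "fmt"
)

// Hypothetical stand-ins, for illustration only.
func dbAddImageVolumeRecord(fp string) error    { fmt.Println("DB: add record", fp); return nil }
func dbDeleteImageVolumeRecord(fp string) error { fmt.Println("DB: delete record", fp); return nil }
func unpackImageToZFS(fp string) error          { return errors.New("pool I/O is currently suspended") }

func createImageVolume(fp string) error {
    if err := dbAddImageVolumeRecord(fp); err != nil {
        return err
    }

    revert := true
    defer func() {
        if revert {
            // Never runs if the daemon is killed mid-unpack, leaving a
            // DB record with no backing ZFS dataset.
            _ = dbDeleteImageVolumeRecord(fp)
        }
    }()

    if err := unpackImageToZFS(fp); err != nil {
        return err // deferred revert fires and removes the record
    }

    revert = false
    return nil
}

func main() {
    if err := createImageVolume("215974d90318"); err != nil {
        fmt.Println("creation failed:", err)
    }
}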

tomponline commented 1 year ago

In theory, one way to avoid getting into this state is to never make the loop file bigger than the available free space on the root disk. So rather than setting a minimum size of 5GiB, we could set it smaller, or detect insufficient free space and refuse to create the pool.

However this does not guarantee that we won't be able to get into this state another time by having plenty of free space when the pool is created, and then filling the disk later such that the same problem occurs.

tomponline commented 1 year ago

@stgraber could we do something with loopFileSizeDefault so that it prevents pool creation using a loop file size that is larger than the free space at the time the pool is created? It wouldn't be a panacea, but it might avoid accidental situations like this.

https://github.com/lxc/lxd/blob/d4f88a91d32ca90eccbdb8bede539d06f56d4add/lxd/storage/drivers/utils.go#L797-L815

Currently it's using the filesystem size as the basis for calculating the pool loop file size, whereas it could use the filesystem's free space as the basis, and then fail if there isn't at least 5GiB free.
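
A hedged sketch of that idea (assumptions: golang.org/x/sys/unix for statfs; the function name and the 20%-of-free-space heuristic are illustrative, not the actual loopFileSizeDefault logic):

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

const minPoolSize = uint64(5 * 1024 * 1024 * 1024) // 5GiB minimum, per the discussion

// loopFileSizeFromFreeSpace proposes a loop file size based on the free
// space of the filesystem backing path, refusing outright if there is not
// enough room for the minimum pool size.
func loopFileSizeFromFreeSpace(path string) (uint64, error) {
    var st unix.Statfs_t
    if err := unix.Statfs(path, &st); err != nil {
        return 0, err
    }

    free := st.Bavail * uint64(st.Bsize) // space available to unprivileged users
    if free < minPoolSize {
        return 0, fmt.Errorf("insufficient free space for storage pool: %d bytes available, %d required", free, minPoolSize)
    }

    // Illustrative heuristic: 20% of free space, clamped to the minimum.
    size := free / 5
    if size < minPoolSize {
        size = minPoolSize
    }
    return size, nil
}

func main() {
    size, err := loopFileSizeFromFreeSpace("/var/snap/lxd/common/lxd")
    if err != nil {
        fmt.Println("refusing to create pool:", err)
        return
    }
    fmt.Printf("loop file size: %d bytes\n", size)
}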

stgraber commented 1 year ago

Yeah, we can make it use free space instead.