canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0

Out of disk space on the host leads to an Unknown state of the multipass guest #3379

Closed djshaw closed 8 months ago

djshaw commented 8 months ago

Describe the bug

When the host filesystem runs out of disk space and a guest tries to write to disk, the multipass instance appears to hang. It drops off the network, and it is not possible to connect to it with multipass shell or multipass exec.

The issue can be resolved by making space on the host's filesystem, then rebooting.

To Reproduce

Have a multipass guest write more data to disk than the host has free space for.

Expected behavior

I expect the multipass instance to keep running rather than enter the Unknown state. I would expect the process in the guest to receive an out-of-disk-space error.
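To make the expectation concrete, here is a minimal sketch of what a guest process would see if the write failed with an ordinary out-of-space error (ENOSPC). The helper name `write_or_report` is hypothetical and is not part of Multipass; it only illustrates the error path the report hopes for, not what actually happens when the host disk fills up.

```python
import errno
import os


def write_or_report(path: str, data: bytes) -> bool:
    """Write data to path; return False if the filesystem reports it is full.

    Hypothetical helper for illustration only.
    """
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            # Push the data to the device so a full disk surfaces here
            # rather than at some later point.
            os.fsync(f.fileno())
        return True
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # "No space left on device": the error the guest process
            # would be expected to receive.
            return False
        raise
```

With an error like this, the process could log the failure and carry on, instead of the whole instance becoming unreachable.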

Logs

Jan 18 15:14:44 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:14:45 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:05 walmart multipassd[8688]: Cannot open ssh session on "culsu" shutdown: ssh connection failed: 'No route to host'
Jan 18 15:15:05 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:06 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:06 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:07 walmart multipassd[8688]: Operation completed with error: (400) The instance cannot be cleanly shutdown as in Error status
Jan 18 15:15:09 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:09 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:09 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:12 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:12 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:13 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:15 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:15 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'
Jan 18 15:15:16 walmart multipassd[8688]: Executing 'ip -brief -family inet address show scope global'

Additional info

No LSB modules are available.
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
djshaw@walmart:~$ multipass version
multipass   1.13.0
multipassd  1.13.0

(I no longer have a disk space issue, so multipass info doesn't seem relevant)

lxd

Additional context

I haven't been able to find any bug reports that link an Unknown state condition to being out of disk space. My apologies if this is a duplicate report.

luis4a0 commented 8 months ago

Hi @djshaw! Unfortunately, running out of space is a host limitation. The instance believes it has a real disk of a certain size and tries to use it, and that is where the failure comes from. We can treat the availability of space as a precondition. On the other hand, it would be expensive for us to continually check the available space in order to signal an out-of-space condition to the instances. Nor is it a good idea to add a warning if there is no space at creation time. Thus, I don't see a solution to this from our side. Thanks for your report!

djshaw commented 7 months ago

Thanks for the quick response @luis4a0, I appreciate the explanation. I was hoping (naively, since I have no knowledge of the implementation) that it would suffice for the guest to receive a 0 as the return value of an fwrite(). I stand corrected.

Hopefully this GitHub issue will be discovered by those who need to see it.

luis4a0 commented 7 months ago

Hi @djshaw, I understand your proposal, but the problem is at the virtualization level. Multipass works on top of the virtualization layer, so we can't do something like that. On the other hand, detecting when the disk is close to being full and giving a warning to the user or refusing to start the instance is something we can think about. Thanks!
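As a rough illustration of the kind of check mentioned above, the following sketch tests whether the filesystem holding the instance images still has a minimum amount of free space. The function name, the path, and the threshold are assumptions made for this example; this is not how Multipass implements (or plans to implement) such a check.

```python
import shutil


def host_has_headroom(storage_path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem containing storage_path has at
    least min_free_bytes available.

    Hypothetical helper: both arguments are chosen by the caller.
    """
    usage = shutil.disk_usage(storage_path)
    return usage.free >= min_free_bytes
```

A caller could run this once before starting an instance, for example `host_has_headroom("/path/to/instance/storage", 1 * 1024**3)` with a placeholder path and a 1 GiB threshold, and warn or refuse to start when it returns False.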