QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

VMs fail to start because adjusting filesystem size times out #4419

Open SuzanneSoy opened 5 years ago

SuzanneSoy commented 5 years ago

Qubes OS version:

4.0

Affected component(s):

VMs based on Fedora-26, likely debian-9 too.


Steps to reproduce the behavior:

Some VMs fail to start: Cannot connect to qrexec agent for 60 seconds, see /var/log/xen/console/guest-xxx.log for details.

In that log file, I can see that the problematic VMs are busy with the following task (the time updates until around 55s when the VM is killed):

A start job is running for Adjust r…t filesystem size (45s / no limit)

Expected behavior:

The VMs should start, even if that takes 10 minutes. Beyond 60 seconds it is reasonable to show a message and invite the user to check the logs to see what's going on.

Actual behavior:

The VM gets killed, preventing access to its data and applications.

General notes:

After seeing this commit https://github.com/marmarek/qubes-core-agent-linux/commit/7b4d453a23b066e5919f5498c5b3c75f8df2bae1 I looked for scripts named *resize* in /usr/lib, found one (resize-rootfs-if-needed.sh I think, but there are a couple), and disabled it (added exit 0 at the top). This allowed me to boot the VMs. I grepped through Qubes OS's code but couldn't find where that 60-second timeout is defined; 60 is a fairly common number, it seems :) .
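The workaround described above can be sketched as a command (the path is the one mentioned in this report and may differ per template; inserting at line 2 assumes a one-line shebang):

```shell
# Neutralize the resize script by making it exit before doing anything.
# Path is from this report; there may be several *resize* scripts in /usr/lib.
script=/usr/lib/qubes/resize-rootfs-if-needed.sh
# Insert "exit 0" right after the shebang (assumes the shebang is line 1).
sudo sed -i '2i exit 0' "$script"
```

Note this only disables the symptom; the filesystem will no longer be resized on boot.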


Related issues:

Related to the fix at https://github.com/marmarek/qubes-core-agent-linux/commit/7b4d453a23b066e5919f5498c5b3c75f8df2bae1 for #3173 and #3143.

marmarek commented 5 years ago

Resizing the filesystem normally takes seconds. Do you run the system from a USB stick or other slow disk?

This allowed me to boot the VMs. I grepped through QubesOS's code but couldn't find that 60-second timeout is defined, 60 is a fairly common number it seems :)

It's controlled from dom0: qvm-prefs VMNAME qrexec_timeout [new-value]
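For example (run in dom0; "work" is a placeholder VM name), to inspect the current value and then raise it to 300 seconds:

```shell
# In dom0: show the current qrexec timeout for the VM, then raise it.
qvm-prefs work qrexec_timeout
qvm-prefs work qrexec_timeout 300
```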

yonjah commented 5 years ago

I'm getting the same issue. Normal VM running from a local install on an SSD. I've recently installed updates and I think the VM failed to start after that (I've only noticed a single VM that failed to start).

I tried cloning the affected VM and changing its underlying template, but in both cases the VM failed to start.

SuzanneSoy commented 5 years ago

@marmarek The VM booted fine in about 3 minutes after I increased the timeout. I have a slightly older SSD and CPU. I do tend to run VMs with small amounts of RAM allocated.

I had tried cloning the problematic VM like @yonjah, without success.

I noticed that now that I increased the timeout, the VM does boot fine, but it takes about 3 minutes to resize the root partition every time it is booted.

marmarek commented 5 years ago

Changes in the root filesystem of template-based VMs are not persistent. Start the template - it should resize the root filesystem on boot, and the change will persist there.
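In dom0, that looks roughly like this (template name is an example from this thread):

```shell
# Start the template so the resize runs there and persists in the root image,
# then shut it down before restarting the app qube.
qvm-start fedora-26
# ...wait for it to finish booting, then:
qvm-shutdown --wait fedora-26
```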

yonjah commented 5 years ago

@marmarek starting the template seems to have no effect on the AppVM boot time. Also, the template starts with no noticeable delay.

After increasing the timeout I checked the VM journal for the boot log, and it seems like the delay is happening between mounting the temporary directory and the crng init:

Oct 29 20:42:50 localhost systemd[1]: Mounting Temporary Directory...
Oct 29 20:42:50 localhost systemd[1]: Mounted Temporary Directory.
Oct 29 20:44:54 localhost kernel: random: crng init done
Oct 29 20:44:54 localhost kernel: random: 7 urandom warning(s) missed due to ratelimiting
marmarek commented 5 years ago

crng init happens in parallel (asynchronously); it doesn't block anything at this stage. What is the next thing in the log? systemd-analyze may be useful to find what's wrong.

As for the template - check (df -h /) whether the filesystem is really resized there - maybe the problem is that it isn't happening at all? Check also sudo systemctl status qubes-rootfs-resize.service.

yonjah commented 5 years ago

I'm attaching the output of the commands, but I'm not sure how to track this issue further.

systemd-analyze blame

    1min 41.077s qubes-rootfs-resize.service
          4.287s qubes-misc-post.service
          3.591s dev-xvda3.device
          3.034s qubes-update-check.service
          1.042s qubes-firewall.service

df -h /

Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda3      9.6G  4.2G  5.0G  46% /

sudo systemctl status qubes-rootfs-resize.service

● qubes-rootfs-resize.service - Adjust root filesystem size
   Loaded: loaded (/usr/lib/systemd/system/qubes-rootfs-resize.service; enabled; vendor preset: enabled)
   Active: active (exited) since Mon 2018-10-29 21:09:03 AWST; 45min ago
 Main PID: 269 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4915)
   CGroup: /system.slice/qubes-rootfs-resize.service

Oct 29 21:07:22 localhost systemd[1]: Starting Adjust root filesystem size...
Oct 29 21:09:03 localhost resize-rootfs-if-needed.sh[269]: Warning: Partition /dev/xvda3 is being used. Are you sure you want to continue?
Oct 29 21:09:03 localhost resize-rootfs-if-needed.sh[269]: [168B blob data]
Oct 29 21:09:03 localhost resize-rootfs-if-needed.sh[269]: Information: You may need to update /etc/fstab.
Oct 29 21:09:03 localhost resize-rootfs-if-needed.sh[269]: [106B blob data]
Oct 29 21:09:03 localhost resize-rootfs-if-needed.sh[269]: The filesystem is already 2569467 (4k) blocks long.  Nothing to do!
Oct 29 21:09:03 localhost systemd[1]: Started Adjust root filesystem size.
marmarek commented 5 years ago

Ok, looks like the resize-rootfs script (called from resize-rootfs-if-needed.sh) waits for something. Can you try to run it manually, with more detailed logging, like this:

sudo sh -x /usr/lib/qubes/resize-rootfs

? And observe where the wait happens.

yonjah commented 5 years ago

When I run it manually the delay doesn't happen. I changed the startup script in the template to trace the commands, and this is the output:

Oct 30 19:20:28 localhost systemd[1]: Starting Adjust root filesystem size...
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: ++ blockdev --getro /dev/xvda
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + '[' 0 -eq 1 ']'
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + sysfs_xvda=/sys/class/block/xvda
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + non_rootfs_data=512000
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: ++ tail -n 1
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: ++ df --output=size /
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + rootfs_size=' 10050748'
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + rootfs_size=20101496
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: ++ cat /sys/class/block/xvda/size
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + '[' 20971520 -lt 20613496 ']'
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + /usr/bin/sh -x /usr/lib/qubes/resize-rootfs
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + set -e
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: +++ cut -f 1 -d ' '
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: +++ grep device-mapper /proc/devices
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: ++ printf %x 253
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + dm_major=fd
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + case "$(stat -Lc %t:%T /dev/mapper/dmroot)" in
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: ++ stat -Lc %t:%T /dev/mapper/dmroot
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + parted ---pretend-input-tty /dev/xvda print
Oct 30 19:20:28 localhost resize-rootfs-if-needed.sh[270]: + echo fix
Oct 30 19:22:01 localhost resize-rootfs-if-needed.sh[270]: + parted ---pretend-input-tty /dev/xvda resizepart 3
Oct 30 19:22:01 localhost resize-rootfs-if-needed.sh[270]: + echo yes 100%
Oct 30 19:22:01 localhost resize-rootfs-if-needed.sh[270]: Warning: Partition /dev/xvda3 is being used. Are you sure you want to continue?
Oct 30 19:22:01 localhost resize-rootfs-if-needed.sh[270]: [168B blob data]
Oct 30 19:22:01 localhost resize-rootfs-if-needed.sh[270]: Information: You may need to update /etc/fstab.
Oct 30 19:22:01 localhost resize-rootfs-if-needed.sh[270]: [97B blob data]
Oct 30 19:22:01 localhost resize-rootfs-if-needed.sh[270]: + resize2fs /dev/mapper/dmroot
Oct 30 19:22:01 localhost resize-rootfs-if-needed.sh[270]: resize2fs 1.43.4 (31-Jan-2017)
Oct 30 19:22:01 localhost resize-rootfs-if-needed.sh[270]: The filesystem is already 2569467 (4k) blocks long.  Nothing to do!
Oct 30 19:22:01 localhost systemd[1]: Started Adjust root filesystem size.

It seems like the delay is happening on the echo fix | parted ---pretend-input-tty /dev/xvda print >/dev/null

ca:3)
        # resize partition table itself
        # use undocumented ---pretend-input-tty (yes, three '-') to
        # force unattended operation, otherwise it aborts on first
        # prompt, even with '-s' option
        echo fix | parted ---pretend-input-tty /dev/xvda print >/dev/null
        # then resize 3rd partition, even though it is mounted
        echo yes 100% | parted ---pretend-input-tty /dev/xvda resizepart 3
        # and reload partition table; prefer partprobe over blockdev
        # --rereadpt, as it works on mounted partitions
        partprobe /dev/xvda
        ;;

I can probably remove the output redirect and try to see if there is any meaningful output.

yonjah commented 5 years ago

So I removed the output redirect, and it just prints the partition table after the delay.

Oct 30 19:46:21 localhost resize-rootfs-if-needed.sh[268]: + parted ---pretend-input-tty /dev/xvda print
Oct 30 19:46:21 localhost resize-rootfs-if-needed.sh[268]: + echo fix
Oct 30 19:47:49 localhost resize-rootfs-if-needed.sh[268]: Model: Xen Virtual Block Device (xvd)
Oct 30 19:47:49 localhost resize-rootfs-if-needed.sh[268]: Disk /dev/xvda: 10.7GB
Oct 30 19:47:49 localhost resize-rootfs-if-needed.sh[268]: Sector size (logical/physical): 512B/4096B
Oct 30 19:47:49 localhost resize-rootfs-if-needed.sh[268]: Partition Table: gpt
Oct 30 19:47:49 localhost resize-rootfs-if-needed.sh[268]: Disk Flags:
Oct 30 19:47:49 localhost resize-rootfs-if-needed.sh[268]: Number  Start   End     Size    File system  Name                 Flags
Oct 30 19:47:49 localhost resize-rootfs-if-needed.sh[268]:  1      1049kB  211MB   210MB                EFI System           boot, esp
Oct 30 19:47:49 localhost resize-rootfs-if-needed.sh[268]:  2      211MB   213MB   2097kB               BIOS boot partition  bios_grub
Oct 30 19:47:49 localhost resize-rootfs-if-needed.sh[268]:  3      213MB   10.7GB  10.5GB  ext4         Root filesystem

I assume fixing the GPT is what's causing the delay, but I couldn't get any more info on what is going on or why the GPT is broken in the first place.

SuzanneSoy commented 5 years ago

@yonjah A GPT contains a main partition table at the beginning of the disk and a backup one at the end of the disk. If the disk is expanded, the backup will no longer be at the end of the disk, so this is most likely what needs "fixing".

Would we have better luck with sgdisk, which is designed for non-interactive use and is able to edit GPT partition tables?
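A sketch of that alternative (untested in this thread; assumes sgdisk is installed in the template): sgdisk's -e flag relocates the backup GPT structures to the current end of the disk without prompting, which might replace the parted "fix" step:

```shell
# Move the backup GPT header/table to the end of the (now larger) disk.
sudo sgdisk -e /dev/xvda
# Then grow partition 3 and the filesystem as the existing script does.
echo yes 100% | sudo parted ---pretend-input-tty /dev/xvda resizepart 3
sudo partprobe /dev/xvda
sudo resize2fs /dev/mapper/dmroot
```

Whether sgdisk also avoids the blocking entropy read would still need to be verified.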

SuzanneSoy commented 5 years ago

@marmarek I also see that it continues after parted as soon as the crng init is done.

GPT labels contain some GUIDs that are randomly generated. I noticed recently while working with gdisk that when the GPT mirror is misplaced and needs to be recreated, or for some partition operations, it generates a fresh GUID for the disk (or for the partition). To generate the fresh GUID it quite likely needs a source of randomness, which could explain why it blocks until crng is available?

marmarek commented 5 years ago

To generate the fresh GUID it quite likely needs a source of randomness, which could explain why it blocks until crng is available?

That's a plausible explanation. Maybe it's possible to convince the tool not to do that? Since we don't really add new partitions, it should be OK to not change any of those UUIDs. Using sgdisk may indeed be a better idea.

SuzanneSoy commented 5 years ago

I have been piping commands into gdisk (before I realised the non-interactive sgdisk existed), and I had no control over the generation of GUIDs. The only thing I was able to do to get a deterministic, reproducible disk image was to let it generate random GUIDs, then set them to a constant afterwards (it has a command for that). That won't prevent it from trying to get some random bytes, though. I haven't checked, but I expect parted to behave similarly :-( .

Why is the crng init taking so long? And could we put a dependency between the jobs, so that the resizing job happens after the crng job?
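One quick way to test the entropy-starvation theory inside the affected VM (this is a standard Linux interface, not Qubes-specific):

```shell
# If this number is very low early in boot, getrandom()-style callers
# (such as GUID generation) may block until the crng is initialized.
cat /proc/sys/kernel/random/entropy_avail
```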

marmarek commented 5 years ago

There are very few randomness sources VMs have access to. You can read more about it in https://github.com/QubesOS/qubes-issues/issues/673 You can check the ordering of systemd-random-seed.service, haveged.service and this one. Having systemd-random-seed.service started before this one should help (but I'm not sure it solves it fully).
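A hypothetical drop-in along those lines (the drop-in file name is chosen here for illustration; the service and unit names are the ones from this thread):

```ini
# /etc/systemd/system/qubes-rootfs-resize.service.d/30-after-random-seed.conf
[Unit]
Wants=systemd-random-seed.service
After=systemd-random-seed.service
```

After creating it, `sudo systemctl daemon-reload` in the template would make the ordering take effect on the next boot.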

yonjah commented 5 years ago

@marmarek @jsmaniac I kinda understand what a GPT is. But I was wondering why it is broken in the first place (I assumed fixing it is what's causing the delay), and whether there is a way for the fix to persist so it won't have to happen on every restart.

I understand it might actually be parted just waiting for the PRNG to load, but then why is it only happening on this VM and not on others?

imestin commented 5 years ago

(I want to follow this but I don't know how to do that in Github so I will just write something here)

marmarek commented 5 years ago

(I want to follow this but I don't know how to do that in Github so I will just write something here)

There is "subscribe" button on the right.

SuzanneSoy commented 5 years ago

I managed to consistently reproduce the issue:

qvm-create --class AppVM --template fedora-26 --label red foobar
# This boots fast.
time qvm-run foobar 'touch somefile'
qvm-run foobar 'sudo halt -p' || true
# Wait for the VM to halt.
while qvm-check --running foobar; do sleep 1; done
sleep 5
qvm-prefs --set foobar maxmem 400
qvm-prefs --set foobar memory 400
# This will take a long time to boot
time qvm-run foobar true

When the amount of RAM allocated to the VM is reduced below a certain point and there is some data on the VM's private volume, it blocks on the resizing step, which takes several dozen seconds.

With an empty VM like this, the second boot is 4x slower than the first. The boot time can go up to 2-3 minutes on other VMs.

If I re-increase maxmem to 4000, then the VM boots fast again.

I noticed that when booting such a VM with lots of RAM available, it uses around 1400M of RAM at the end of the boot process, which is way too much to start a simple terminal.

marmarek commented 5 years ago

While the actual issue (parted waiting for crng init, even though no new uuid should be needed) is not fixed yet, its impact should be greatly reduced now because of https://github.com/QubesOS/qubes-issues/issues/4553 (do not call root fs resize at all, if not needed).

SuzanneSoy commented 5 years ago

It seems that the issue mostly happens with VMs based on fedora-26, and even when starting the template itself (it takes forever to resize, and now does it every time, even if the timeout is increased and it boots successfully several times in a row). VMs based on debian-9 templates don't seem to have the issue on my machine (it might have happened with a previous install, but I might not remember correctly). I'll see if the issue arises with fedora-29.

PS: @marmarek thanks for the answer on HVM security on the mailing list the other day ;-)

marmarek commented 5 years ago

It seems that the issue mostly happens with VMs based on fedora-26

This makes sense, as there are no more updates for fedora-26, and those updates include the fix for #4553.

SuzanneSoy commented 5 years ago

@marmarek Yep, I saw the announcement for fedora-29 on the mailing list the other day and promptly installed it. However, I was not aware of new versions of fedora templates before that, since qubes-dom0-update does not install them automatically.

I would have expected a metapackage to bring in new official templates when qubes-dom0-update is executed. It seems that the need to manually install new TemplateVMs is not indicated in https://www.qubes-os.org/doc/software-update-dom0/ nor in https://www.qubes-os.org/doc/software-update-vm/ (I might have missed it). Is this usability issue tracked somewhere or should I open a ticket?

yonjah commented 5 years ago

I can also confirm that upgrading to fedora-29 solved this issue

0spinboson commented 5 years ago

@marmarek Yep, I saw the announcement for fedora-29 on the mailing list the other day and promptly installed it. However, I was not aware of new versions of fedora templates before that, since qubes-dom0-update does not install them automatically.

I would have expected a metapackage to bring in new official templates when qubes-dom0-update is executed. It seems that the need to manually install new TemplateVMs is not indicated in https://www.qubes-os.org/doc/software-update-dom0/ nor in https://www.qubes-os.org/doc/software-update-vm/ (I might have missed it). Is this usability issue tracked somewhere or should I open a ticket?

Such behavior would lead to the removal of any programs/packages you installed yourself, so it wouldn't be a very convenient default. Similarly, installing every Fedora release in parallel would also be inconvenient, because it would take up quite a bit of space for people who don't wish to change their templates every 6-8 months. (Though I understand that Fedora is moving away from this model starting with 30/31?)

marmarek commented 5 years ago

As for the original issue, it seems sfdisk in fedora-28 does support GPT too and is able to resize a partition without loading any random data. The command is:

echo ',+' | sfdisk -N 3 /dev/xvda

It handles both adjusting the backup GPT placement and resizing in one go.

marmarek commented 5 years ago

sfdisk in Debian stretch (util-linux 2.29.2) also supports it, but is buggy: https://github.com/karelzak/util-linux/issues/532 This is fixed in util-linux 2.32.1, which is only in buster+

qubesos-bot commented 5 years ago

Automated announcement from builder-github

The package core-agent-linux has been pushed to the r4.0 testing repository for the CentOS centos7 template. To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r4.0-current-testing

Changes included in this update

qubesos-bot commented 5 years ago

Automated announcement from builder-github

The package qubes-core-agent_4.0.43-1 has been pushed to the r4.1 testing repository for the Debian template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing stretch-testing (or appropriate equivalent for your template version), then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

qubesos-bot commented 5 years ago

Automated announcement from builder-github

The package qubes-core-agent_4.0.43-1 has been pushed to the r4.0 testing repository for the Debian template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing stretch-testing (or appropriate equivalent for your template version), then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

qubesos-bot commented 5 years ago

Automated announcement from builder-github

The package qubes-core-agent_4.0.43-1+deb9u1 has been pushed to the r4.0 stable repository for the Debian template. To install this update, please use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

qubesos-bot commented 5 years ago

Automated announcement from builder-github

The package core-agent-linux has been pushed to the r4.0 stable repository for the CentOS centos7 template. To install this update, please use the standard update command:

sudo yum update

Changes included in this update

mig5 commented 6 months ago

Hit this today on Qubes 4.2 with a standalone VM based on fedora-38-xfce :/

In the end I adjusted the timeout to 300, saw it gave up on the resize job after 2 minutes, got into the VM and ran resize2fs /dev/mapper/dmroot - which was instant and correctly resized the root disk!

Strange.

roguh commented 6 months ago

I'm also seeing this issue. I think it'd be useful to have the qrexec timeout in a GUI, or for the preferences command to be suggested as part of the error message users currently get.

BawdyAnarchist commented 4 months ago

I am also having this problem, but only on Standalone VMs. Creating a new Standalone boots fine, but changing "System storage max size" causes the timeout of qubes-rootfs-resize.service. If I mask the service via qvm-prefs, the Standalone starts again. After running /usr/lib/qubes/resize-rootfs in the VM, resize is successful, and the Standalone boots fine without the service mask.

This is on a freshly installed and updated Qubes 4.2.

kennethrrosen commented 2 months ago

I am also having this problem, but only on Standalone VMs. Creating a new Standalone boots fine, but changing "System storage max size" causes the timeout of qubes-rootfs-resize.service. If I mask the service via qvm-prefs, the Standalone starts again. After running /usr/lib/qubes/resize-rootfs in the VM, resize is successful, and the Standalone boots fine without the service mask.

This is on a freshly installed and updated Qubes 4.2.

Ditto for the standalone issue: the resize and efi.mount both fail.

100111001 commented 2 months ago

Resizing filesystem normally takes seconds, do you run the system from USB or other slow disk?

This allowed me to boot the VMs. I grepped through QubesOS's code but couldn't find that 60-second timeout is defined, 60 is a fairly common number it seems :)

It's controlled from dom0: qvm-prefs VMNAME qrexec_timeout [new-value]

I just encountered this error message with a non-bootable AppVM based on a debian-12 template, using the default SSD on an X1 Carbon Gen 10.

It happened after I increased the root filesystem size in the template but didn't restart it. Starting an AppVM based on this template resulted in a qrexec timeout. I could resolve the symptom by increasing the timeout of the AppVM. To resolve the root cause, I restarted the template.

kennethrrosen commented 2 months ago

When attempting to run qubes/resize-rootfs, I get the following error as both user and root:

Filesystem at /dev/mapper/dmroot is mounted on /; on-line resizing required
old_desc_blocks = 25, new_desc_blocks = 29
resize2fs: Permission denied to resize filesystem