astro / microvm.nix

NixOS MicroVMs
https://astro.github.io/microvm.nix/
MIT License

Switch to truncate in createVolumesScript to avoid race conditions and support larger disk allocations #229

Closed by cryptoluks 4 months ago

cryptoluks commented 4 months ago

Hello,

During the initial creation of new volumes, I am encountering issues on hosts that use either ext4 or btrfs with zstd compression. My MicroVMs are configured fully declaratively.

The fallocate command creates the disk files, but the free space is only recognized after the disk is formatted. Due to what seems to be parallel file creation, a race condition occurs. As a result, only one or two volumes are formatted correctly. The others appear "corrupt" and cannot be used without manual intervention, which includes removing the affected volumes and restarting the VMs.

Each of my volumes is approximately 256GB, but the host has only about 500GB available. The issue arises because fallocate requires that the space it allocates be immediately available, which isn't feasible with my setup.

Workarounds:

  1. Increase Host Disk Size: Use a larger disk on the host.
  2. Reduce Volume Size: Opt for smaller volumes. However, I want to future-proof my setup, and smaller volumes risk some VMs outgrowing their allocated space.
  3. Manual VM Start: Avoid auto-starting VMs on the first setup; instead, manually start them one by one to ensure disk creation is handled properly.
  4. Pre-create Volumes: Create all volumes in advance to avoid the issue.
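Workaround 3 could be scripted roughly as follows. This is only a sketch in Python: the VM names are hypothetical, the unit name follows the microvm@.service template, and it assumes that `systemctl start` returns only after the unit has finished creating its volumes, which depends on how the unit is defined.

```python
import subprocess
from typing import Callable, Iterable, List


def start_command(vm_name: str) -> List[str]:
    """Build the systemctl call for one instance of the microvm@ template unit."""
    return ["systemctl", "start", f"microvm@{vm_name}.service"]


def start_sequentially(
    vm_names: Iterable[str],
    run: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    """Start VMs one at a time so that volume creation never overlaps.

    `run` is injectable so the function can be dry-run without systemd.
    """
    executed = []
    for name in vm_names:
        cmd = start_command(name)
        # check=True raises on a non-zero exit code, stopping at the first failure
        run(cmd, check=True)
        executed.append(cmd)
    return executed


if __name__ == "__main__":
    # Dry run with hypothetical VM names: print instead of executing.
    start_sequentially(["vm-a", "vm-b"], run=lambda cmd, check: print(" ".join(cmd)))
```

Passing a different `run` callable keeps the example usable on machines without systemd; dropping the argument makes it call `systemctl` for real.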

In my testing fork, I switched to using the truncate command. This change seems to resolve the issue by not requiring the immediate availability of disk space.
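To illustrate the difference, here is a minimal Python sketch (not the actual createVolumesScript, which is a shell script). It creates an image file the way truncate does, by setting its length without reserving blocks, and then reports the apparent and allocated sizes. The file name and the 64 MiB size are placeholders.

```python
import os
import tempfile


def create_sparse_image(path: str, size_bytes: int) -> None:
    """Set the file length without writing data, similar to `truncate -s`."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def apparent_size(path: str) -> int:
    """Size the file claims to have (what a VM would see as disk size)."""
    return os.stat(path).st_size


def allocated_size(path: str) -> int:
    """Space actually reserved on the host filesystem.

    On Linux, st_blocks is reported in 512-byte units.
    """
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "example.img")  # placeholder file name
        create_sparse_image(image, 64 * 1024 * 1024)
        print("apparent bytes: ", apparent_size(image))
        print("allocated bytes:", allocated_size(image))
```

On filesystems that support sparse files, the allocated size typically stays far below the apparent size until data is written. That is what makes overprovisioning possible, and it also means the host can run out of space later, at write time, rather than at creation time.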

I propose we switch to using truncate for disk creation. Are there any potential drawbacks to this approach that I might not be considering? Your feedback would be greatly appreciated!

astro commented 4 months ago

I don't fully understand how this creates a race condition. Your solution still seems valid if overprovisioning is the goal.

Workaround 3 requires limiting the concurrency of systemd jobs, which are currently launched in parallel. There has been no practical solution for that, but it would be very useful for the microvm@.service unit in general.