NixOS / nixpkgs


`nixos-rebuild build-vm-with-bootloader` size + hang regression #240086

Open bjornfor opened 1 year ago

bjornfor commented 1 year ago

Describe the bug

nixos-rebuild build-vm-with-bootloader regressed in nixos-23.05 (compared to nixos-22.11): it now creates a disk image containing the full system closure, whereas before it used the host Nix store (at least for /nix/store; there was a small separate disk image for the bootloader).

The result is that big system closures now require an equally big /tmp for building the VM. And if your closure size is >= 64 GiB, the build hangs and never completes: https://github.com/lkl/linux/issues/466

Steps To Reproduce

Run nix-build ./the-file-below.nix in a nixpkgs tree checked out at nixos-23.05 (e.g. bb8b5735d6f7e06b9ddd27de115b0600c1ffbdb4):

# Reproducer for NixOS VM using *a lot* more space at build time in 23.05 than
# 22.11. Also observe hanging when the closure size is >= 64 GiB, due to
# https://github.com/lkl/linux/issues/466.
#
# The issue appears when nixpkgs decides to build a disk image for the VM,
# containing everything. In nixos-22.11 it didn't need to do that, instead it
# used everything from the host Nix store, except a small boot disk for the
# bootloader.
#
# If it starts building "nixos-boot-disk", that's good/fine, but building
# "nixos-disk-image" is bad.

{ nixpkgs ? ./. }:

let
  nixosFunc = import (nixpkgs + "/nixos/default.nix");

  mkClosureSizeGiB = sizeGiB: id: pkgs:
    pkgs.runCommand "closure-size-${toString sizeGiB}-GiB-id-${toString id}"
    { nativeBuildInputs = with pkgs; [ util-linux ]; }
    ''
      mkdir -p "$out"
      fallocate -l "${toString sizeGiB}"GiB "$out/bigfile"
    '';

  # Closures above 64 GiB hang: https://github.com/lkl/linux/issues/466
  vmClosureSizeGiB = 66;

  configuration = { config, lib, pkgs, ... }:
  {
    environment.systemPackages =
      builtins.map (id: mkClosureSizeGiB 1 id pkgs) (lib.range 1 vmClosureSizeGiB);
  };
in
  (nixosFunc { inherit configuration; }).vmWithBootLoader

Beware that it creates 66 GiB of files in the Nix store, and then 66 GiB extra in /tmp at build time.
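
To see which image gets built without committing 66 GiB of disk, the two VM flavours can be compared on a small closure. The following is a minimal sketch, assuming the same nixpkgs checkout as above; the tiny configuration and the name `eval` are illustrative and not part of the original report.

```nix
# Sketch: expose both VM flavours from one file so they can be built
# separately and their build logs compared.
{ nixpkgs ? ./. }:

let
  nixosFunc = import (nixpkgs + "/nixos/default.nix");

  # Deliberately small configuration (an assumption for illustration only).
  configuration = { pkgs, ... }: {
    environment.systemPackages = [ pkgs.hello ];
  };

  eval = nixosFunc { inherit configuration; };
in
{
  # Same attribute that `nixos-rebuild build-vm` builds.
  inherit (eval) vm;

  # Same attribute that `nixos-rebuild build-vm-with-bootloader` builds.
  inherit (eval) vmWithBootLoader;
}
```

Building with `nix-build ./compare.nix -A vm` and then `-A vmWithBootLoader` (the file name is arbitrary) and watching for a "nixos-disk-image" derivation in the second build log should show the difference described above.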

Expected behavior

I expect the vm-with-bootloader configuration to use the host Nix store for everything but the bootloader partition (which is typically much smaller than the closure size), like it did in NixOS 22.11.

Additional context

I ran git bisect and found:

  There are only 'skip'ped commits left to test.
  The first bad commit could be any of:
  58f4c3944db804bd28d35ceb4687961683052a91
  76c7b656bfa9b20a4172f7901285560db4c2c695
  e3a41f3fec8ddfc9e20df2e10f49c464525defa3
  614b83a3285ca44650473e73f9777d7c41fe88a1
  We cannot bisect more!

Of them, I think 76c7b656bfa9b20a4172f7901285560db4c2c695 ("nixos/qemu-vm: refactor bootDisk generation using make-disk-image") looks the most relevant. CC @RaitoBezarius.

Additional observations:

Notify maintainers

(No maintainers found for nixos/modules/virtualisation/qemu-vm.nix.)

RaitoBezarius commented 1 year ago

You're definitely right about this; it's an annoying one. One hope for this is https://github.com/NixOS/nixpkgs/pull/227883, and enabling it for build-vm-with-bootloader. But there is a correctness issue in testing by reusing the host Nix store rather than building your own one, and LKL is definitely also a second problem in the whole thing.

Anyway, this is on my radar, but it will probably take some time to get it properly sorted out for all cases.

bjornfor commented 1 year ago

[...] and LKL is definitely also a second problem in the whole thing.

At least for ext4 we can use mkfs.ext4 -d ./root-directory [...] instead of lkl/cptofs: https://github.com/NixOS/nixpkgs/blob/master/nixos/lib/make-ext4-fs.nix.

(I guess ideally there would be only one "make disk image/fs" implementation, but apparently we already have multiple ones in nixos/lib/make-*.nix.)
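
As a rough illustration of the mkfs.ext4 -d approach mentioned above, here is a minimal sketch of a derivation that populates an ext4 image from a directory without lkl/cptofs. It is not the code from make-ext4-fs.nix; the derivation name, the 64M image size, and the example file are assumptions chosen for illustration.

```nix
# Sketch: build a small ext4 image from a directory tree using
# `mkfs.ext4 -d`, which copies the directory contents into the new
# filesystem and needs neither root privileges nor a mounted image.
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "ext4-from-directory.img"
  { nativeBuildInputs = [ pkgs.e2fsprogs.bin ]; }
  ''
    # Hypothetical directory tree to copy into the image.
    mkdir -p root/etc
    echo "hello" > root/etc/example

    # Create a sparse file of fixed size (64M is an arbitrary choice),
    # then format it and populate it from ./root in a single step.
    truncate -s 64M "$out"
    mkfs.ext4 -d ./root "$out"
  ''
```

After `nix-build`, the image contents can be inspected with `debugfs -R "ls -l /etc" result` from e2fsprogs. A real implementation would also have to compute the image size from the closure rather than hard-code it.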

RaitoBezarius commented 1 year ago

[...] and LKL is definitely also a second problem in the whole thing.

At least for ext4 we can use mkfs.ext4 -d ./root-directory [...] instead of lkl/cptofs: master/nixos/lib/make-ext4-fs.nix.

(I guess ideally there would be only one "make disk image/fs" implementation, but apparently we already have multiple ones in nixos/lib/make-*.nix.)

For VM disk images, we only use make-disk-image.nix. We could special-case a specific way to build partitions for each fsType, but keep in mind that we need a way to build at least FAT32 and ext4 offline.

manuth commented 6 months ago

Whenever I try to use build-vm-with-bootloader, with nothing but KDE Plasma 5 or KDE Plasma 6 enabled, I get an error saying "cptofs failed. diskSize might be too small for closure.", despite the fact that make-disk-image is invoked with diskSize set to auto: https://github.com/NixOS/nixpkgs/blob/a76c4553d7e741e17f289224eda135423de0491d/nixos/modules/virtualisation/qemu-vm.nix#L296

This can be reproduced using my config: https://git.nuth.ch/manuth/NixOSConfig/src/commit/889bdbe6979ff629f6d4185a521fda09d8308477 (commit 889bdbe6979ff629f6d4185a521fda09d8308477)

Note that nixos-rebuild build-vm works while nixos-rebuild build-vm-with-bootloader does not.

Is that, by chance, related to this issue?

Edit: I can also confirm that nixos-22.11 works properly.