Open pitkling opened 7 months ago
I'm guessing what is happening is that your Linux builder ran out of disk space. If you run nix-collect-garbage on the builder that should free up space. You can also deploy a Linux builder that has more disk space, although I'm not sure if nix-darwin makes that configurable or not, but the macos-builder.nix module has an option for configuring that:
I'm guessing what is happening is that your Linux builder ran out of disk space. [..]
Ah, should have mentioned: That's what I thought first, but I checked during the build by logging into the linux-builder and running df -h several times during the build. This is the output with the highest disk usage, immediately before the error:
[builder@nixos:~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 149M 0 149M 0% /dev
tmpfs 1.5G 0 1.5G 0% /dev/shm
tmpfs 742M 6.4M 736M 1% /run
tmpfs 1.5G 2.5M 1.5G 1% /run/wrappers
/dev/disk/by-label/nixos 20G 5.7G 13G 31% /
certs 461G 438G 24G 95% /etc/ssl/certs
/dev/disk/by-label/nix-store 1.3G 917M 247M 79% /nix/.ro-store
shared 461G 438G 24G 95% /tmp/shared
xchg 461G 438G 24G 95% /tmp/xchg
keys 461G 438G 24G 95% /var/keys
overlay 20G 5.7G 13G 31% /nix/store
tmpfs 297M 4.0K 297M 1% /run/user/1000
So it seems there should be plenty of space left.
Looks like nixos/lib/make-disk-image.nix makes an estimate that's too small when producing the image for the nix store (useNixStoreImage, nixos/modules/virtualisation/qemu-vm.nix).
Thanks @roberth for the hint to nixos/lib/make-disk-image.nix. Unfortunately, after some investigating it seems that's not the case (if I'm not overlooking something…).
To check whether the disk image size estimate is too small, I tried to keep the build product in order to inspect it. However, --keep-failed does not seem to work with remote builders. Instead, I adapted the flake slightly by exchanging the line virtualisation.host.pkgs = nixpkgs.legacyPackages."${system}-darwin"; for virtualisation.host.pkgs = nixpkgs.legacyPackages."${system}-linux"; and copied the flake via ssh onto my remote builder. I then built a VM image directly on the builder via nix build ./#linux-builder.config.system.build.vm --keep-failed.
This fails with the same error message (ERROR: cptofs failed. diskSize might be too small for closure.). The temporary build directory contains the following:
[builder@nixos:/tmp/nix-build-nix-store-image.drv-0]$ ls -lh
total 2.1G
-rw-r--r-- 1 nixbld1 nixbld 4.1K Mar 3 10:53 env-vars
-rw-r--r-- 1 nixbld1 nixbld 3.7G Mar 3 10:57 nixos.raw
drwxr-xr-x 4 nixbld1 nixbld 4.0K Mar 3 10:53 root
drwxr-xr-x 6 nixbld1 nixbld 4.0K Mar 3 10:53 state
If I understand the code in nixos/lib/make-disk-image.nix correctly, the failing call to cptofs tries to copy the content of root/nix/store onto the disk image nixos.raw, which at that moment should be empty and have size roughly 3.5 GB. Checking the size of root/nix/store via du -hs gives
[builder@nixos:/tmp/nix-build-nix-store-image.drv-0]$ du -hs root/nix/store
2.0G root/nix/store
So it should easily fit. Just to be sure I mounted the raw image nixos.raw and checked its free capacity via df -h:
[builder@nixos:/tmp/nix-build-nix-store-image.drv-0]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 149M 0 149M 0% /dev
tmpfs 1.5G 0 1.5G 0% /dev/shm
tmpfs 742M 6.4M 736M 1% /run
tmpfs 1.5G 2.5M 1.5G 1% /run/wrappers
/dev/disk/by-label/nixos 20G 7.8G 11G 42% /
certs 461G 436G 25G 95% /etc/ssl/certs
/dev/disk/by-label/nix-store 1.3G 917M 247M 79% /nix/.ro-store
shared 461G 436G 25G 95% /tmp/shared
xchg 461G 436G 25G 95% /tmp/xchg
keys 461G 436G 25G 95% /var/keys
overlay 20G 7.8G 11G 42% /nix/store
tmpfs 297M 4.0K 297M 1% /run/user/1000
/dev/loop0 3.5G 2.0G 1.4G 59% /home/builder/mnt-tmp
The last entry is for the nixos.raw image. So at the moment the build fails, the image still has more than 40% free space. Again, this also shows that the builder's own disk image (5th entry, /dev/disk/by-label/nixos) still has plenty of capacity left.
Am I overlooking something? What else could cause the failure? Not sure whether it helps, but here's also the output of tune2fs -l for the nixos.raw image:
[builder@nixos:/tmp/nix-build-nix-store-image.drv-0]$ tune2fs -l nixos.raw
tune2fs 1.47.0 (5-Feb-2023)
Filesystem volume name: nix-store
Last mounted on: /mnt/0000fe00
Filesystem UUID: 287cfdab-a4fe-48fb-9e77-d9dd303c6c37
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: unsigned_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 238560
Block count: 952576
Reserved block count: 47628
Overhead clusters: 35090
Free blocks: 412136
Free inodes: 0
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 465
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 7952
Inode blocks per group: 497
Flex block group size: 16
Filesystem created: Sun Mar 3 10:53:31 2024
Last mount time: Sun Mar 3 10:57:11 2024
Last write time: Sun Mar 3 10:57:11 2024
Mount count: 2
Maximum mount count: -1
Last checked: Sun Mar 3 10:53:31 2024
Check interval: 0 (<none>)
Lifetime writes: 2465 MB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 4b5b9340-0387-45d4-a359-1ee21fc7afb1
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0x716951f8
After another look at the logs above, the problem is not that nixos/lib/make-disk-image.nix underestimates the capacity, but that it does not make sure enough inodes are available (the output of tune2fs -l in my last comment shows that the raw image has no free inodes left).
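For anyone checking their own image, the inode exhaustion can be read straight off the tune2fs -l output. A minimal sketch, assuming a POSIX shell with awk; the helper name fs_usage is made up for illustration and is not part of nixpkgs or e2fsprogs:

```sh
# fs_usage: read `tune2fs -l` output on stdin and print block and inode usage.
# (Hypothetical helper, for illustration only.)
fs_usage() {
  awk -F': *' '
    /^Block count:/ { blocks = $2 }
    /^Free blocks:/ { free_blocks = $2 }
    /^Inode count:/ { inodes = $2 }
    /^Free inodes:/ { free_inodes = $2 }
    END {
      # Guard against missing fields so we never divide by zero.
      if (blocks > 0) printf "blocks used: %d%%\n", (blocks - free_blocks) * 100 / blocks
      if (inodes > 0) printf "inodes used: %d%%\n", (inodes - free_inodes) * 100 / inodes
    }'
}

# Usage: tune2fs -l nixos.raw | fs_usage
```

With the values from the output above (952576 blocks with 412136 free, 238560 inodes with 0 free) this reports 56% of blocks but 100% of inodes in use. On a mounted filesystem, df -i shows the same inode figures that df -h leaves out.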
I tested this in a local branch by adding -N $numInodes to the mkfs call ($numInodes is already computed by make-disk-image.nix). With this, the flake succeeds.
I couldn't find who's maintaining make-disk-image.nix, but @samueldr added the inodes computation a few years back, so maybe he knows whether it is a good idea to explicitly add the inodes number to the mkfs call or whether there is a better way to handle this?
Looks like the inode computation was only added for the purpose of reserving space in the block device. Not reserving them in the file system seems like an oversight, not something intentional.
I'm inclined to just go ahead with -N.
Maybe add some margin? The image may be written to later in some usages of that function, e.g. block level copy on write for a VM.
Sorry for the late reply, was busy with work. Anyway, thanks for your quick assessment, @roberth. I will prepare a corresponding pull request, incorporating some margin. The space computation also has some margin that corresponds to roughly 5% of the calculated disk usage (which includes the storage reserved for the inodes). So for consistency I'll probably just take the same margin for the number of inodes.
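The idea can be sketched in shell: count the entries that need an inode, add the same 5% margin, and hand the result to mkfs via -N. This is only an illustration under stated assumptions, not the code from the pull request; count_entries and with_margin are hypothetical names, and how make-disk-image.nix actually derives $numInodes may differ.

```sh
# Count every file, directory and symlink under a tree. Hard-linked files are
# counted once per name, so this can only overestimate the inodes needed.
count_entries() {
  find "$1" | wc -l | tr -d ' '
}

# Add a 5% margin, rounding up, using integer arithmetic only.
with_margin() {
  echo $(( ($1 * 105 + 99) / 100 ))
}

# Hypothetical use when creating the image (mkfs.ext4 -N sets the inode count):
#   mkfs.ext4 -N "$(with_margin "$(count_entries root)")" nixos.raw
```

Note that ext4 reserves a few inodes for itself (the tune2fs output above shows First inode: 11), which is one more reason a margin is useful even for images that are never written to again.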
I have a solution that might not work well for all, but it has been working well for me. Hopefully this PR will get merged soon enough to fix it properly.
I have been using the following:
system = {
stateVersion = "23.11";
build.qcow2 = import "${toString modulesPath}/../lib/make-disk-image.nix" {
inherit lib config pkgs;
diskSize = "auto";
additionalSpace = "20G";
fsType = "ext4";
format = "qcow2";
partitionTableType = "hybrid";
};
};
Setting the additionalSpace to 20G seems to make it happy, and now the only time I'm getting failures is when my ssd is full. The "downside" to this approach is that the disk will be allowed to grow up to 20GB bigger than it needs to be. On the other hand, I end up using nix shell a lot, so having the extra breathing room for /nix isn't a bad thing, IMO.
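A plausible reason this helps, not confirmed in this thread: mke2fs sizes the inode table from the filesystem size, so a larger image also gets more inodes. The sketch below assumes the stock inode_ratio of 16384 bytes per inode from mke2fs.conf, which may be configured differently on a given system; default_inodes is a hypothetical helper.

```sh
# Rough estimate of the inode count mke2fs picks by default for a filesystem
# of the given size in bytes. Assumes the stock inode_ratio of 16384 bytes per
# inode; the real value is rounded per block group, so expect small deviations.
default_inodes() {
  echo $(( $1 / 16384 ))
}

default_inodes $(( 952576 * 4096 ))            # the failing image from this thread
default_inodes $(( 20 * 1024 * 1024 * 1024 ))  # 20G of additionalSpace
```

The first estimate (238144) is close to the 238560 inodes that tune2fs reported for the failing image, and 20G of extra space adds on the order of 1.3 million more, far beyond what the closure needs.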
@bamhm182 It's a good workaround, especially since the disk image grows only if necessary. 🙂
I've been using PR #295874 for quite some time now and it works fine for me. Still, as I mentioned in the comments of the PR, I'm not sure whether I should add checks for filesystems other than ext4 (not sure which other filesystems support setting the number of inodes) and whether the inode number should take the ext4 default into account (seems sensible if images are not read-only).
I'm happy to improve the PR if someone more experienced with how make-disk-image.nix is used finds the time to answer my questions in the PR's comments.
Steps To Reproduce
Steps to reproduce the behavior:
Build the following flake via nix build .#linux-builder.config.system.build.macos-builder-installer
The build fails in cptofs when building nix-store-image.drv, with the error ERROR: cptofs failed. diskSize might be too small for closure.
Build log
See this gist for the log.
Additional context
I put together the above flake as a minimal failing example. Note that most of the flake inputs are not even used, but when commenting one of them out, the flake builds again. In intermediate steps I had even more strange behavior. For example, the actual flake where I first had this issue is in a git repository. If I delete the .git directory, it builds.
Currently I would already be happy to know whether this is reproducible by other people or whether it is a problem on my side.
Notify maintainers
@roberth, @Gabriella439 Since this seemingly happens inside the macosx-builder. But not sure whether this is the actual culprit…
Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.