NixOS / nixpkgs

When /boot is full, system rebuilds fail #23926

Open joepie91 opened 7 years ago

joepie91 commented 7 years ago

Issue description

When the /boot partition is entirely full (e.g. when old generations have not been removed for a long time), any nixos-rebuild command that tries to install a new kernel will fail.

Deleting old generations and garbage-collecting does not fix the issue, because garbage collection doesn't touch the /boot partition, and nixos-rebuild will only try to remove obsolete images after having placed the new initrd in /boot. Since it's full, the new image cannot be copied over:

cannot copy /nix/store/8i1ixqycplb4wc812wkxxf432424jxh5-initrd/initrd to /boot/kernels/8i1ixqycplb4wc812wkxxf432424jxh5-initrd-initrd.tmp
warning: error(s) occurred while switching to the new configuration

... which means the "remove old images" routine never occurs, and the user is stuck.

I've worked around this by manually moving a very old image out of the /boot partition into /root, then running nixos-rebuild boot, then moving back the moved image after cleanup and running nixos-rebuild boot again to ensure that it wasn't a necessary image after all.

Steps to reproduce

  1. Have a /boot with no space left.
  2. Try to rebuild the system with a new kernel build.

Technical details

abbradar commented 7 years ago

Not sure why we can't just swap the order (first clean old images, then place new ones) given that they are already removed from the Nix store (so they won't boot anyway).
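
For illustration, here is a minimal shell sketch of the swapped order (clean first, then copy). It does not mirror the real boot loader scripts; sync_boot_dir and its arguments are made up for this example, and it simply treats "wanted" as the list of files passed in.

```shell
#!/usr/bin/env bash
# Sketch only: sync_boot_dir is a hypothetical helper, not part of nixpkgs.
# Usage: sync_boot_dir DEST_DIR SRC_FILE...
# Makes DEST_DIR hold exactly the given files (matched by basename),
# deleting stale entries BEFORE copying, so the freed space can be reused.
sync_boot_dir() {
    local dest=$1; shift
    local src name existing wanted

    # Pass 1: remove everything in DEST_DIR that is not in the wanted list.
    for existing in "$dest"/*; do
        [ -e "$existing" ] || continue
        wanted=no
        for src in "$@"; do
            if [ "$(basename "$src")" = "$(basename "$existing")" ]; then
                wanted=yes
                break
            fi
        done
        [ "$wanted" = yes ] || rm -f -- "$existing"
    done

    # Pass 2: copy only the files that are still missing.
    for src in "$@"; do
        name=$(basename "$src")
        [ -e "$dest/$name" ] || cp -- "$src" "$dest/$name"
    done
}
```

With this order, a full DEST_DIR is no longer fatal as long as removing the stale files frees enough room for the new ones.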

vcunat commented 7 years ago

They may be present and they may even be alive through some other GC root, but I can't see a good reason either.

cleverca22 commented 7 years ago

also of note to prevent future issues, there are options like boot.loader.grub.configurationLimit to limit how many generations actually get copied to /boot

equalunique commented 7 years ago

The OP workaround doesn't solve the issue for someone with a gummiboot UEFI install. Running nixos-rebuild boot fails because the command simply fills up /boot all over again. :(

nagisa commented 7 years ago

Still a problem.

jpotier commented 7 years ago

Still a problem.

abbradar commented 7 years ago

Do you see this with gummiboot or GRUB? I see that GRUB indeed has this problem but from the code gummiboot should be okay.

nagisa commented 7 years ago

I’m using the systemd bootloader (I believe that is gummiboot).

jpotier commented 7 years ago

I'm having this with GRUB. I removed an old kernel by hand (freeing about 20MB) and then nixos-rebuild boot worked properly.

abbradar commented 7 years ago

@nagisa I've tested this with systemd-bootloader on my local machine -- it seems to correctly remove old entries. Can you repeat my experiment?

  1. Move one of the kernels in /boot to, say, /tmp;
  2. dd if=/dev/zero of=/boot/EFI/nixos/foo.efi (to fill disk space with a bogus "kernel");
  3. nixos-rebuild switch.

After it finishes foo.efi should be deleted correctly and the moved kernel should appear again, without disk space errors.

@jpotier I'll try to prepare a patch (I'm not very familiar with Perl but it looks straightforward).

abbradar commented 7 years ago

I've hopefully fixed GRUB issue -- please test #26165.

nagisa commented 7 years ago

@abbradar just hit it without doing anything special. Just an --upgrade with change from linuxKernel4_10 to linuxKernel4_11.

building path(s) ‘/nix/store/aai4w304vnkqr8g7q1fb4gnmvxphd2qc-dbus-1’
building path(s) ‘/nix/store/83gqr3n9b6i1i7zz20l1vs17y5b01fqd-unit-polkit.service’
building path(s) ‘/nix/store/y0h3xjxfr3z3q6vgj7rd05swdx3iwx2l-unit-systemd-fsck-.service’
building path(s) ‘/nix/store/8qadphzyxbazqgi5xsv3lcjdx73sb1ws-unit-dbus.service’
building path(s) ‘/nix/store/dj5g33zpdynjj71m9ikhl91c88113hgv-system-units’
building path(s) ‘/nix/store/jjhri8hhvg9mdnjg65yar8gbkgd8dw3b-user-units’
building path(s) ‘/nix/store/vidj2plybp70blj82sfia8a1x6w5p35j-etc’
building path(s) ‘/nix/store/hynp02zx2h8r7ggqk1zf72gja1v5jf5b-nixos-system-shirobox-17.09pre108282.53835c93cb’
Traceback (most recent call last):
  File "/nix/store/17x3h3fb1vbdkbvp7fpaxas5rdxw8rcw-systemd-boot-builder.py", line 160, in <module>
    main()
  File "/nix/store/17x3h3fb1vbdkbvp7fpaxas5rdxw8rcw-systemd-boot-builder.py", line 147, in main
    write_entry(gen, machine_id)
  File "/nix/store/17x3h3fb1vbdkbvp7fpaxas5rdxw8rcw-systemd-boot-builder.py", line 52, in write_entry
    initrd = copy_from_profile(generation, "initrd")
  File "/nix/store/17x3h3fb1vbdkbvp7fpaxas5rdxw8rcw-systemd-boot-builder.py", line 47, in copy_from_profile
    copy_if_not_exists(store_file_path, "/boot%s" % (efi_file_path))
  File "/nix/store/17x3h3fb1vbdkbvp7fpaxas5rdxw8rcw-systemd-boot-builder.py", line 17, in copy_if_not_exists
    shutil.copyfile(source, dest)
  File "/nix/store/c6j3ky32czxaqy41i9xqm2qh1ys5kixv-python3-3.6.1/lib/python3.6/shutil.py", line 122, in copyfile
    copyfileobj(fsrc, fdst)
  File "/nix/store/c6j3ky32czxaqy41i9xqm2qh1ys5kixv-python3-3.6.1/lib/python3.6/shutil.py", line 82, in copyfileobj
    fdst.write(buf)
OSError: [Errno 28] No space left on device
warning: error(s) occurred while switching to the new configuration
$ df -h
Filesystem        Size  Used Avail Use% Mounted on
devtmpfs          804M     0  804M   0% /dev
tmpfs             7.9G     0  7.9G   0% /dev/shm
tmpfs             4.0G  5.1M  4.0G   1% /run
tmpfs             7.9G  360K  7.9G   1% /run/wrappers
rpool/root/nixos  234G   49G  185G  21% /
tmpfs             7.9G     0  7.9G   0% /sys/fs/cgroup
tmpfs             7.9G  496K  7.9G   1% /tmp
rpool/home        193G  8.1G  185G   5% /home
/dev/sdb1         100M  100M  2.0K 100% /boot
tmpfs             1.6G     0  1.6G   0% /run/user/1000

nagisa commented 7 years ago

I tried running all of the nix-env --delete-generations old && nix-collect-garbage -d which usually helped with my /boot woes, but this time it neglected to get rid of the files in /boot for some reason (it used to help before). Here are the files.

-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 16x2rs5xmk251q8wn504fxhl8fi541p7-linux-4.11.3-bzImage.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 4pimpvaqylk703069z5fld1ihfa8jr9p-initrd-initrd.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 9yivpgx2mjap0qr2xvdawia5gd7d1k9f-initrd-initrd.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 apzjnr4r3jxlgjhjq6p6wp3rjz419yz9-linux-4.10.12-bzImage.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 b368pwmjwkkqcszmsa94x2frgqpgbx5s-linux-4.10.15-bzImage.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 ckkdipm3l32z5kk7vdaxy84m62snwi7w-linux-4.10.12-bzImage.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 d8i3vpc7v5253ryz2c5ry7ghnbvb5pqq-initrd-initrd.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 h7qgnny008h1k9yplymqx0asrg3sx6kd-linux-4.10.13-bzImage.efi*
-rwxr-xr-x 1 root root 5.6M Jun  1 03:11 ja667wvnjri5wsgbl7227qwlgzsnhdvn-initrd-initrd.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 k8xy3rsvjkjfhqi17qfzlqzwafnl6jg9-initrd-initrd.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 kwbzcflgmd4jn6w4fprxhf20plbj3brn-initrd-initrd.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 wy7jm4dwr5hvav8qkiqadnp3hsj96ibj-initrd-initrd.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 zrifs96767ixklsr2w4ykp0fwdw2g21v-linux-4.10.13-bzImage.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 zxym3scw3mj0xpqk0glgk8ln18l55mzh-linux-4.10.10-bzImage.efi*

cleverca22 commented 7 years ago

nix-env and nix-collect-garbage don't clean up /boot; you have to re-run the install-grub.pl script (via nixos-rebuild switch/boot), which will update the /boot folder

nagisa commented 7 years ago

@cleverca22 does that apply for the systemd-boot, though?

cleverca22 commented 7 years ago

for systemd-boot, it's this script: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/system/boot/loader/systemd-boot/systemd-boot-builder.py

it still runs the same way, via nixos-rebuild switch/boot

abbradar commented 7 years ago

@nagisa That's strange, it fails after removing old entries. What does sudo nix-env --list-generations -p /nix/var/nix/profiles/system report?

Regarding the need to run sudo nix-collect-garbage -d to clean up space -- this is expected: you need to indicate that you don't need the old kernels by removing the profiles that are associated with them.
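
Put together, the sequence described in the last few comments might look like the sketch below. The three commands are the ones quoted in this thread; the wrapper free_boot_space, its RUN switch, and the default age are assumptions made for this example. It only prints the commands unless RUN=1 is set, and running them for real requires root.

```shell
#!/usr/bin/env bash
# Sketch only: free_boot_space is a hypothetical wrapper, not an existing tool.
# By default it just prints the commands; set RUN=1 (as root) to execute them.
free_boot_space() {
    local age=${1:-60d}     # example age threshold, adjust to taste
    local run=echo
    if [ "${RUN:-0}" = 1 ]; then run=; fi

    # 1. Show which system generations still exist.
    # 2. Delete old generations and garbage-collect the store.
    # 3. Re-run the boot loader script so /boot matches what is left.
    $run nix-env --list-generations -p /nix/var/nix/profiles/system &&
    $run nix-collect-garbage --delete-older-than "$age" &&
    $run nixos-rebuild boot
}
```

Steps 1 and 2 alone do not touch /boot; it is the final rebuild that removes the stale kernels and initrds. If /boot is already completely full, that last step may still fail as described earlier in this issue.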

nagisa commented 7 years ago

Oh I see. My problem is that I tend to forget to clean up my old generations. Is there a way to remove all the generations except the one I’m currently booted into and the newest one as a part of nixos-rebuild?

abbradar commented 7 years ago

@nagisa Not now but that'd be relatively trivial to fix -- please open an issue.
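
Until something like that exists, the "keep only these generations" selection can be sketched by hand. The helper below is hypothetical (generations_to_delete is not a Nix command); it assumes the first column of each input line is the generation number, as in nix-env --list-generations output, and it leaves working out which generation is currently booted to the caller.

```shell
#!/usr/bin/env bash
# Sketch only: generations_to_delete is a hypothetical helper.
# Reads generation listings on stdin (first column = generation number)
# and prints every generation number that is NOT given as an argument.
#
# Possible use, as root, keeping generations 41 and 42 (example numbers):
#   nix-env --list-generations -p /nix/var/nix/profiles/system \
#     | generations_to_delete 41 42 \
#     | xargs nix-env -p /nix/var/nix/profiles/system --delete-generations
generations_to_delete() {
    local gen keep skip
    while read -r gen _; do
        [ -n "$gen" ] || continue          # ignore blank lines
        skip=no
        for keep in "$@"; do
            if [ "$gen" = "$keep" ]; then skip=yes; fi
        done
        [ "$skip" = yes ] || echo "$gen"
    done
}
```

The old kernels still only disappear from /boot after a subsequent nixos-rebuild boot or switch.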

chris-martin commented 7 years ago

I have 415M of -initrd and -bzImage files in /boot/kernels. It seems to have been growing steadily and is now just about out of space. I've deleted all old generations and run nixos-rebuild. How can I clean this out?

My config looks like this:

  boot.initrd.luks.devices = [{
    name   = "root";
    device = "/dev/nvme0n1p3";
    preLVM = true;
  }];

  boot.loader.grub.device = "/dev/nvme0n1";
  boot.loader.systemd-boot.enable = false;

  boot.cleanTmpDir = true;

abbradar commented 7 years ago

I think it should have been cleared after nixos-rebuild if you have deleted your old generations. This is not the case, correct? Can you show your /nix/var/nix/profiles/ contents?

chris-martin commented 7 years ago

@abbradar Ah nevermind, I wasn't running nix-collect-garbage as root, so I hadn't actually deleted old generations. (By the way, it's weird that --delete-older-than just silently fails if you try to use it as a non-root user.) Apologies, this is entirely unrelated to this issue.

joepie91 commented 7 years ago

> By the way, it's weird that --delete-older-than just silently fails if you try to use it as a non-root user.

Could you create a separate issue for this? I think it might be desirable to have it at least print a warning (in case the user expected to be garbage-collecting the system environment, not just the user environment).

chris-martin commented 7 years ago

@joepie91, opened https://github.com/NixOS/nix/issues/1492

ghost commented 7 years ago

I tend not to garbage collect that often, so my EFI partition always filled up quickly. Since systemd-boot is merely an EFI boot menu and not a full-blown loader, I've switched to GRUB and configured it to store only its EFI program on the EFI partition and keep everything else on the root partition (or wherever you'd like). This way kernels, initrds and GRUB files are put on a large partition, and the EFI partition stores only the GRUB EFI binary of 122 KB :smile:

So here's my setup.

In /etc/nixos/hardware-configuration.nix I've replaced fileSystems."/boot" with fileSystems."/boot/efi"

/etc/nixos/configuration.nix became this:

  #boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  boot.loader.efi.efiSysMountPoint = "/boot/efi";
  boot.loader.grub = {
    efiSupport = true;
    #efiInstallAsRemovable = true; # in case canTouchEfiVariables doesn't work for your system
    device = "nodev";
  };

P.S.: nixos-generate-config already hints at this approach, but currently only for BIOS-mode systems (I believe it's for BIOS->EFI migration purposes): https://github.com/NixOS/nixpkgs/blob/6c8b819c99a85276f9d3ebdefdb039235321c646/nixos/modules/installer/tools/nixos-generate-config.pl#L536

timsears commented 7 years ago

I failed to garbage collect as root often enough and ended up with more than a few kernels in /boot/kernels. It's easy to see which are the old ones. After moving them out to another partition just in case, something like this worked for me:

cd /boot/kernels
sudo mv lmnsg5sh081zdgr6rrwhhzdkyj0v7ibp-linux-4.9.25-bzImage /tmp/
sudo touch  lmnsg5sh081zdgr6rrwhhzdkyj0v7ibp-linux-4.9.25-bzImage
# a couple more of the previous two lines as needed
sudo nixos-rebuild switch

If you skip the touch then nixos-rebuild will regenerate the missing files before the point where the new boot configuration file is generated and it won't work. Your partition will still be full. The zero size files do the trick.
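
The move-then-touch step above could be wrapped up roughly as follows. This is only a sketch: stub_out_boot_file is a made-up name, and whether a zero-size placeholder is respected depends on the boot loader script in use (it is reported here for /boot/kernels).

```shell
#!/usr/bin/env bash
# Sketch only: stub_out_boot_file is a hypothetical helper.
# Usage: stub_out_boot_file FILE BACKUP_DIR
# Moves FILE into BACKUP_DIR and leaves a zero-size placeholder with the
# same name behind, so the file is not copied back into place.
stub_out_boot_file() {
    local file=$1 backup_dir=$2
    # Only create the placeholder if the move succeeded.
    mv -- "$file" "$backup_dir"/ && : > "$file"
}
```

For example, as root: stub_out_boot_file /boot/kernels/lmnsg5sh081zdgr6rrwhhzdkyj0v7ibp-linux-4.9.25-bzImage /tmp, repeated for each old file, followed by nixos-rebuild switch.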

As an aside, I tried to resize the /boot partition, but gparted failed to make the extra space visible to the fat32 partition that /boot is on. Maybe a later version of gparted will enable this approach to work.

rick68 commented 7 years ago

I found this mailing list post: https://nixos.org/nix-dev/2016-September/021832.html

After I ran this command,

/run/current-system/bin/switch-to-configuration boot

I got back my /boot space.

rubenmoor commented 7 years ago

Hm ... running /run/current-system/bin/switch-to-configuration boot results in

OSError: [Errno 28] No space left on device

for me.

Pauan commented 6 years ago

I just now ran into this when doing sudo nixos-rebuild switch --upgrade:

building Nix...
building the system configuration...
Traceback (most recent call last):
  File "/nix/store/3gd2nlig389d4rp94prqfjf2n33rlwia-systemd-boot-builder.py", line 210, in <module>
    main()
  File "/nix/store/3gd2nlig389d4rp94prqfjf2n33rlwia-systemd-boot-builder.py", line 197, in main
    write_entry(*gen, machine_id)
  File "/nix/store/3gd2nlig389d4rp94prqfjf2n33rlwia-systemd-boot-builder.py", line 81, in write_entry
    kernel = copy_from_profile(profile, generation, "kernel")
  File "/nix/store/3gd2nlig389d4rp94prqfjf2n33rlwia-systemd-boot-builder.py", line 57, in copy_from_profile
    copy_if_not_exists(store_file_path, "/boot%s" % (efi_file_path))
  File "/nix/store/3gd2nlig389d4rp94prqfjf2n33rlwia-systemd-boot-builder.py", line 21, in copy_if_not_exists
    shutil.copyfile(source, dest)
  File "/nix/store/wx2jazwszwyqpwqj4ghkwn19n1h1ncva-python3-3.6.3/lib/python3.6/shutil.py", line 122, in copyfile
    copyfileobj(fsrc, fdst)
  File "/nix/store/wx2jazwszwyqpwqj4ghkwn19n1h1ncva-python3-3.6.3/lib/python3.6/shutil.py", line 82, in copyfileobj
    fdst.write(buf)
OSError: [Errno 28] No space left on device
warning: error(s) occurred while switching to the new configuration

I have plenty of space on every partition, except for the /boot partition which is completely full.

Setting boot.loader.grub.configurationLimit to a lower number didn't help.

After deleting old generations using sudo nix-collect-garbage --delete-older-than 60d and then running sudo nixos-rebuild switch it now works.
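
Since the failure only shows up as an OSError partway through the rebuild, a small pre-flight check can make the cause obvious earlier. The sketch below is an assumption-laden example: boot_has_room is a hypothetical helper, and the threshold is whatever the caller estimates a new kernel plus initrd will need.

```shell
#!/usr/bin/env bash
# Sketch only: boot_has_room is a hypothetical helper.
# Usage: boot_has_room DIR NEEDED_KB
# Succeeds if the filesystem holding DIR has at least NEEDED_KB KiB available.
boot_has_room() {
    local dir=$1 needed_kb=$2 avail_kb
    # df -P prints one line per filesystem in POSIX format, -k uses 1024-byte
    # blocks; the 4th column is the available space. This assumes the device
    # name in the first column contains no spaces.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}
```

For example: boot_has_room /boot 20480 || echo "less than 20 MiB free on /boot, clean up old generations first".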

ghost commented 6 years ago

Why not just switch to grub which can store kernels, initrds and other things on root partition instead of small efi partition? https://nixos.wiki/wiki/Bootloader#How_to_deal_with_full_.2Fboot_in_case_of_EFI

CMCDragonkai commented 6 years ago

The challenge has always been synchronising multiple ESPs. I would want something redundant, because I have 2 drives, but I also want them to be synchronised, but only if the boot succeeded. I hacked up something using activation scripts and systemd script, but it's quite brittle.

nagisa commented 6 years ago

@gnidorah I’m personally using efi-stub, which grub does not support.

equalunique commented 6 years ago

@rubenmoor On my UEFI system I had success with the command supplied by @rick68 only after I ran it with sudo. It cleared out 250MB+ of old files from /boot/EFI/nixos/

vcunat commented 6 years ago

Yes, system-changing commands need to be run with root privileges. EDIT: I'm afraid even the documentation doesn't say such things explicitly.

arianvp commented 6 years ago

I just ran into this whilst upgrading to 18.09 and I'm not sure how to "fix" this. Can I safely remove old kernels manually?

arianvp commented 5 years ago

I again ran into this whilst upgrading from linuxPackages_4_18 to linuxPackages_4_19 and am again stuck with a broken system. We should really tackle this, it's very bad UX.

arianvp commented 5 years ago

If anybody runs into this:

run

nixos-rebuild boot

instead of

nixos-rebuild switch

and it will remove the old kernels

CMCDragonkai commented 5 years ago

Can there be a configurationLimit option available to systemd-boot as well?

domenkozar commented 5 years ago

I've pushed 224a6562a4880195afa5c184e755b8ecaba41536 to master, which adds boot.loader.systemd-boot.configurationLimit, working exactly like the existing option for GRUB.

Current status:

Possible TODOs:

domenkozar commented 5 years ago

Backport: https://github.com/NixOS/nixpkgs/pull/63766

ashkan-leo commented 4 years ago

I'm facing this issue on my raspberry pi 3. Basically, the /boot partition is so small (only 30m) that I can't even build a single generation. Any suggestion on how to fix this?

SimonAlling commented 4 years ago

> I'm facing this issue on my raspberry pi 3. Basically, the /boot partition is so small (only 30m) that I can't even build a single generation. Any suggestion on how to fix this?

Same on my RPi3.

$ df -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk0p1  120M  120M     0 100% /boot

Most of the space is taken up by /boot/nixos:

$ du -sch /boot/nixos/*
4.1M    /boot/nixos/3n2b8fvv6xaqpdccfcgg2z7sp28sz4j8-initrd-initrd
2.9M    /boot/nixos/508hwfj07vqvla0l27g3x5ync5mschzv-linux-4.14.10-dtbs
26M     /boot/nixos/508hwfj07vqvla0l27g3x5ync5mschzv-linux-4.14.10-Image
4.1M    /boot/nixos/aibyylj1h4bim45zgw2s0gwcsvfadk34-initrd-initrd
6.6M    /boot/nixos/bwgjnapvj32i9x7g35mp86567awxf9lq-initrd-initrd
4.0M    /boot/nixos/j9gqa43mayrl7j9mmpjl0hyii4yb49mn-linux-4.19.42-dtbs.tmp.6140
29M     /boot/nixos/j9gqa43mayrl7j9mmpjl0hyii4yb49mn-linux-4.19.42-Image
2.5M    /boot/nixos/rk340hsfa1mc9x6q7f3yr1m5lr73kfb1-linux-4.13.2-dtbs
25M     /boot/nixos/rk340hsfa1mc9x6q7f3yr1m5lr73kfb1-linux-4.13.2-Image
4.4M    /boot/nixos/vcd0w3n8qqvgnb5ic950nq8z3mnbi86w-initrd-initrd
108M    total

None of the suggested commands work for me, e.g.:

$ sudo nixos-rebuild boot
building Nix...
building the system configuration...
cat: write error: No space left on device
warning: error(s) occurred while switching to the new configuration

I resorted to deleting a couple of old items, freeing up some 30 MB:

$ cd /boot/nixos
$ sudo rm -rf rk340hsfa1mc9x6q7f3yr1m5lr73kfb1-linux-4.13.2-*

And now sudo nixos-rebuild switch works. :slightly_smiling_face:

n8henrie commented 4 years ago

For the Raspberry Pi folks chiming in: boot.loader.generic-extlinux-compatible.configurationLimit should probably be set lower (looks like it's the counterpart to the grub setting people mention above).

You will likely need to manually delete some files in /boot/nixos/ before you can sudo nixos-rebuild switch -- perhaps look for names that suggest part of an older linux kernel.

Unfortunately it looks like the total size of a single boot configuration is ~50M, so my 130MB boot partition probably won't do much good.

$ ls /boot/nixos
66i7fz4ssgh90pw352qm1wd6yig7k1z3-linux-5.6.12-dtbs  66i7fz4ssgh90pw352qm1wd6yig7k1z3-linux-5.6.12-Image  6cw641p81man2k0p4iavwbvwf8j5pzzd-initrd-linux-5.6.12-initrd
$ du -sh /boot/nixos/
53M /boot/nixos/

On the other hand, it looks like the wiki has been updated to recommend against using a separate boot partition on NixOS >= 19.09:

# File systems configuration for using the installer's partition layout
  fileSystems = {
    # Prior to 19.09, the boot partition was hosted on the smaller first partition
    # Starting with 19.09, the /boot folder is on the main bigger partition.
    # The following is to be used only with older images.
    /*
    "/boot" = {
      device = "/dev/disk/by-label/NIXOS_BOOT";
      fsType = "vfat";
    };
    */
    "/" = {
      device = "/dev/disk/by-label/NIXOS_SD";
      fsType = "ext4";
    };
  };

Also see the section on the same page: Disable use of /boot partition

ShamrockLee commented 4 years ago

Is there a workaround? I don't know which .efi files belong to deleted generations, so I don't even know what can be deleted manually.

nixos-rebuild boot doesn't work on my machine with a full /boot partition.

UPDATE: My fault. It seems that I needed to delete even more generations to get enough space. After deleting enough system generations, back up /boot/EFI/nixos/ somewhere else, clear out all the .efi files inside /boot/EFI/nixos/, and run nixos-rebuild boot. It will generate the .efi files for all the remaining entries, including the newly built one.
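
The backup-and-clear step from the update above could be sketched like this. backup_and_clear_efi is a hypothetical helper, not an existing tool; it refuses to delete anything unless the copy succeeded, and the directories are passed in so nothing is hard-coded.

```shell
#!/usr/bin/env bash
# Sketch only: backup_and_clear_efi is a hypothetical helper.
# Usage: backup_and_clear_efi EFI_DIR BACKUP_DIR
# Copies everything in EFI_DIR to BACKUP_DIR, then removes the *.efi files
# from EFI_DIR so a later rebuild can regenerate the ones still needed.
backup_and_clear_efi() {
    local efi_dir=$1 backup_dir=$2
    mkdir -p -- "$backup_dir" || return 1
    # Copy first; bail out without deleting if the backup is incomplete.
    cp -a -- "$efi_dir"/. "$backup_dir"/ || return 1
    rm -f -- "$efi_dir"/*.efi
}
```

For example, as root: backup_and_clear_efi /boot/EFI/nixos /root/efi-backup, then nixos-rebuild boot. The backup directory should live on a partition other than /boot.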

Atemu commented 4 years ago

The EFI files shouldn't take much space, you would rather be looking for old kernel images and initrds.

You can check ls -l /nix/var/nix/profiles/system-*/kernel for the hashes of files only in generations that have been deleted.

aij commented 4 years ago

> You can check ls -l /nix/var/nix/profiles/system-*/kernel for the hashes of files only in generations that have NOT been deleted.

Fixed that for you.

FWIW, I usually just go by kernel version numbers rather than trying to match up hashes. Every time I filled up /boot I had enough ancient kernels and initrds lying around that it was easy to find some I knew I wouldn't need any more.
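
For anyone who does want to match up hashes, the comparison can be sketched in shell. Everything here is an assumption for illustration: unreferenced_boot_files is a made-up helper, and it relies on the naming seen in this thread, where a boot file starts with the same hash as the store directory that the generation's kernel or initrd symlink points into.

```shell
#!/usr/bin/env bash
# Sketch only: unreferenced_boot_files is a hypothetical helper.
# Usage: unreferenced_boot_files PROFILES_DIR BOOT_DIR
# Prints entries in BOOT_DIR whose leading store hash is not used by the
# kernel or initrd of any system-* generation under PROFILES_DIR.
# Caution: if no generations are found, every entry is printed, so review
# the output before deleting anything.
unreferenced_boot_files() {
    local profiles_dir=$1 boot_dir=$2
    local link target hash file live=" "

    # Collect the hashes still referenced by existing generations.
    for link in "$profiles_dir"/system-*/kernel "$profiles_dir"/system-*/initrd; do
        [ -L "$link" ] || continue
        target=$(readlink "$link")                  # .../<hash>-linux-X.Y/bzImage
        hash=$(basename "$(dirname "$target")")     # <hash>-linux-X.Y
        live="$live${hash%%-*} "                    # keep only <hash>
    done

    # Report boot entries whose hash prefix is not in that set.
    for file in "$boot_dir"/*; do
        [ -e "$file" ] || continue
        hash=$(basename "$file")
        hash=${hash%%-*}
        case "$live" in
            *" $hash "*) ;;                         # still referenced, keep
            *) echo "$file" ;;
        esac
    done
}
```

For example: unreferenced_boot_files /nix/var/nix/profiles /boot/EFI/nixos lists candidates for manual removal. It only prints names and never deletes anything itself.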

ShamrockLee commented 4 years ago

> The EFI files shouldn't take much space, you would rather be looking for old kernel images and initrds.
>
> You can check ls -l /nix/var/nix/profiles/system-*/kernel for the hashes of files only in generations that have been deleted.

Thanks a lot! They are ...-bzImage.efi and ...-initrd.efi.

Xe commented 4 years ago

I ran into this today, but /boot was somehow not mounted at all.

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

Atemu commented 3 years ago

Still important to me.