NixOS / nixpkgs

Nix Packages collection & NixOS

Nixos 23.05 builds but won't boot #248293

Closed wriver4 closed 1 year ago

wriver4 commented 1 year ago

Describe the bug

Nixos 23.05 builds but won't boot

Steps To Reproduce

Steps to reproduce the behavior: `sudo nixos-rebuild switch`

Expected behavior

boot recent generation

Screenshots

● nixos
    State: maintenance
    Units: 239 loaded (incl. loaded aliases)
     Jobs: 0 queued
   Failed: 1 units
    Since: Wed 2023-08-09 21:20:44 EDT; 4h 6min ago
  systemd: 253.6
  Tainted: cgroupsv1
   CGroup: /
           ├─init.scope
           │ └─1 /run/current-system/systemd/lib/systemd/systemd --system --deserialize=256
           └─system.slice
             ├─emergency.service
             │ ├─5646 /nix/store/rpagyb9792jx4f2hlqz9q0ld3frlzxq5-systemd-253.6/lib/systemd/systemd-sulogin-shell emergency
             │ ├─5647 bash
             │ └─5661 systemctl status
             ├─lxd.service
             │ └─1645 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --pid-file= --no-ping --interface=lxdbr0 --dhcp-rapid-commit --no-negcache --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.141.21.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/lib/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/lib/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.141.21.2,10.141.21.254,1h --listen-address=fd42:31bd:fa68:7632::1 --enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd --interface-name _gateway.lxd,lxdbr0 -S /lxd/ --conf-file=/var/lib/lxd/networks/lxdbr0/dnsmasq.raw -u nobody -g lxd
             ├─systemd-journald.service
             │ └─499 /nix/store/rpagyb9792jx4f2hlqz9q0ld3frlzxq5-systemd-253.6/lib/systemd/systemd-journald
             ├─systemd-timesyncd.service
             │ └─676 /nix/store/rpagyb9792jx4f2hlqz9q0ld3frlzxq5-systemd-253.6/lib/systemd/systemd-timesyncd
             └─systemd-udevd.service
               └─udev
                 └─524 /nix/store/rpagyb9792jx4f2hlqz9q0ld3frlzxq5-systemd-253.6/lib/systemd/systemd-udevd

░░
░░ The job identifier is 3738.
Aug 10 01:04:05 nixos (plymouth)[5645]: emergency.service: Executable /nix/store/rpagyb9792jx4f2hlqz9q0ld3frlzxq5-systemd-253.6/bin/plymouth missing, skipping: No such file or directory
░░ Subject: Process /nix/store/rpagyb9792jx4f2hlqz9q0ld3frlzxq5-systemd-253.6/bin/plymouth could not be executed
░░ Defined-By: systemd
░░
░░ Th
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ The process /nix/store/rpagyb9792jx4f2hlqz9q0ld3frlzxq5-systemd-253.6/bin/plymouth could not be executed and failed.
░░
░░ The error number returned by this process is ERRNO.

System:
      Firmware: UEFI 2.31 (Lenovo 0.4224)
 Firmware Arch: x64
   Secure Boot: disabled (disabled)
  TPM2 Support: firmware only, driver unavailable
  Boot into FW: supported

Current Boot Loader:
      Product: systemd-boot 253.6
     Features: ✓ Boot counting
               ✓ Menu timeout control
               ✓ One-shot menu timeout control
               ✓ Default entry control
               ✓ One-shot entry control
               ✓ Support for XBOOTLDR partition
               ✓ Support for passing random seed to OS
               ✓ Load drop-in drivers
               ✓ Support Type #1 sort-key field
               ✓ Support @saved pseudo-entry
               ✓ Support Type #1 devicetree field
               ✓ Boot loader sets ESP information
          ESP: /dev/disk/by-partuuid/2d269f38-4adb-d74e-a0ea-dbb6691d9719
         File: └─/EFI/systemd/systemd-bootx64.efi

Random Seed:
 System Token: set
       Exists: yes

Available Boot Loaders on ESP:
          ESP: /boot (/dev/disk/by-partuuid/2d269f38-4adb-d74e-a0ea-dbb6691d9719)
         File: ├─/EFI/systemd/systemd-bootx64.efi (systemd-boot 253.6)
               └─/EFI/BOOT/BOOTX64.EFI (systemd-boot 253.6)

Boot Loaders Listed in EFI Variables:
        Title: Linux Boot Manager
           ID: 0x0014
       Status: active, boot-order
    Partition: /dev/disk/by-partuuid/2d269f38-4adb-d74e-a0ea-dbb6691d9719
         File: └─/EFI/systemd/systemd-bootx64.efi

Boot Loader Entries:
        $BOOT: /boot (/dev/disk/by-partuuid/2d269f38-4adb-d74e-a0ea-dbb6691d9719)
        token: nixos

Default Boot Loader Entry:
         type: Boot Loader Specification Type #1 (.conf)
        title: NixOS (Generation 37 NixOS 23.05.2478.bd836ac5e5a7, Linux Kernel 6.1.42, Built on 2023-08-09)
           id: nixos-generation-37.conf
       source: /boot//loader/entries/nixos-generation-37.conf
      version: Generation 37 NixOS 23.05.2478.bd836ac5e5a7, Linux Kernel 6.1.42, Built on 2023-08-09
   machine-id: a6c87e7406144e4f817bcc24ce2c0264
        linux: /boot//efi/nixos/skqljc4db5n96rmls2vz0929dp65px8c-linux-6.1.42-bzImage.efi
       initrd: /boot//efi/nixos/x5lwlrwfjkphjazp375xc31zhbsmygij-initrd-linux-6.1.42-initrd.efi
      options: init=/nix/store/95c1ach48k6pi6xi88pby7pqs5mgkv2h-nixos-system-nixos-23.05.2478.bd836ac5e5a7/init loglevel=4

Additional context

Getting pretty good at removing failed future generations. Saw another similar issue #223579. It was mostly ZFS based and this is ext4. I am writing this on the machine that builds but won't boot using an older generation.

Notify maintainers

Nixos Maintainers

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[mark@nixos:~]$ nix-shell -p nix-info --run "nix-info -m"
these 11 paths will be fetched (53.19 MiB download, 241.63 MiB unpacked):
  /nix/store/2jgnizza14z2mz2jfm098gbvvl4aysil-binutils-2.40-lib
  /nix/store/b634rxg4s74sj9ac6dijy8v6ssfrf390-gmp-6.2.1
  /nix/store/b7hvml0m3qmqraz1022fwvyyg6fc1vdy-gcc-12.2.0
  /nix/store/cwl4ylj9rps9alf2z51s8l32ivvc50yv-isl-0.20
  /nix/store/gqs234r2zdmlxvs6jzk8w70v68mxd62f-libmpc-1.3.1
  /nix/store/kv3p3lh74azm5z1s97xp08p6x35zq9qd-expand-response-params
  /nix/store/lcf37pgp3rgww67v9x2990hbfwx96c1w-gcc-wrapper-12.2.0
  /nix/store/lp72fysfjmrid1zsgk4k43lg3smycgn7-mpfr-4.2.0
  /nix/store/qnjzh4b0zgdkpb9x3r3h3bnc3rhdysbx-binutils-wrapper-2.40
  /nix/store/vfdg65hiv4bwls48588msw8la7452w2q-stdenv-linux
  /nix/store/zkjq96ik8cbv6ijh1lylnkk2bni9qvas-binutils-2.40
copying path '/nix/store/kv3p3lh74azm5z1s97xp08p6x35zq9qd-expand-response-params' from 'https://cache.nixos.org'...
copying path '/nix/store/b634rxg4s74sj9ac6dijy8v6ssfrf390-gmp-6.2.1' from 'https://cache.nixos.org'...
copying path '/nix/store/2jgnizza14z2mz2jfm098gbvvl4aysil-binutils-2.40-lib' from 'https://cache.nixos.org'...
copying path '/nix/store/cwl4ylj9rps9alf2z51s8l32ivvc50yv-isl-0.20' from 'https://cache.nixos.org'...
copying path '/nix/store/lp72fysfjmrid1zsgk4k43lg3smycgn7-mpfr-4.2.0' from 'https://cache.nixos.org'...
copying path '/nix/store/zkjq96ik8cbv6ijh1lylnkk2bni9qvas-binutils-2.40' from 'https://cache.nixos.org'...
copying path '/nix/store/gqs234r2zdmlxvs6jzk8w70v68mxd62f-libmpc-1.3.1' from 'https://cache.nixos.org'...
copying path '/nix/store/b7hvml0m3qmqraz1022fwvyyg6fc1vdy-gcc-12.2.0' from 'https://cache.nixos.org'...
copying path '/nix/store/qnjzh4b0zgdkpb9x3r3h3bnc3rhdysbx-binutils-wrapper-2.40' from 'https://cache.nixos.org'...
copying path '/nix/store/lcf37pgp3rgww67v9x2990hbfwx96c1w-gcc-wrapper-12.2.0' from 'https://cache.nixos.org'...
copying path '/nix/store/vfdg65hiv4bwls48588msw8la7452w2q-stdenv-linux' from 'https://cache.nixos.org'...
 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.42, NixOS, 23.05 (Stoat), 23.05.2478.bd836ac5e5a7`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.13.3`
 - channels(root): `"nixos-23.05"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

Artturin commented 1 year ago

In 2021 I closed a similar issue: https://github.com/NixOS/nixpkgs/issues/141801. Maybe the systemd.../bin/plymouth message is just a red herring, because it gets run whenever the emergency service is run?

ElvishJerricco commented 1 year ago

Yes, the plymouth message is a red herring caused by this: https://github.com/systemd/systemd/blob/9b5560f39c619a101044c152a85c6bd1b8978f3c/units/emergency.service.in#L22

It's got a `-` prefix, so systemd ignores the failure and it's a non-issue. Just a slightly awkward quirk in systemd.

ElvishJerricco commented 1 year ago

@wriver4 We're going to need to know which unit actually failed. I believe `systemctl list-units --failed` should list anything that failed and led to emergency mode. If not, we'll need to see more logs.
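A minimal sketch of how the failed unit and its logs could be collected from the emergency shell. The helper names `failed_unit_names` and `show_failed_units` are hypothetical; the `systemctl` and `journalctl` options are standard ones, though the exact output may vary between systemd versions.

```shell
# Hypothetical helpers; nothing here ships with systemd or NixOS.

# Read `systemctl list-units --failed --plain --no-legend` output on stdin
# and print only the unit names (the first column).
failed_unit_names() {
  awk 'NF > 0 { print $1 }'
}

# Print the status and this boot's journal for every failed unit.
# Only call this on a machine that is actually running systemd.
show_failed_units() {
  systemctl list-units --failed --plain --no-legend |
    failed_unit_names |
    while read -r unit; do
      systemctl status --no-pager "$unit" </dev/null
      journalctl -b --no-pager -u "$unit" </dev/null
    done
}
```

Running `show_failed_units` in the emergency shell should print enough context to paste into a report like this one.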

wriver4 commented 1 year ago

  UNIT                LOAD   ACTIVE SUB    DESCRIPTION
● var-lib-lxcfs.mount loaded failed failed /var/lib/lxcfs

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
1 loaded units listed.

For future reference, what logs would you like to see? FWIW, I've only been nixing for a couple of weeks. Should have caught that one, though.

ElvishJerricco commented 1 year ago

That seems odd. It's a mount unit that's failing? It'd be useful to see `systemctl status var-lib-lxcfs.mount`. Also, please share any part of your config having to do with lxcfs, including anything in `hardware-configuration.nix`.

I have a hypothesis. It looks like `virtualisation.lxc.lxcfs` creates a systemd service that starts off by mounting a file system. I wonder if you ran `nixos-generate-config` to regenerate `hardware-configuration.nix` while this file system was mounted, creating a `fileSystems."/var/lib/lxcfs"` entry, which would likely cause failures during boot.

wriver4 commented 1 year ago

systemctl status var-lib-lxcfs.mount: service could not be found

nixos config.nix, saved

Container lxd lxc

virtualisation.lxd.enable = true;
virtualisation.lxc.lxcfs.enable = true;

nixos config.nix, current for troubleshooting

virtualisation.lxd.enable = true;

virtualisation.lxc.lxcfs.enable = true;

hardware.nix

{ config, lib, pkgs, modulesPath, ... }:

{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];

  boot.initrd.availableKernelModules = [ "xhci_pci" "ehci_pci" "ahci" "sd_mod" "sdhci_pci" ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ "kvm-intel" ];
  boot.extraModulePackages = [ ];

  fileSystems."/" = {
    device = "/dev/disk/by-uuid/98ac3fba-efe1-4377-9c52-4e9824ce94c2";
    fsType = "ext4";
  };

  fileSystems."/boot" = {
    device = "/dev/disk/by-uuid/7070-4527";
    fsType = "vfat";
  };

  fileSystems."/var/lib/lxcfs" = {
    device = "lxcfs";
    fsType = "fuse.lxcfs";
  };

  fileSystems."/var/lib/lxd/shmounts" = {
    device = "tmpfs";
    fsType = "tmpfs";
  };

  fileSystems."/var/lib/lxd/devlxd" = {
    device = "tmpfs";
    fsType = "tmpfs";
  };

  fileSystems."/var/lib/lxd/storage-pools/default" = {
    device = "/var/lib/lxd/disks/default.img";
    fsType = "btrfs";
    options = [ "loop" ];
  };

  swapDevices = [ ];

  .... skipped networking

  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
  hardware.cpu.intel.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
}

So, I am guessing that deleting the lxcfs "mount" may fix the boot issue. I am thinking it would be easier, but not as much of a lesson, to delete all the lx* stuff and start over. Thoughts?

ElvishJerricco commented 1 year ago

@wriver4 Please use GitHub Markdown formatting to put code and logs into code blocks.

All those `fileSystems` entries with `/var/lib/lxcfs` or `/var/lib/lxd` in the path are definitely not supposed to be there. I strongly suspect my hypothesis is right, and you regenerated `hardware-configuration.nix` after enabling those things, which caused it to add `fileSystems` entries that shouldn't be there. This is a rather unfortunate quirk of services like these, which create a bunch of extra mount points.
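One way to spot such entries, sketched under assumptions: the helper name `stray_lxd_mounts` is hypothetical, and the pattern only covers the two path prefixes named above, with each `fileSystems` entry starting on its own line as `nixos-generate-config` normally writes it.

```shell
# Hypothetical helper: print (with line numbers) every fileSystems entry in
# the given file whose mount point is /var/lib/lxcfs or under /var/lib/lxd.
stray_lxd_mounts() {
  grep -nE 'fileSystems\."/var/lib/(lxcfs|lxd)(/[^"]*)?"' "$1"
}

# Usage on a NixOS machine (default location of the generated file):
#   stray_lxd_mounts /etc/nixos/hardware-configuration.nix
```

Each reported line marks the start of a block that can be deleted by hand; entries such as `/` and `/boot` are not matched and should stay.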

wriver4 commented 1 year ago

I did regenerate the hardware... nix, but I can't be sure of the timing. I was computing in the family living room with the dog, wife and daughter chatting, and Star Trek on, so I am thinking you are correct. How did file system entries end up in the h---.nix?

wriver4 commented 1 year ago

Hypothesis confirmed! Boot successful. Thank you all for the help.

wriver4 commented 1 year ago

I'll use markdown from now on

Artturin commented 1 year ago

`nixos-generate-config` puts the currently mounted file systems into the config.
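A hedged sketch of how the generated file could be previewed before it replaces the existing one. The helper name `preview_hardware_config` is hypothetical, and the default path is the usual location, assumed here; `--show-hardware-config` is the `nixos-generate-config` option that prints the hardware configuration to stdout instead of writing it.

```shell
# Hypothetical helper: show what nixos-generate-config would change in the
# hardware configuration, without writing any file.
preview_hardware_config() {
  current="${1:-/etc/nixos/hardware-configuration.nix}"
  # The generated config goes to stdout; "-" makes diff read it from stdin.
  nixos-generate-config --show-hardware-config | diff -u "$current" -
}

# Usage on a NixOS machine:
#   preview_hardware_config
```

Any added `fileSystems` lines for mounts created by running services would show up with a leading `+` in the diff.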