NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

coreutils not on scope at install-grub.pl #241356

Open Mikilio opened 1 year ago

Mikilio commented 1 year ago

Describe the bug

The following bug occurs in install-grub.pl when executing this command:

$ sudo --preserve-env=PATH,NIX_PATH `which nixos-install` --root /mnt --flake /mnt/etc/nixos#homestation

Coreutils commands are not in scope. (The install finishes when run inside nix develop nixpkgs#coreutils, but the result doesn't boot.)

Steps To Reproduce

Steps to reproduce the behavior:

  1. run my script and choose an empty drive on your system
  2. run
    sudo git clone https://github.com/Mikilio/dotfiles.git /mnt/etc/nixos
  3. run
    $ sudo --preserve-env=PATH,NIX_PATH `which nixos-install` --root /mnt --flake /mnt/etc/nixos#homestation

Expected behavior

It should just finish installation and new disk should be able to boot.

Examples

Output:

installing the boot loader...
setting up /etc...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
updating GRUB 2 menu...
/nix/var/nix/profiles/system/sw/bin/bash: line 10: rmdir: command not found

Additional context

Installing from a non-NixOS system (Fedora). The exact commit of my config when this bug occurred is 4b14be37e623395232eee63504ca42050385135a

Notify maintainers

@xaverdh @rnhmjoj

Metadata

error: file 'nixpkgs' was not found in the Nix search path (add it using $NIX_PATH or -I)

       at «string»:1:25:

            1| {...}@args: with import <nixpkgs> args; (pkgs.runCommandCC or pkgs.runCommand) "shell" { buildInputs = [ (nix-info) ]; } ""
             |                         ^
(use '--show-trace' to show detailed location information)

xaverdh commented 1 year ago

I can't find any direct reference to this in your config, but probably one of the external flakes that you use sets boot.loader.grub.extraInstallCommands and calls rmdir instead of ${lib.getBin coreutils}/bin/rmdir?

PS: With a recent enough nix, you can set nix.nixPath = [ "nixpkgs=flake:nixpkgs" ] to get a working NIX_PATH for the old tooling

xaverdh commented 1 year ago

Just to clarify: install-grub.pl is not to blame here, since it is written in Perl and does not explicitly shell out to rmdir. Instead, I think the error originates from the shell script that executes install-grub.pl.

xaverdh commented 1 year ago

Since it's all scripts in the store, can you try to check exactly which piece of code fails (starting with which nixos-install and chasing references)?

Mikilio commented 1 year ago

First I'd like to apologize for my late response.

I remember trying to track the error and found that install-grub.pl is the only place where it happens.

I will look again when I come back to my deployment script. Back then I worked around the error by using systemd-boot.

When I do, I am not sure I will be able to reproduce the error, because my Fedora installation does not exist anymore.

ISibboI commented 1 year ago

I am having the same problem when following this guide to install nixos on a dedicated server: https://github.com/nix-community/nixos-install-scripts/blob/master/hosters/hetzner-dedicated/hetzner-dedicated-wipe-and-install-nixos.sh

I have modified the script to include pkgs.coreutils in the installation environment, but it did not help.

My `configuration.nix`:

```
{ config, pkgs, ... }:

{
  imports = [
    # Include the results of the hardware scan.
    ./hardware-configuration.nix
  ];

  # Use GRUB2 as the boot loader.
  # We don't use systemd-boot because Hetzner uses BIOS legacy boot.
  boot.loader.systemd-boot.enable = false;
  boot.loader.grub = {
    enable = true;
    efiSupport = false;
    devices = [ "/dev/sda" "/dev/sdb" ];
  };

  networking.hostName = "hetzner";

  # The mdadm RAID1s were created with 'mdadm --create ... --homehost=hetzner',
  # but the hostname for each machine may be different, and mdadm's HOMEHOST
  # setting defaults to '' (using the system hostname).
  # This results mdadm considering such disks as "foreign" as opposed to
  # "local", and showing them as e.g. '/dev/md/hetzner:root0'
  # instead of '/dev/md/root0'.
  # This is mdadm's protection against accidentally putting a RAID disk
  # into the wrong machine and corrupting data by accidental sync, see
  # https://bugzilla.redhat.com/show_bug.cgi?id=606481#c14 and onward.
  # We do not worry about plugging disks into the wrong machine because
  # we will never exchange disks between machines, so we tell mdadm to
  # ignore the homehost entirely.
  environment.etc."mdadm.conf".text = ''
    HOMEHOST
  '';
  # The RAIDs are assembled in stage1, so we need to make the config
  # available there.
  boot.initrd.services.swraid.mdadmConf = config.environment.etc."mdadm.conf".text;

  # Network (Hetzner uses static IP assignments, and we don't use DHCP here)
  networking.useDHCP = false;
  networking.interfaces."enp0s31f6".ipv4.addresses = [
    {
      address = "";
      # FIXME: The prefix length is commonly, but not always, 24.
      # You should check what the prefix length is for your server
      # by inspecting the netmask in the "IPs" tab of the Hetzner UI.
      # For example, a netmask of 255.255.255.0 means prefix length 24
      # (24 leading 1s), and 255.255.255.192 means prefix length 26
      # (26 leading 1s).
      prefixLength = 26;
    }
  ];
  networking.interfaces."enp0s31f6".ipv6.addresses = [
    {
      address = "";
      prefixLength = 64;
    }
  ];
  networking.defaultGateway = "";
  networking.defaultGateway6 = { address = "fe80::1"; interface = "enp0s31f6"; };
  networking.nameservers = [ "8.8.8.8" ];

  # Initial empty root password for easy login:
  users.users.root.initialHashedPassword = "";
  services.openssh.settings.PermitRootLogin = "prohibit-password";

  users.users.root.openssh.authorizedKeys.keys = [
    # FIXME Replace this by your SSH pubkey!
    "ssh-rsa "
  ];

  services.openssh.enable = true;

  # FIXME
  # This value determines the NixOS release with which your system is to be
  # compatible, in order to avoid breaking some software such as database
  # servers. You should change this only after NixOS release notes say you
  # should.
  system.stateVersion = "23.05"; # Did you read the comment?
}
```

Running nixos-install results in the following:

$ PATH="$PATH" `which nixos-install` --no-root-passwd --root /mnt --max-jobs 40
++ which nixos-install
+ PATH=/root/.nix-profile/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ /root/.nix-profile/bin/nixos-install --no-root-passwd --root /mnt --max-jobs 40
building the configuration in /mnt/etc/nixos/configuration.nix...
/nix/store/c17y1bb883fyxxgjp1v0f3f9sijnh9a9-nixos-system-hetzner-23.05.2209.ac1acba43b2
installing the boot loader...
setting up /etc...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
updating GRUB 2 menu...
installing the GRUB 2 boot loader on /dev/sda...
Installing for i386-pc platform.
Installation finished. No error reported.
installing the GRUB 2 boot loader on /dev/sdb...
Installing for i386-pc platform.
Installation finished. No error reported.
/nix/var/nix/profiles/system/sw/bin/bash: line 10: rmdir: command not found

Edit 1: I tried running the nixos-install command with -vvvvvvvvvv, but that gives no more information on this error. If I add --keep-going instead, it still stops after the missing rmdir command, and e.g. the ssh key of root is not added correctly.

Edit 2: Tracing the error:

ISibboI commented 1 year ago

@xaverdh @rnhmjoj is there anything more I can provide to help resolve this? I would also be interested in fixing it myself via a pull request, but I would need some guidance, since this would be my first contribution to NixOS.

Mikilio commented 1 year ago

> @xaverdh @rnhmjoj is there anything more I can provide to help resolve this? I would also be interested in fixing it myself via a pull request, but I would need some guidance, since this would be my first contribution to NixOS.

I think a good point to start would be to follow the instructions originally directed at me.

Run the scripts manually (or mark them all with set -x) to see which one actually fails.
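
The tracing suggested above can be sketched on a throwaway script. This is only an illustration, not the real nixos-install: the file name step.sh and its two commands are made up, and on a real system the same -x flag would be applied to a local copy of the script under investigation.

```shell
# Hypothetical stand-in for one of the install scripts; not the real nixos-install.
tmp=$(mktemp -d)
cat > "$tmp/step.sh" <<'EOF'
mkdir "$1/mnt"
rmdir "$1/mnt"
EOF

# -x makes the shell print every command (to stderr) before running it,
# so the last traced line before an error message is the command that failed.
trace=$(sh -x "$tmp/step.sh" "$tmp" 2>&1)
printf '%s\n' "$trace"

rm -rf "$tmp"
```

Each traced command appears on its own line with the shell's trace prefix, which makes it easier to see which script, and which line in it, emits the "command not found" message.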

A good start already is to have a relatively simple setup to reproduce the error, like you do.

ISibboI commented 1 year ago

Thanks for the guidance! I have now changed line 209 in /nix/store/ka7qp6wsjzss72x0zrswh7rp42f4rv97-nixos-install/bin/nixos-install from

    umount -R "$mountPoint" && rmdir "$mountPoint"

to

    umount -R "$mountPoint" && echo "now doing rmdir in nixos-install" && rmdir "$mountPoint" && echo "rmdir in nixos-install was successful"

Now my log ends with:

now doing rmdir in nixos-install
/nix/var/nix/profiles/system/sw/bin/bash: line 10: rmdir: command not found

So clearly, the offending rmdir is the one in nixos-install at line 209.

How do I get coreutils in scope for that one?

xaverdh commented 1 year ago

You can add pkgs.coreutils here

Mikilio commented 1 year ago

> You can add pkgs.coreutils here

I feel like I remember wanting to do this before, and you insisted on using full paths like /nix/var/nix/profiles/system/sw/bin/rmdir. I don't know what the reason was back then, but maybe it applies here as well?

rnhmjoj commented 1 year ago

The script is indeed missing a coreutils dependency. I'm not sure why it would fail on that line, though. There are references to dirname, touch, mkdir, install, rm (all in coreutils) before that line... It probably has something to do with nixos-enter.

ISibboI commented 1 year ago

Adding pkgs.coreutils at the point proposed by xaverdh seems to do nothing. It still fails, complaining that rmdir does not exist. But I am very confused, because in my fork the line with the rmdir contains this: (rmdir "$mountPoint" 2>/dev/null || true). Does || not apply to a command not being found?
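
On the shell question itself: in a POSIX shell a command that cannot be found yields exit status 127, and || reacts to that like any other failure. The sketch below checks this with a made-up command name (assumed not to exist on any PATH); it says nothing about which rmdir call in the install scripts is the one that fails.

```shell
# A command name assumed not to exist anywhere on PATH (hypothetical).
missing=definitely-not-a-real-command-241356

# Without "|| true": record the exit status of the failed lookup.
status_plain=0
"$missing" 2>/dev/null || status_plain=$?

# With "|| true" inside a subshell, as in the line quoted above:
# the failure is swallowed and the subshell exits successfully.
status_guarded=0
( "$missing" 2>/dev/null || true ) || status_guarded=$?

echo "plain=$status_plain guarded=$status_guarded"
```

So a guarded line of that form should not abort the script on a missing command. If the error still stops the install, a plausible reading is that a different, unguarded call is the one being executed.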

xaverdh commented 1 year ago

> /nix/var/nix/profiles/system/sw/bin/bash: line 10: rmdir: command not found

Ah, that probably means that we are trying to find rmdir in that location inside the chroot mount namespace, not in PATH. So the modifications to PATH do not propagate there.

xaverdh commented 1 year ago

> The script is indeed missing a coreutils dependency. I'm not sure why it would fail on that line, though. There are references to dirname, touch, mkdir, install, rm (all in coreutils) before that line... It has probably something to do with nixos-enter.

The $system/sw/bin/bash -c part is executed in the chroot, while the other stuff is not. That is why the other commands don't fail, I think.

xaverdh commented 1 year ago

So I can think of two possible solutions:

rnhmjoj commented 1 year ago

But why would coreutils not be in the PATH of the chroot?

xaverdh commented 1 year ago

You are right, it does take the indirection through PATH, but that points to /run/current-system/sw/bin. I misinterpreted that part. Still, /run/current-system/sw/bin will be resolved in the changed root, so it will not find the coreutils of the ambient system.

xaverdh commented 1 year ago

(I thought the issue was that chroot clears PATH, but that is not happening, I just checked)

Mikilio commented 1 year ago

Maybe it is not that the path is cleared, but rather that the path is not correct anymore in the new root.

xaverdh commented 1 year ago

> Maybe it is not that the path is cleared, but rather that the path is not correct anymore in the new root.

Yes, that is what I meant
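
The point about PATH being resolved against the new root can be illustrated without a real chroot. In this simulation two temporary directories stand in for the ambient root and the target root; fake-rmdir and the sw/bin layout are made-up names for the example.

```shell
# Simulation only: two directories stand in for the ambient root and the
# new root; no real chroot is performed. All names here are hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/ambient/sw/bin" "$tmp/target/sw/bin"

# Only the "ambient" tree provides the tool.
printf '#!/bin/sh\necho ok\n' > "$tmp/ambient/sw/bin/fake-rmdir"
chmod +x "$tmp/ambient/sw/bin/fake-rmdir"

# Resolve the same PATH entry (sw/bin) relative to a given root.
lookup() {
  ( PATH="$1/sw/bin"; command -v fake-rmdir )
}

found_in_ambient=no
lookup "$tmp/ambient" >/dev/null && found_in_ambient=yes
found_in_target=no
lookup "$tmp/target" >/dev/null && found_in_target=yes

echo "ambient=$found_in_ambient target=$found_in_target"
rm -rf "$tmp"
```

The PATH entry is textually the same in both cases; whether the command is found depends only on what exists under that path in the root being used.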

rncwind commented 1 year ago

I'm also having this issue with the same install script for a Hetzner dedicated server. Was a workaround found for this? This method of install is broken, and so is the nixos-generators kexec-bundle, making it very difficult to actually install NixOS on Hetzner right now.

xaverdh commented 1 year ago

So coreutils-full is part of requiredPackages and should therefore be part of the new system. So I don't quite see why it would fail, since the path should be the same (/run/current-system/sw/bin/rmdir) as in the ambient system. Can you check if the install succeeds when you replace rmdir by /run/current-system/sw/bin/rmdir?

rncwind commented 1 year ago

Yeah, that seems to have worked. Very odd how it's not picking it up from the system we are trying to install from. Thanks for the help, I hope this narrows it down a bit!
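
As a rough sketch of that workaround, the rewrite can be expressed as a sed substitution. It is applied here to the line quoted earlier in the thread; which line actually needs the change in a given nixos-install version is an assumption, and the edit would have to be made on a local copy of the script.

```shell
# The original line as quoted earlier in the thread (nixos-install, line 209).
original='umount -R "$mountPoint" && rmdir "$mountPoint"'

# Rewrite the bare "rmdir" into the absolute path that was reported to work.
# "\&" in the sed replacement is a literal ampersand.
patched=$(printf '%s\n' "$original" |
  sed 's|&& rmdir |\&\& /run/current-system/sw/bin/rmdir |')

printf '%s\n' "$patched"
```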

xaverdh commented 1 year ago

A wild guess would be that some form of caching by the shell happens, which makes it look for the resolved path from the ambient system. Anyway, if that does happen to fix it, care to open a PR?

Animeshz commented 1 year ago

So is there a temporary workaround for this at the moment, if anybody was able to bypass the error?

PS: Never mind, I edited the script with sudo and hardcoded the path /nix/var/nix/profiles/system/sw/bin/rmdir there in nixos-install:209. I know it's not good to sudo-edit a cryptographically hashed nix store path, but anyway, it works until the install is finished.

poliorcetics commented 3 weeks ago

I'm having the same issue with mount instead of rmdir. There was previously an issue for it: https://github.com/NixOS/nixpkgs/issues/220211, and it seems I am encountering it again.

I tried writing a patch for it (adding an echo "POLIORCETICS: going to call 'mount'" inside the call to nixos-enter)

    overlays = [
      (final: super: {
        nixos-install-tools = super.nixos-install-tools.overrideAttrs (prev: {
          patches = (prev.patches or [ ]) ++ [ ./nixos-install.patch ];
        });
      })
    ];

but that didn't pick up the patch

The error I'm getting looks like this:


setting up /etc...
/nix/var/nix/profiles/system/sw/bin/bash: line 7: mount: command not found
/nix/var/nix/profiles/system/sw/bin/bash: line 8: mount: command not found

Mikilio commented 3 weeks ago

I feel like there is not much interest in fixing this installation method, since for most people it's easier to install from a bootable installer instead. I don't think this issue will gain much attention any time soon, and the best thing to do is probably to ensure that this issue is discoverable, that it is clear this installation method is not well supported, and that the workaround gets highlighted.

Instead of trying to fix this, I would still recommend using a NixOS installer image.

I would welcome any ideas on how to achieve the above.

n8henrie commented 2 weeks ago

Was also getting mount: command not found when trying to nixos-install from an x86 Arch linux onto an aarch64-linux drive.

Prefixing the mount commands (in the nixos-enter call) with /run/current-system/sw/bin/ seems to have worked.

xaverdh commented 2 weeks ago

can someone check if adding

hash -r

here before the set -e works?

(e.g. by locally copying and modifying the script, as was done before by ppl in this thread)
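
For context on what hash -r does: a shell may remember the full path where it last found a command, and hash -r discards those remembered locations so the next call searches PATH again. The sketch below uses a made-up demo-tool in temporary directories; whether this caching is the actual cause of the failure in nixos-enter is not confirmed in this thread.

```shell
# Hypothetical tool and directories, used only to show command hashing.
tmp=$(mktemp -d)
mkdir "$tmp/a" "$tmp/b"
printf '#!/bin/sh\necho from-a\n' > "$tmp/a/demo-tool"
chmod +x "$tmp/a/demo-tool"
PATH="$tmp/a:$tmp/b:$PATH"

# First run: the shell may remember the full path it found ($tmp/a/demo-tool).
demo-tool >/dev/null

# The tool "moves": it disappears from a/ and appears in b/.
rm "$tmp/a/demo-tool"
printf '#!/bin/sh\necho from-b\n' > "$tmp/b/demo-tool"
chmod +x "$tmp/b/demo-tool"

# Depending on the shell and its settings, this call may still try the
# remembered path and fail, or it may search PATH again.
stale=$(demo-tool 2>&1) || true

# hash -r forgets all remembered locations, forcing a fresh PATH search.
hash -r
fresh=$(demo-tool)

echo "before hash -r: $stale"
echo "after hash -r:  $fresh"
rm -rf "$tmp"
```

The result before hash -r differs between shells and shell options, which is why only the result after the reset is reliable here.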

xaverdh commented 2 weeks ago

If that does work, I can open a pull request

xaverdh commented 2 weeks ago

Draft PR is here: https://github.com/NixOS/nixpkgs/pull/355269. You can alternatively try to use the nixos-install from there.