NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Bug: The build fails if a build machine/cache is offline #3514

Open NorfairKing opened 4 years ago

NorfairKing commented 4 years ago

Describe the bug

I set up my desktop computer as a build machine and binary cache for my laptop. When I turn off my desktop, every build on my laptop fails.

I can't tell if it is because the desktop is a build machine, or because it's a binary cache, but in both cases this should not be happening.

Steps To Reproduce

  1. Set up machine B as a build machine and binary cache for machine A
  2. Turn off machine B
  3. Run nix-build on machine A
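
A minimal NixOS sketch of step 1 as it might look on machine A, useful for reproducing the report. The hostname `machine-b.local`, the `builder` user, the key path, `maxJobs`, and the assumption that B serves its store with `services.nix-serve` on that module's default port 5000 are all illustrative and not taken from the report. Setting up B's signing key is left out.

```nix
{ ... }:
{
  # Machine A: send builds to machine B over SSH.
  # Hypothetical values: hostname, SSH user, key path, job count.
  nix.distributedBuilds = true;
  nix.buildMachines = [
    {
      hostName = "machine-b.local";
      system = "x86_64-linux";
      sshUser = "builder";
      sshKey = "/root/.ssh/id_builder";
      maxJobs = 4;
    }
  ];

  # Machine A: list machine B's cache (assumed to be nix-serve on its
  # default port 5000) ahead of cache.nixos.org. B's signing key would
  # also have to go into nix.settings.trusted-public-keys (omitted here).
  nix.settings.substituters = [
    "http://machine-b.local:5000"
    "https://cache.nixos.org"
  ];
}
```

With B switched off, running nix-build on A should then exercise both failure paths discussed in this thread: the unreachable remote builder and the unreachable substituter.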

Expected behavior

Machine A builds everything itself.

nix-env --version output

$ nix-env --version
nix-env (Nix) 2.3.6

NorfairKing commented 4 years ago

@edolstra How would I go about pushing this forward?

zimbatm commented 4 years ago

A good first step is to follow the issue template: https://github.com/NixOS/nix/issues/new?assignees=&labels=bug&template=bug_report.md&title=

NorfairKing commented 4 years ago

@zimbatm Is this better?

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

NorfairKing commented 3 years ago

Still relevant.

zimbatm commented 3 years ago

@rickynils might be interested in pursuing this, since he is working on nixbuild.net.

ZoomRmc commented 3 years ago

Still relevant. In NixOS, nix.binaryCaches is a list, so hard-failing when the first item is offline is a bug and completely counter-intuitive. Also, the error message should suggest using --option substituters (or whatever workaround is currently accepted).

ursi commented 3 years ago

@ZoomRmc, can you expand on how to use --option substituters? I don't see anything about it in nixos-rebuild --help.

Edit: I'll answer this myself, since I found it elsewhere: --option substitute false

nixos-discourse commented 3 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/ignore-offline-substituters/15450/4

afreakk commented 2 years ago

> I can't tell if it is because the desktop is a build machine, or because it's a binary cache, but in both cases this should not be happening.

For me, offline remote builders cause no issue, just a quick error:

cannot build on 'ssh://build-user@superfastmachine.local': error: cannot connect to 'build-user@superfastmachine.local': ssh: Could not resolve hostname superfastmachine.local: Name or service not known

Then it continues building on the local machine. When I previously used IP addresses instead of hostnames, though, it hung much longer before continuing the build locally.

But when I use a binary cache like this:

substituters = http://superfastmachine.local:5000/ https://cache.nixos.org/

in nix.conf, and my PC wants to download something from it, I get:

warning: error: unable to download 'http://superfastmachine.local:5000/wbjfdccsii8wcnawlgg1a72i2vazfg4b.narinfo': Couldn't resolve host name (6); retrying in 336 ms
disabling binary cache 'http://superfastmachine.local:5000' for 60 seconds
error: unable to download 'http://superfastmachine.local:5000/0wxn3wnk6qiv5kzl0w8abv9jzh8szgqz.narinfo': Couldn't resolve host name (6)
error: unexpected end-of-file

To finish the build, I have to pass --option substituters https://cache.nixos.org so that the unavailable binary cache is excluded. I thought fallback = true in nix.conf would help, but it did not.

stale[bot] commented 2 years ago

I marked this as stale due to inactivity.

NorfairKing commented 2 years ago

unstale bot

tbidne commented 1 year ago

Still important.

tbidne commented 1 year ago

Related: https://github.com/NixOS/nix/issues/3796, https://github.com/NixOS/nix/issues/6901

arcuru commented 1 year ago

I'm pretty sure fallback = true is supposed to fix this; I use this exact setup locally. I'm not sure why it didn't work for @afreakk; maybe it was a bug that has since been fixed? You'll also need to set connect-timeout = 5 (or some other low value); otherwise the build will hang for minutes. I talked about this in more detail here.

Also related is #7188, which should fix this without needing to set fallback = true.
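
On NixOS, the two settings arcuru recommends can be set declaratively through `nix.settings`, which NixOS writes out to nix.conf. A minimal sketch; the value 5 is only the example figure from the comment above, not a tested recommendation:

```nix
{ ... }:
{
  nix.settings = {
    # Fall back to building from source if substituting a path
    # from a binary cache fails.
    fallback = true;

    # Timeout in seconds for connecting to a binary cache, so an
    # unreachable cache fails fast instead of stalling the build.
    connect-timeout = 5;
  };
}
```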

johnhamelink commented 1 month ago

Setting fallback = true does indeed allow me to build; however, it also triggers a stream of "error: opening a connection to remote store 'ssh-ng://missing-server' previously failed" messages. It would be nice to have a way to mark the server as truly optional so that these messages are avoided.

I set fallback = true like this:


  # Set up the SSH keys for the machines we want to build against.
  programs.ssh = {
    extraConfig = ''
      Host missing-server

          # <snip>

          # Use an aggressive timeout because we're not always on
          # the LAN
          ConnectTimeout 3
    '';
  };

  nix = {
    settings = {

      # <snip>

      # Private binary cache
      substituters = [ "ssh-ng://missing-server" ];
    };

    extraOptions = ''
      # Ensure we can still build when missing-server is not accessible
      fallback = true
    '';
  };