NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Remote builds book-keeping appears to be sequential nix-wide. #3651

Open nagisa opened 4 years ago

nagisa commented 4 years ago

Describe the bug

I am using a remote builder set-up as such:

$ cat /etc/nix/machines
some-builder-hostname some-system /some-ssh-key 32 1 big-parallel
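For readers unfamiliar with the machines file, the sketch below shows one way to read the whitespace-separated fields on that line. It assumes the field order documented in the Nix manual (store URI or hostname, platform list, SSH identity file, maximum parallel jobs, speed factor, supported features). The `Machine` type and `parse_machine_line` helper are hypothetical names used only for illustration and are not part of Nix.

```python
from typing import List, NamedTuple


class Machine(NamedTuple):
    # Hypothetical container for the fields of one machines-file line.
    uri: str
    systems: List[str]
    ssh_key: str
    max_jobs: int
    speed_factor: int
    features: List[str]


def parse_machine_line(line: str) -> Machine:
    # Fields are whitespace-separated; list-valued fields use commas.
    fields = line.split()
    uri, systems, ssh_key, max_jobs, speed = fields[:5]
    # Supported features are optional (assumption: absent means none).
    features = fields[5].split(",") if len(fields) > 5 else []
    return Machine(uri, systems.split(","), ssh_key,
                   int(max_jobs), int(speed), features)


machine = parse_machine_line(
    "some-builder-hostname some-system /some-ssh-key 32 1 big-parallel"
)
print(machine.max_jobs, machine.features)
```

Read this way, the line above allows up to 32 parallel jobs on the builder, which is why the sequential behaviour described below is surprising.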

Nix is able to use the builder, as the nix-build log shows:

building '/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-some.drv' on 'ssh://some-builder-hostname'...
building '/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-some.drv' on 'ssh://some-builder-hostname'...
building '/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-some.drv' on 'ssh://some-builder-hostname'...
building '/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-some.drv' on 'ssh://some-builder-hostname'...
building '/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-some.drv' on 'ssh://some-builder-hostname'...
...

However, in my case creating an ssh session to some-builder-hostname is a slow operation (takes ~1 second), and the derivations I’m building are all very small and quick to build. The graph of derivations being built has a large amount of parallelism (as evidenced by nix easily building 30+ derivations in parallel for most of the build when building locally on the builder).

With remote builders, what I’m seeing is that nix will submit jobs to some-builder-hostname sequentially (each "building ... on ssh://..." message takes about a second to show up) and block any other operations while doing so. Some examples of such blocked operations are outputting build logs from already running derivations, reaping nix-remote-X processes or submitting other, independent, derivations to a remote builder.

Once in a blue moon nix will decide that it has submitted everything it can, at which point logs from said builds get dumped to stdout and the nix-remote-X processes are reaped, all at once. This happens long after the remote builder has finished building some of the derivations.

In practice this means that, for the many small derivations I have, having a remote builder set-up is actively harmful to build times.
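A rough back-of-the-envelope model makes the cost concrete. This is a minimal sketch under stated assumptions, not a measurement: the ~1 second connection time and the 32 job slots come from the description above, while the derivation count (100) and per-derivation build time (0.2 s) are made-up illustrative numbers. It also assumes that, in the sequential case, builds overlap with later connection setups.

```python
import math


def sequential_submission_time(n, connect, build):
    # Connections are opened one after another, so the last build
    # starts after n connection setups and then runs for `build`.
    return n * connect + build


def parallel_submission_time(n, connect, build, slots):
    # Up to `slots` jobs connect and build at once, in waves.
    waves = math.ceil(n / slots)
    return waves * (connect + build)


# 100 derivations and 0.2 s builds are hypothetical values.
n, connect, build, slots = 100, 1.0, 0.2, 32
seq = sequential_submission_time(n, connect, build)
par = parallel_submission_time(n, connect, build, slots)
print(f"sequential: {seq:.1f} s, parallel: {par:.1f} s")
```

Under these assumptions the sequential case is dominated almost entirely by connection setup rather than by building.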

Steps To Reproduce

  1. Try to build a highly parallelizable derivation graph on a remote builder to which establishing a connection takes a long time.

Expected behavior

I think it would be ideal if:

  1. Nix submitted jobs to remote builders in parallel;
  2. Submitting jobs didn’t block other operations that nix may want to do (such as showing live output of the build).
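The two points above can be illustrated with a small simulation. This is only a sketch of the desired behaviour, not how Nix is implemented (Nix is written in C++ and has its own scheduler): `submit_job` is a hypothetical stand-in for opening an SSH session and running a build, and the connection delay is shortened from ~1 s so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

CONNECT_DELAY = 0.05  # stand-in for the ~1 s SSH setup, shortened


def submit_job(drv):
    # Hypothetical job: simulate connection setup, then "build".
    time.sleep(CONNECT_DELAY)
    return f"built {drv}"


def build_all(drvs, max_jobs):
    results = []
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # All submissions start without waiting on each other.
        futures = [pool.submit(submit_job, d) for d in drvs]
        # Each result is handled as soon as it is ready, so output
        # from finished jobs is not held back by pending submissions.
        for fut in as_completed(futures):
            results.append(fut.result())
    return results


start = time.monotonic()
results = build_all([f"drv-{i}" for i in range(32)], max_jobs=32)
elapsed = time.monotonic() - start
print(f"{len(results)} jobs in {elapsed:.2f} s")
```

With 32 workers the total time stays close to a single connection delay instead of 32 of them, which is the behaviour asked for here.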

nix-env --version output

nix-env (Nix) 2.3.5

Additional context

N/A, but feel free to ask questions or get me to verify something.

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.