NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

remote build of 'silent' packages fails #4427

Open ikervagyok opened 3 years ago

ikervagyok commented 3 years ago

Describe the bug

Building some packages on remote hosts fails. The affected packages produce no terminal output for long periods of time, so the SSH connection gets closed for inactivity.

To Reproduce

Steps to reproduce the behavior:

  1. Set up remote building: https://nixos.wiki/wiki/Distributed_build
  2. Build qtwebengine with -j0 to force a remote build.
  3. If the build on your server doesn't produce output (such as warnings) often enough, you'll get this error on the server:

    Jan 05 13:33:52 SERVER systemd-logind[785]: Session 17 logged out. Waiting for processes to exit.
    Jan 05 13:33:52 SERVER systemd-logind[785]: Removed session 17.
    Jan 05 13:33:55 SERVER nix-daemon[2645]: unexpected Nix daemon error: writing to file: Broken pipe

    On the client, the build fails after the client's own timeout period:

    ...
    ../../3rdparty/chromium/services/network/trust_tokens/trust_token_request_redemption_helper.cc:59:31: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
       59 |   DCHECK(request->initiator() &&
          |          ~~~~~~~~~~~~~~~~~~~~~^~
       60 |              request->initiator()->scheme() == url::kHttpsScheme ||
          |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ../../3rdparty/chromium/base/logging.h:808:54: note: in definition of macro 'DCHECK'
      808 | #define DCHECK(condition) EAT_STREAM_PARAMETERS << !(condition)
          |                                                      ^~~~~~~~~
    
    client_loop: send disconnect: Broken pipe
    error: unexpected end-of-file
    builder for '/nix/store/9qskm7w05npz9vsh4r65dsjk11yvwi8m-qtwebengine-5.15.2.drv' failed with exit code 1
    cannot build derivation '/nix/store/xkr3cs62lf4lbi9bdswl7nsvbjcfwcv6-zoom-us-5.4.53350.1027.drv': 1 dependencies couldn't be built
    ...

Expected behavior

Remote building should work without manual workarounds in SSH configs. nixos-rebuild -j 0 should always work as long as there is a network connection. Maybe Nix could send some sort of heartbeat packets over the same connection?

# nix-env --version
nix-env (Nix) 2.3.10
# nixos-version
21.03.git.014440d7105 (Okapi)

edolstra commented 3 years ago

SSH can already do this, see the ServerAliveInterval and TCPKeepAlive options in ssh_config.
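
For readers looking for the workaround referred to here, a minimal sketch of such an ssh_config entry follows. The host name `builder.example.org` and the chosen values are placeholders, not taken from this thread. On a multi-user installation the SSH connection is typically opened by the Nix daemon running as root, so the file that applies may be root's `~/.ssh/config` or the system-wide `/etc/ssh/ssh_config` rather than your own user's file.

```
# Assumed example: "builder.example.org" stands in for your remote builder.
Host builder.example.org
    # Send an encrypted keep-alive probe after 25 seconds without traffic.
    ServerAliveInterval 25
    # Give up after this many unanswered probes (3 is the OpenSSH default).
    ServerAliveCountMax 3
    # TCP-level keep-alives; "yes" is already the OpenSSH default.
    TCPKeepAlive yes
```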

ikervagyok commented 3 years ago

I know remote builds are kinda high-level, but it's still bad UX. I love the deterministic approach the Nix ecosystem takes, and this doesn't feel right, since the only exhausted resource is an SSH/TCP heartbeat.

If you think everybody should solve this on their own, feel free to close this ticket.

p.s.: since it's my first interaction with @edolstra: Thanks for (starting) nix and the ecosystem around it!

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

magnetophon commented 1 year ago

Still relevant to me.

roberth commented 3 months ago

I think we could just pass something like -o ServerAliveInterval=25 to the client process in ssh.c. That will override user configuration, but it's a fairly low value, so I don't think that will be a problem. I don't think it needs to be higher, because I agree with @ikervagyok that this is cheap, especially compared to building, or even to the I/O and IPC we normally have for actual log lines, which tend to be far more frequent than that.

cc @rickynils?
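
The proposal above could look roughly like the following sketch. It is not Nix's actual code: `buildSshArgs` is a hypothetical helper, and the host name and remote command in the usage note are only examples. It shows the idea of adding the keep-alive option to the argument list before the ssh client is started. Options given with `-o` on the command line take precedence over ssh_config files, which is why this would override user configuration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Nix): assembles the argument list for an
// ssh client invocation and adds a keep-alive option to it.
std::vector<std::string> buildSshArgs(
    const std::string & host,
    const std::vector<std::string> & remoteCommand,
    unsigned int aliveIntervalSeconds = 25)
{
    std::vector<std::string> args = {"ssh"};

    // Ask the client to probe the server after this many idle seconds.
    // A command-line -o option wins over values from ssh_config files.
    args.push_back("-o");
    args.push_back("ServerAliveInterval=" + std::to_string(aliveIntervalSeconds));

    // Destination host first, then the command to run on it.
    args.push_back(host);
    args.insert(args.end(), remoteCommand.begin(), remoteCommand.end());
    return args;
}
```

Calling `buildSshArgs("builder.example.org", {"nix-store", "--serve"})` would yield the arguments `ssh -o ServerAliveInterval=25 builder.example.org nix-store --serve`. Making the interval a parameter rather than a constant is a design choice of this sketch, not something proposed in the thread.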

nixos-discourse commented 3 months ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-08-28-nix-team-meeting-minutes-173/51302/1