NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Darwin builds forking off processes never finish #8232

Open flokli opened 1 year ago

flokli commented 1 year ago

Describe the bug

I have a Nix build that forks off a long-running process in the background; in this specific case, a PostgreSQL database.

The build process starts the database, does some insertions and queries in it, then renders a document into $out (end of build script).

On Linux, once all steps in the build script have finished, the build exits and the output can be observed.

On Darwin, the build process gets stuck indefinitely.

A workaround is to manually register cleanup traps in bash.
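
A minimal sketch of what such a trap could look like, written as a standalone shell script so it can be run outside Nix. The names `run_build`, `out`, and `pidfile` are made up for illustration; in a real derivation the body of `run_build` would be the `buildCommand`, and `$out` would be provided by Nix.

```shell
#!/bin/sh
# Sketch of the trap-based workaround, runnable outside Nix.
# `out` and `pidfile` are stand-ins; in a real build, $out is set by Nix.
out=$(mktemp)
pidfile=$(mktemp)

# The parentheses run the "build script" in a subshell, so its EXIT trap
# fires when the build steps finish, like at the end of a builder.
run_build() (
  # Stand-in for the long-running service (e.g. the database).
  sleep 1000 &
  bg_pid=$!
  echo "$bg_pid" > "$pidfile"

  # On exit (success or failure), stop the background process and reap it.
  trap 'kill "$bg_pid" 2>/dev/null; wait "$bg_pid" 2>/dev/null' EXIT

  echo foo > "$out"
)

run_build

# Report whether the background process survived the "build".
if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
  echo "background process still running"
else
  echo "background process cleaned up"
fi
```

The `wait` inside the trap is there so the process is fully gone before the script returns, not just signalled.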

Steps To Reproduce

Build the following on x86_64-linux vs aarch64-darwin:

let
  pkgs = import <nixpkgs> { };
in
pkgs.callPackage
  (
    with import <nixpkgs> { };
    { stdenv, ... }:

    stdenv.mkDerivation {
      pname = "foo";
      version = "1";
      buildCommand = ''
        sleep infinity &

        echo foo > $out
      '';
    }
  )
{ }

Expected behavior

I'd expect the build to end in both cases, and the sleep process to be killed at the end of the build script.

nix-env --version output

nix-env (Nix) 2.13.3

Priorities

Add :+1: to issues you find important.

fricklerhandwerk commented 1 year ago

Triaged in the Nix team meeting:

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-04-28-nix-team-meeting-minutes-50/27698/1

roberth commented 1 year ago

I've seen bulky output get truncated, which would be the opposite problem of this issue.

The process exit and pipe close are independent and important events. I think a good algorithm would be:

// pseudocode; haven't written much std::variant or concurrent code yet; apologies

struct LogCompleted { };

runBuilderProcess() {
  auto builderPid = forkBuilderChild();
  future<ExitCode> exitCode = forkThread(() -> waitpid(builderPid));
  future<LogCompleted> outputDone = forkThread(logShovelingThread);

  std::variant<ExitCode, LogCompleted> firstResult = awaitFirstOf(exitCode, outputDone);

  firstResult.visit {
    exitCode -> {
      try {
        outputDone.awaitWithTimeout(10s)
      }
      catch {
        throw/return/warn/whatever "Console wasn't closed within 10s after builder exited with status %i. Did some child process keep the output open?";
      }
    },
    logCompleted -> {
      try {
        exitCode.awaitWithTimeout(10s)
      }
      catch {
        warn/whatever "builder has not exited within 10s after closing its log output.";
        exitCode.await();
      }
    }
  };
  handle exitCode;
}
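
To illustrate the "did some child process keep the output open?" case that the pseudocode warns about, here is a small shell sketch of the general pipe behaviour. It is not Nix's code: `cat` merely stands in for whatever reads the builder's log, and the assumption that the log is read from a pipe-like file descriptor inherited by background children is mine, not something confirmed in this thread.

```shell
#!/bin/sh
# A reader on a pipe only sees end-of-file once *every* writer has closed it.
# The background sleep inherits the subshell's stdout, so `cat` keeps waiting
# after the foreground part of the "builder" has already finished.
start=$(date +%s)
( sleep 3 & echo "builder done" ) | cat
held=$(( $(date +%s) - start ))

# Same "builder", but the background process gets its own stdout/stderr,
# so the pipe is closed as soon as the subshell itself exits.
start=$(date +%s)
( sleep 3 >/dev/null 2>&1 & echo "builder done" ) | cat
released=$(( $(date +%s) - start ))

echo "inherited stdout: ${held}s, redirected stdout: ${released}s"
```

In the first pipeline the reader is held until the background `sleep` exits; in the second it returns right away. Whether redirecting the background process's output also avoids the hang in an actual Nix build on Darwin is not established here.
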

fricklerhandwerk commented 1 year ago

Triaged in Nix maintainers team meeting 2023-06-09 without conclusion.

Complete discussion

- @edolstra: we used to wait for pipe exit
- @roberth: it's the correct behavior but we should also set a timeout to terminate the parent after a couple of seconds
- @regnat: should have the same behavior on Linux and Darwin
  - but not being coherent is not the end of the world
- @regnat: wouldn't be surprised if existing bash with background processes would behave differently for different bash versions
- @fricklerhandwerk: @roberth's suggestion seems sensible. we may want to have a dedicated Darwin maintainer on or close to the team though, to be able to go into some depth

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-06-09-nix-team-meeting-minutes-61/29163/1