NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
18.27k stars 14.25k forks source link

Nextflow: When using docker and enabling tracing, processes fail within the docker containers due to paths to /nix/store/bash #350183

Open stianlagstad opened 1 month ago

stianlagstad commented 1 month ago

Describe the bug

I'm using nextflow, as packaged in https://github.com/NixOS/nixpkgs/pull/339197/, from the current unstable channel. This new packaging of nextflow has a patch to make sure the correct paths to bash is used depending on whether docker is enabled for the nextflow workflow run or not: https://github.com/NixOS/nixpkgs/blob/ccc0c2126893dd20963580b6478d1a10a4512185/pkgs/development/interpreters/nextflow/default.nix#L36-L45. I suspect that this patch does not do the correct thing if tracing is also enabled for the nextflow workflow run. When running with both docker and tracing enabled, I get this error:

 N E X T F L O W   ~  version 24.08.0-edge

Launching `main.nf` [awesome_cori] DSL2 - revision: 5ad587431e

executor >  local (1)
[56/9687f5] sayHello (1) [100%] 1 of 1, failed: 1 ✘
ERROR ~ Error executing process > 'sayHello (1)'

Caused by:
  Process `sayHello (1)` terminated with an error exit status (127)
...

Command error:
  .command.run: line 154: /nix/store/717iy55ncqs0wmhdkwc5fg2vci5wbmq8-bash-5.2p32/bin/bash: No such file or directory

Work dir:
  /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428
...

The problem is in the .command.run file, where not all occurrences of paths to bash is handled correctly by the postPath. When I enable tracing in addition to docker, then the command in .command.run changes from this:

    docker run -i --cpu-shares 1024 -e "NXF_TASK_WORKDIR" -e "NXF_DEBUG=${NXF_DEBUG:=0}" -v /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428:/home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428 -w "$NXF_TASK_WORKDIR" --name $NXF_BOXID quay.io/nextflow/bash /bin/bash /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428/.command.run

to this:

    docker run -i --cpu-shares 1024 -e "NXF_TASK_WORKDIR" -e "NXF_DEBUG=${NXF_DEBUG:=0}" -v /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428:/home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428 -w "$NXF_TASK_WORKDIR" --name $NXF_BOXID quay.io/nextflow/bash /bin/bash /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428/.command.run nxf_trace

The addition of nxf_trace at the end there takes the bash script into the function nxf_trace_linux (or nxf_trace_mac is on mac), where there's a hardcoded path to the nix store-provided bash:

nxf_trace_linux() {
... 
    /nix/store/717iy55ncqs0wmhdkwc5fg2vci5wbmq8-bash-5.2p32/bin/bash -ue /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428/.command.sh &
...

This nix store-provided bash path should not be there as I enabled docker, but it remains, and causes the process to fail.

An additional occurrence of the nix store-provided bash path is present in the shebang of the .command.sh file, although it makes no difference to the result of the workflow run.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Create a shell with nextflow as packaged in https://github.com/NixOS/nixpkgs/pull/339197
  2. Download the example workflow https://github.com/nextflow-io/hello, or use any other workflow and enable both docker and tracing.
  3. Run the workflow and observe that you get an error similar to the above.

Expected behavior

I expected the workflow run to succeed, and that the files .command.run and .command.sh should have no occurrences of paths to the nix store.

Screenshots

NA

Additional context

NA

Notify maintainers

@edmundmiller @Etjean @rollf

Metadata

[stian@stian-dell-nixos:~/devp/hello]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.6.56, NixOS, 24.05 (Uakari), 24.05.20241016.e001b23`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.8`
 - nixpkgs: `/nix/store/apchqa88gf3yp1fghqn9aqkg0c2fm962-source`
stianlagstad commented 1 month ago

Here's a change which will add two more test cases, leading to failures:

nixos/tests/nextflow.nix --- Nix
23     run-nextflow-pipeline = pkgs.writeShellApplication {                    23     run-nextflow-pipeline = pkgs.writeShellApplication {
24       name = "run-nextflow-pipeline";                                       24       name = "run-nextflow-pipeline";
25       runtimeInputs = [ pkgs.nextflow ];                                    25       runtimeInputs = [ pkgs.nextflow ];
26       text = ''                                                             26       text = ''
27         export NXF_OFFLINE=true                                             27         export NXF_OFFLINE=true
28         for b in false true; do                                             28         for b in false true; do
29           echo "docker.enabled = $b" > nextflow.config                      29           for c in false true; do
..                                                                             30             echo "docker.enabled = $b" > nextflow.config
30           cat nextflow.config                                               31             echo "trace.enabled = $c" > nextflow.config
..                                                                             32             cat nextflow.config
31           nextflow run -ansi-log false ${hello}                             33             nextflow run -ansi-log false ${hello}
..                                                                             34           done
32         done                                                                35         done
33       '';                                                                   36       '';
34     };                                                                      37     };
35   in                                                                        38   in
36   {                                                                         39   {
rollf commented 1 month ago

@stianlagstad Thank you for the detailed report. I vaguely remember that I had looked into the traced-scenario and that something was problematic but I must have lost attention. Sorry about that. I'll have a look into this.

stianlagstad commented 1 month ago

@stianlagstad Thank you for the detailed report. I vaguely remember that I had looked into the traced-scenario and that something was problematic but I must have lost attention. Sorry about that. I'll have a look into this.

Please don't apologize! Thank you again for the good work in https://github.com/NixOS/nixpkgs/pull/339197 :100:

I'm trying myself to iterate on a modified postPatch, but so far I have not been successful. I'm considering writing a PR for the nextflow repository instead, so that we can avoid the hardcoded /bin/bash strings there.

Update: I've created an issue for this on the nextflow side: https://github.com/nextflow-io/nextflow/issues/5420

rollf commented 1 month ago

Ah, yes, me, too :) I have something working already see here but I'm not totally satisfied. I will iterate and then open a PR upstream.

rollf commented 1 month ago

Okay, I updated my nextflow-patch and also created a nixpkgs branch that uses this new version and doesn't need to patch anything anymore. You should be able to run it like so

❯ nix run github:rollf/nixpkgs/nextflow-trace#nextflow -- -version

      N E X T F L O W
      version 24.09.2-edge build 5927
      created 14-10-2024 21:22 UTC (23:22 CEST)
      cite doi:10.1038/nbt.3820
      http://nextflow.io

(Mind that it's a newer version of nextflow).

Would be great if you could try this out.

I have adapted the NixOS-tests based on your suggestions (see here) and they pass.

If we're both satisfied, I'll open a PR upstream as well as for nixpkgs. Let's see whether we can get that integrated into upstream.

stianlagstad commented 1 month ago

Thank you @rollf !

Do I understand this correctly?

If I understand that correctly then there are some objections to this (using /usr/bin/env) in the comments here:

I also pointed to those in my issue in the nextflow repo: https://github.com/nextflow-io/nextflow/issues/5420.

I can confirm however that your fix works for my use case.

rollf commented 4 weeks ago

Do I understand this correctly?

Yes.

If I understand that correctly then there are some objections to this (using /usr/bin/env) in the comments here:

...

I also pointed to those in my issue in the nextflow repo: nextflow-io/nextflow#5420.

Yes, I had read those except the issue you raised, so thanks for pointing them out again. I still think what I propose is reasonable because a) it does keep /bin/bash wherever possible (and certainly within container instances) and b) the patch set is really small.

I can confirm however that your fix works for my use case.

Okay, I'll open a PR upstream. If it does not get accepted, then we'll need to keep the patch within nixpkgs :shrug: .

rollf commented 4 weeks ago

Here is the upstream PR. We'll have to wait for CI, too.

rollf commented 3 weeks ago

@stianlagstad I have spent much more time now to improve both on the nextflow as well as on the Nixpkgs side and updated both branches / the upstream PR.