epi2me-labs / wf-16s


(Singularity) Container missing dependencies #21

Closed: marchoeppner closed this issue 2 months ago

marchoeppner commented 2 months ago

Operating System

Other Linux (please specify below)

Other Linux

AlmaLinux 9.4

Workflow Version

v1.2

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-16s --fastq wf-16s-demo/test_data/ -profile singularity

Workflow Execution - CLI Execution Profile

singularity

What happened?

The fastcat process fails because of missing dependencies in the container ontresearch-wf-metagenomics-sha44a6dacff5f2001d917b774647bb4cbc1b53bc76:

Command error:
  .command.sh: line 6: fastcat: command not found
  .command.sh: line 7: bgzip: command not found

This is the second EPI2ME pipeline I have tried today, and the second one that fails because of a broken container. Maybe a basic sanity check of the pipeline should be added to the pre-release GitHub Actions. And from personal experience, it is generally advisable to avoid large, monolithic containers and instead use one container per process (as nf-core does). That makes everything significantly cleaner.
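A minimal sketch of the kind of pre-release sanity check suggested above, assuming bash is available; check_tools is a hypothetical helper, not part of wf-16s or its CI. Exit status 127, as in the log below, is what the shell returns when a command is not found, so checking each required tool with command -v catches this class of failure before a real run.

```shell
#!/usr/bin/env bash
# Hypothetical pre-release smoke test (not part of wf-16s or its CI):
# report every required executable that cannot be found on PATH.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        # command -v fails when the tool is absent, which is the same
        # situation that makes a process exit with status 127.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every tool was found.
    [ "$missing" -eq 0 ]
}

# Example, assuming bash is available inside the image:
# singularity exec docker://ontresearch/wf-metagenomics:sha44a6dacff5f2001d917b774647bb4cbc1b53bc76 \
#     bash -c "$(declare -f check_tools); check_tools fastcat bgzip"
```

Running the check inside the exact image tag the workflow pins means a missing or shadowed tool shows up as a one-line "missing:" message rather than a failed process deep into the run.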

Relevant log output

Pulling Singularity image docker://ontresearch/wf-metagenomics:sha44a6dacff5f2001d917b774647bb4cbc1b53bc76 [cache /work_syn/ngs/pipelines/foodme2/ont/epi2me/work/singularity/ontresearch-wf-metagenomics-sha44a6dacff5f2001d917b774647bb4cbc1b53bc76.img]
WARN: Singularity cache directory has not been defined -- Remote image will be stored in the path: /work_syn/ngs/pipelines/foodme2/ont/epi2me/work/singularity -- Use the environment variable NXF_SINGULARITY_CACHEDIR to specify a different location
ERROR ~ Error executing process > 'fastcat (2)'

Caused by:
  Process `fastcat (2)` terminated with an error exit status (127)

Command executed:

  mkdir fastcat_stats
  mkdir fastq_chunks

  # Save file as compressed fastq
  fastcat         -s barcode02         -f fastcat_stats/per-file-stats.tsv         -i fastcat_stats/per-file-runids.txt         --histograms histograms                  -a 800 -b 2000         input_src     | if [ "0" = "0" ]; then
      bgzip -@ 4 > fastq_chunks/seqs.fastq.gz
    else
      split -l null -d --additional-suffix=.fastq.gz --filter='bgzip -@ 4 > $FILE' - fastq_chunks/seqs_;
    fi

  mv histograms/* fastcat_stats

  # get n_seqs from per-file stats - need to sum them up
  awk 'NR==1{for (i=1; i<=NF; i++) {ix[$i] = i}} NR>1 {c+=$ix["n_seqs"]} END{print c}'         fastcat_stats/per-file-stats.tsv > fastcat_stats/n_seqs
  # get unique run IDs
  awk 'NR==1{for (i=1; i<=NF; i++) {ix[$i] = i}} NR>1 {print $ix["run_id"]}'         fastcat_stats/per-file-runids.txt | sort | uniq > fastcat_stats/run_ids

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 6: fastcat: command not found
  .command.sh: line 7: bgzip: command not found

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

no

Other demo data information

No response

SamStudio8 commented 2 months ago

Hi marchoeppner, thanks for taking the time to open this ticket and share your logs with us. This is a known issue with Singularity when users mount more than one location under /home into one of our workflows' Singularity containers. This usually happens when a user has installed the workflow to /home/user/.nextflow and also wants to pass data from /home/another/path/to/some/data as a parameter to the workflow. Nextflow then binds the common parent directory of those paths (here, /home) into the container, which shadows the container's own /home. Unfortunately, this is where all of our dependencies are stored (as our workflows maintain their own user, with their own environment).
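To make the shadowing concrete, here is a hypothetical sketch of the common-parent computation described above. It is not Nextflow's actual implementation; it assumes absolute, normalized paths without trailing slashes, and common_ancestor is an illustrative name only. For the two example paths it yields /home, and binding the host's /home over the container's /home hides /home/epi2melabs.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only: NOT Nextflow source code.
# Print the deepest directory that contains every given path.
# Assumes absolute, normalized paths without trailing slashes.
common_ancestor() {
    local prefix="$1" path
    shift
    for path in "$@"; do
        # Drop trailing components from prefix until path lies at or under it.
        # Matching on "$prefix/" avoids treating /home/abc as inside /home/ab.
        while [ "$prefix" != "/" ] && [ "$path" != "$prefix" ] \
              && [ "${path#"$prefix"/}" = "$path" ]; do
            prefix="${prefix%/*}"
            [ -z "$prefix" ] && prefix="/"
        done
    done
    printf '%s\n' "$prefix"
}

# Example: the workflow install and the input data live in different /home trees.
# common_ancestor /home/user/.nextflow/assets /home/another/path/to/some/data   # prints /home
```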

We are in the process of updating all our containers to move our dependencies out of /home/epi2melabs to somewhere less likely to be obscured by Nextflow or a user. In the meantime, you can work around this by installing the workflow somewhere other than /home/user/.nextflow, and by making sure you do not pass multiple parameters that require different directories under /home to be mounted.
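A minimal sketch of both workarounds, assuming bash. NXF_HOME is Nextflow's environment variable for its home directory (default $HOME/.nextflow, where pulled workflows are stored); the target directory in the example and the home_subtrees/check_home_mounts helpers are hypothetical. The guard only counts how many distinct /home/<name> trees the given paths touch and warns when there is more than one, which is the situation described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct top-level /home/<name> trees among the paths.
home_subtrees() {
    local p
    for p in "$@"; do
        case "$p" in
            /home/*) p="${p#/home/}"; printf '%s\n' "${p%%/*}" ;;
        esac
    done | sort -u | awk 'END { print NR }'
}

# Hypothetical guard: fail if the paths would force /home itself to be bound.
check_home_mounts() {
    local n
    n="$(home_subtrees "$@")"
    if [ "$n" -gt 1 ]; then
        echo "warning: paths span $n different /home trees; /home may be bound" \
             "and hide the container's /home/epi2melabs" >&2
        return 1
    fi
}

# Example usage (the directory below is a hypothetical location outside /home):
# export NXF_HOME=/work_syn/nextflow_home
# check_home_mounts "$NXF_HOME" "$(realpath wf-16s-demo/test_data)" &&
#     nextflow run epi2me-labs/wf-16s --fastq wf-16s-demo/test_data/ -profile singularity
```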

SamStudio8 commented 2 months ago

@marchoeppner Would you kindly share a full .nextflow.log of your workflow in this instance to confirm it matches our known issue? I can see that fastcat is present in our container.

$ singularity exec docker://ontresearch/wf-metagenomics:sha44a6dacff5f2001d917b774647bb4cbc1b53bc76 which fastcat
/home/epi2melabs/conda/bin/fastcat

marchoeppner commented 2 months ago

Right, so this is something else then. I am getting a bunch of errors like this:

2024/07/15 12:54:00  warn rootless{home/epi2melabs/conda/share/terminfo/z/ztx-1-a} ignoring (usually) harmless EPERM on setxattr "user.rootlesscontainers"

This is now with Apptainer 1.3.3; before that I was running Singularity 4.1. I haven't seen this before, but I also haven't tried pulling Docker containers in a while. In any case, /home/epi2melabs isn't there after building the container, which explains why the binaries can't be found, but not why this is happening ;)

SamStudio8 commented 2 months ago

@marchoeppner Those warnings are a bit annoying but harmless: they just mean it can't setxattr on files inside the container. I'll open a ticket to see whether there is anything we need to do to clean them up. How Singularity/Apptainer treat $HOME is frustratingly complex and has bitten Nextflow users before. Some environments require --no-home to explicitly avoid binding the host home directory. Are you able to find fastcat if you try:

singularity exec --no-home docker://ontresearch/wf-metagenomics:sha44a6dacff5f2001d917b774647bb4cbc1b53bc76 which fastcat

marchoeppner commented 2 months ago

So... yes, working now.

Basically, while --no-home was the right idea, we also had /home bound permanently through our local Apptainer config, which apparently overrides --no-home.

Removing that from the config finally enabled /home/epi2melabs to appear.

I suppose, as you mentioned, moving epi2melabs somewhere else (maybe /opt) would avoid such problems.

Thanks for the help!

SamStudio8 commented 2 months ago

Great news. I had not realised that it was possible for a local configuration to override --no-home like that, can you provide some more information on what that configuration was?

I hope that is the last of your issues running our workflows but please don't hesitate to reach out to us again.

marchoeppner commented 2 months ago

Oh, merely a bind path = /home entry, along with some other static bind paths. It wasn't strictly necessary, so removing it didn't hurt.
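To check whether a site-wide configuration binds /home in the same way, one could scan the active bind path entries. This is a sketch assuming the Apptainer config lives at /etc/apptainer/apptainer.conf (the location varies by installation, and SingularityCE uses singularity.conf instead); find_home_binds is a hypothetical helper that prints any uncommented entry whose source is /home.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "bind path" entries whose source is /home.
# The default config location is an assumption; pass the real path if it differs.
find_home_binds() {
    local conf="${1:-/etc/apptainer/apptainer.conf}"
    # Keep only active entries (commented lines start with '#'), strip the
    # "bind path =" key and trailing whitespace, then compare the source part
    # (entries may optionally use src:dest form) against /home.
    grep -E '^[[:space:]]*bind path[[:space:]]*=' "$conf" |
        sed -E -e 's/^[[:space:]]*bind path[[:space:]]*=[[:space:]]*//' \
               -e 's/[[:space:]]+$//' |
        awk -F: '$1 == "/home" || $1 == "/home/"'
}

# Example:
# find_home_binds /etc/apptainer/apptainer.conf
```

Any output means host /home is bound for every container regardless of per-run flags, which matches the behaviour reported in this thread.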