epi2me-labs / wf-somatic-variation

Packages not found on Debian HPC #13

Closed umranyaman closed 2 months ago

umranyaman commented 7 months ago

Hi,

I am running the test on a Debian HPC cluster with the singularity configuration. The workflow stops at mod:getVersions with "python: command not found", although I do have Python installed. I am not sure what the issue might be.

nextflow run epi2me-labs/wf-somatic-variation --snv --sv --mod --sample_name 'MYSAMPLE' --ref 'wf-somatic-variation-demo/GCA_000001405.15_GRCh38_no_alt_analysis_set_chr20.fna' --bed 'wf-somatic-variation-demo/demo.bed'  --bam_normal 'wf-somatic-variation-demo/demo_normal.bam' --bam_tumor 'wf-somatic-variation-demo/demo_tumor.bam'  --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_sup@v3.5.2' --normal_min_coverage 0 --tumor_min_coverage 0 --out_dir . -profile 'singularity'

The error I get:

ERROR ~ Error executing process > 'mod:getVersions'

Caused by:
  Process `mod:getVersions` terminated with an error exit status (127)

Command executed:

  python --version | tr -s ' ' ',' | tr '[:upper:]' '[:lower:]' > versions.txt
  modkit --version | tr -s ' ' ',' >> versions.txt
  bgzip --version | awk 'NR==1 {print $1","$3}' >> versions.txt

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: python: command not found

Work dir:
  /home/DervisSalih/wf-somatic-variation/work/06/4cfe85e0d8b9f5f61a3599b1d43438

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Full log is attached.

Thanks very much

Best Umran

nextflow_log.pdf
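For context, exit status 127 is what a shell reports when it cannot find the command it was asked to run, which is why the status and the `python: command not found` line above go together. A minimal sketch that reproduces the status outside the workflow (the command name `definitely_not_a_command_xyz` is a made-up placeholder, not part of the workflow):

```shell
#!/usr/bin/env bash
# Run a command that does not exist in a child shell and capture its exit status.
# `definitely_not_a_command_xyz` is a hypothetical placeholder name.
status=0
bash -c 'definitely_not_a_command_xyz' 2>/dev/null || status=$?
echo "exit status: ${status}"
```

So the failure says that `python` was not on the PATH inside the environment the task ran in, regardless of what is installed on the host.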

SamStudio8 commented 7 months ago

@umranyaman Can you try creating a file named "sing_nohome.config" with the following contents:

singularity.runOptions = "--no-home"

You can provide this file to nextflow run with -c, e.g.: nextflow run -c sing_nohome.config ....
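The file can also be created from the shell rather than an editor. A small sketch, assuming the file should be written to the directory you launch from:

```shell
#!/usr/bin/env bash
# Write the one-line config suggested above into the current directory.
cat > sing_nohome.config <<'EOF'
singularity.runOptions = "--no-home"
EOF

# Show what was written so it can be checked before launching the workflow.
cat sing_nohome.config
```

The quoted `'EOF'` keeps the shell from expanding anything inside the here-document, so the line reaches the file exactly as typed.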

umranyaman commented 7 months ago

@SamStudio8 I created the config file with nano and ran:

nextflow run epi2me-labs/wf-somatic-variation --snv --sv --mod --sample_name 'MYSAMPLE' --ref 'wf-somatic-variation-demo/GCA_000001405.15_GRCh38_no_alt_analysis_set_chr20.fna' --bed 'wf-somatic-variation-demo/demo.bed' --bam_normal 'wf-somatic-variation-demo/demo_normal.bam' --bam_tumor 'wf-somatic-variation-demo/demo_tumor.bam' --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_sup@v3.5.2' --normal_min_coverage 0 --tumor_min_coverage 0 --out_dir . -profile 'singularity' -c 'sing_nohome.config'

I still get the same error,

ERROR ~ Error executing process > 'mod:getVersions'

Caused by:
  Process `mod:getVersions` terminated with an error exit status (127)

Command executed:

  python --version | tr -s ' ' ',' | tr '[:upper:]' '[:lower:]' > versions.txt
  modkit --version | tr -s ' ' ',' >> versions.txt
  bgzip --version | awk 'NR==1 {print $1","$3}' >> versions.txt

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: python: command not found

Work dir:
  /home/DervisSalih/wf-somatic-variation/work/cb/7a2a4b362b1ca19b14dae0392b0a29

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Full log is attached.

nextflow_log.pdf

@RenzoTale88 Below are the outputs,

(twopass) DSUmranYaman@drihpc4:~/DervisSalih/wf-somatic-variation$ nextflow info wf-somatic-variation
 project name: epi2me-labs/wf-somatic-variation
 repository  : https://github.com/epi2me-labs/wf-somatic-variation
 local path  : /home/DSUmranYaman/.nextflow/assets/epi2me-labs/wf-somatic-variation
 main script : main.nf
 description : Somatic structural variants, methylation, short variants and short tandem repeat workflow
 author      : Oxford Nanopore Technologies
 revisions   : 
 * master (default)
   prerelease
   v0.1.0 [t]
   v0.1.1 [t]
   v0.2.0 [t]
   v0.3.0 [t]
   v0.4.0 [t]
   v0.5.0 [t]
   v0.5.1 [t]
(twopass) DSUmranYaman@drihpc4:~/DervisSalih/wf-somatic-variation$ singularity exec docker://ontresearch/modkit:shaeedb131a939d3eea2f9bd4dbecec805c0fa20bdb modkit --version
INFO:    Using cached SIF image
mod_kit 0.1.13
(twopass) DSUmranYaman@drihpc4:~/DervisSalih/wf-somatic-variation$ 

Thanks so much Umran

RenzoTale88 commented 7 months ago

@umranyaman can you try pointing SINGULARITY_TMPDIR and NXF_SINGULARITY_CACHEDIR to a different folder? To do this:

  1. Create a new directory in a location where you have read/write permission and enough storage to save the images.
  2. Export the two variables so that they point to that directory.

Assuming you have enough space in the directory where you are running the analyses, and you are in the right folder, you can do:

mkdir ${PWD}/.singularity
export SINGULARITY_TMPDIR=${PWD}/.singularity
export NXF_SINGULARITY_CACHEDIR=${PWD}/.singularity

Then try running the workflow again.
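A slightly more defensive version of the same steps, as a sketch: it reuses the `.singularity` directory name from above and adds a writability check and a free-space report, which are additions for illustration only.

```shell
#!/usr/bin/env bash
# Directory that will hold Singularity images and temporary files.
cache_dir="${PWD}/.singularity"

# -p makes this safe to re-run if the directory already exists.
mkdir -p "${cache_dir}"

# Stop early if the directory is not writable.
if [ ! -w "${cache_dir}" ]; then
    echo "Cannot write to ${cache_dir}" >&2
    exit 1
fi

export SINGULARITY_TMPDIR="${cache_dir}"
export NXF_SINGULARITY_CACHEDIR="${cache_dir}"

# Report free space on the filesystem that holds the directory.
df -h "${cache_dir}"
```

The exports only apply to the current shell session, so the workflow has to be launched from the same session.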

umranyaman commented 7 months ago

@RenzoTale88 I have done this, but unfortunately I encountered the same issue:

ERROR ~ Error executing process > 'mod:getVersions'

Caused by:
  Process `mod:getVersions` terminated with an error exit status (127)

Command executed:

  python --version | tr -s ' ' ',' | tr '[:upper:]' '[:lower:]' > versions.txt
  modkit --version | tr -s ' ' ',' >> versions.txt
  bgzip --version | awk 'NR==1 {print $1","$3}' >> versions.txt

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: python: command not found

Work dir:
  /home/DervisSalih/wf-somatic-variation/work/cb/7a2a4b362b1ca19b14dae0392b0a29

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Out of curiosity, I then changed autoMounts to false. I would like your opinion on this: mod:getVersions now seems to run, but I get an R error instead:

    // using singularity instead of docker
    singularity {
        singularity {
            enabled = true
            autoMounts = false
        }
    }

Then I re-ran the pipeline:


ERROR ~ Error executing process > 'mod:rVersions'

Caused by:
  Process `mod:rVersions` terminated with an error exit status (1)

Command executed:

  R --version | awk 'NR==1 {print "R,"$3}' >> versions.txt
  R -e "packageVersion('DSS')" | awk '$1=="[1]" {print "DSS,"$2}' >> versions.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: passwd file doesn't exist in container, not updating
  WARNING: group file doesn't exist in container, not updating
  .command.sh: line 2: R: command not found
  .command.sh: line 2: versions.txt: No such file or directory

Work dir:
  /home/DervisSalih/wf-somatic-variation/work/1d/afed93955619cfc9f4eb1c4fdb9336

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

[nextflow_log.txt](https://github.com/epi2me-labs/wf-somatic-variation/files/13415654/nextflow_log.txt)

Not sure if it helps, but R is installed via conda in an environment, and Python is installed via miniconda. Singularity was installed by root (/usr/bin/singularity), and it is singularity-ce version 4.0.0-jammy.

Thanks so much, Umran

RenzoTale88 commented 7 months ago

Hi @umranyaman, I wouldn't set autoMounts = false, as automatic mounting should help take advantage of the user bind control feature of Singularity. R should come with the ontresearch/dss container, so whether or not it is installed on your system should make no difference to the workflow.

Do you have all the following variables set?

echo "SINGULARITY_CACHEDIR" $SINGULARITY_CACHEDIR
echo "SINGULARITY_TMPDIR" $SINGULARITY_TMPDIR
echo "SINGULARITY_LOCALCACHEDIR" $SINGULARITY_LOCALCACHEDIR
echo "SINGULARITY_PULLFOLDER" $SINGULARITY_PULLFOLDER
echo "SINGULARITY_BINDPATH" $SINGULARITY_BINDPATH

umranyaman commented 7 months ago

Hi @RenzoTale88, I have set autoMounts back to true.

Apart from SINGULARITY_TMPDIR, which I had set previously as described above, none of these variables were set, so I ran:

export SINGULARITY_LOCALCACHEDIR=${PWD}/.singularity
export SINGULARITY_CACHEDIR=${PWD}/.singularity
export SINGULARITY_PULLFOLDER=${PWD}/.singularity
export SINGULARITY_BINDPATH=${PWD}/.singularity

Now I get:

SINGULARITY_CACHEDIR /home/DSUmranYaman/DervisSalih/wf-somatic-variation/.singularity
SINGULARITY_TMPDIR /home/DSUmranYaman/DervisSalih/wf-somatic-variation/.singularity
SINGULARITY_LOCALCACHEDIR /home/DSUmranYaman/DervisSalih/wf-somatic-variation/.singularity
SINGULARITY_PULLFOLDER /home/DSUmranYaman/DervisSalih/wf-somatic-variation/.singularity
SINGULARITY_BINDPATH /home/DSUmranYaman/DervisSalih/wf-somatic-variation/.singularity
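The same check can be written as a loop that also flags variables that are not exported. This is only a sketch: the function name `report_singularity_env` is made up for illustration, and NXF_SINGULARITY_CACHEDIR is added to the list because it was set earlier in the thread.

```shell
#!/usr/bin/env bash
# Print each variable name with its exported value, or "(not set)".
# printenv only sees exported variables, which are the ones a child
# process such as Nextflow would inherit.
report_singularity_env() {
    for name in SINGULARITY_CACHEDIR SINGULARITY_TMPDIR \
                SINGULARITY_LOCALCACHEDIR SINGULARITY_PULLFOLDER \
                SINGULARITY_BINDPATH NXF_SINGULARITY_CACHEDIR; do
        value=$(printenv "${name}" || true)
        if [ -n "${value}" ]; then
            echo "${name} ${value}"
        else
            echo "${name} (not set)"
        fi
    done
}

report_singularity_env
```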

I ran the pipeline again:

nextflow run epi2me-labs/wf-somatic-variation --snv --sv --mod --sample_name 'MYSAMPLE' --ref 'wf-somatic-variation-demo/GCA_000001405.15_GRCh38_no_alt_analysis_set_chr20.fna' --bed 'wf-somatic-variation-demo/demo.bed' --bam_normal 'wf-somatic-variation-demo/demo_normal.bam' --bam_tumor 'wf-somatic-variation-demo/demo_tumor.bam' --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_sup@v3.5.2' --normal_min_coverage 0 --tumor_min_coverage 0 --out_dir . -profile 'singularity'

I get the same error at the start:

ERROR ~ Error executing process > 'mod:getVersions'

Caused by:
  Process `mod:getVersions` terminated with an error exit status (127)

Command executed:

  python --version | tr -s ' ' ',' | tr '[:upper:]' '[:lower:]' > versions.txt
  modkit --version | tr -s ' ' ',' >> versions.txt
  bgzip --version | awk 'NR==1 {print $1","$3}' >> versions.txt

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: python: command not found

Work dir:
  /home/DervisSalih/wf-somatic-variation/work/8c/28ffbd7aa1df997689ec2463ba5b32

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

Full log attached.

[nextflow_log.txt](https://github.com/epi2me-labs/wf-somatic-variation/files/13459270/nextflow_log.txt)

Thanks so much Umran

RenzoTale88 commented 7 months ago

Hi @umranyaman, apologies for the late reply. I'm not sure what is going on with the singularity image here. Perhaps @SamStudio8 has some more suggestions on how to make this work. In the meantime, could you get in touch with your system administrator, in case there is something about the cluster setup that we are missing?

SamStudio8 commented 6 months ago

@umranyaman Sorry for the delay and thanks for the full log. I missed this the first time around but the telling lines are here:

  launchDir          : /home/DervisSalih/wf-somatic-variation
  workDir            : /home/DervisSalih/wf-somatic-variation/work
  projectDir         : /home/DSUmranYaman/.nextflow/assets/epi2me-labs/wf-somatic-variation

Unfortunately for us, Nextflow will mount the longest common path to the container. As you are launching from /home/DervisSalih/... but the workflow is installed to /home/DSUmranYaman/..., Nextflow will mount your /home to the container, obscuring the /home contents of our container.

This issue will be fixed in the new year as we will start to ship containers that install the dependencies somewhere more sensible than /home/epi2melabs, but in the meantime you should be able to get this working if you ensure that your launch directory, inputs and workflow assets are all installed to either a path in /home/DervisSalih/... or /home/DSUmranYaman/..., just not both.

For example you could try:

export NXF_HOME=/home/DervisSalih/.nextflow

which will install the Nextflow assets (and thus the workflow) to /home/DervisSalih/ rather than /home/DSUmranYaman/.
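To see why the two layouts behave differently, the "longest common path" idea can be sketched in shell. This is an illustration of the idea only, not Nextflow's actual implementation; the function name `common_parent` is made up, and the second projectDir is the path assumed to result from the NXF_HOME change above.

```shell
#!/usr/bin/env bash
# Print the deepest directory that contains both absolute paths $1 and $2.
# Plain POSIX shell, so the helper variables a and b are not local.
common_parent() {
    a="$1"
    b="$2"
    # Walk up from $a until it is a prefix of $b on a path-component boundary.
    while [ "$a" != "/" ]; do
        case "$b/" in
            "$a"/*) printf '%s\n' "$a"; return 0 ;;
        esac
        a=$(dirname "$a")
    done
    printf '/\n'
}

launch=/home/DervisSalih/wf-somatic-variation

# Layout from the log: the only shared parent is /home itself.
common_parent "$launch" /home/DSUmranYaman/.nextflow/assets/epi2me-labs/wf-somatic-variation   # prints /home

# Assumed layout after setting NXF_HOME: the shared parent is one level deeper.
common_parent "$launch" /home/DervisSalih/.nextflow/assets/epi2me-labs/wf-somatic-variation    # prints /home/DervisSalih
```

In the first case the shared parent is /home, the same path under which the container keeps its own tools; in the second case it is not.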

SamStudio8 commented 6 months ago

See https://github.com/epi2me-labs/wf-human-variation/issues/108 for a little more information.

umranyaman commented 6 months ago

@SamStudio8 Thank you! Yes, the test has run now.

I then started to run the real sample:

nextflow run epi2me-labs/wf-somatic-variation --snv --sv --mod --sample_name 'MYrealsample' --ref 'nano-methyl/BAM/merged/GRCm39.genome.fa' --bam_normal 'nano-methyl/BAM/merged/barcode02_merged/aligned_sorted_barcode02.bam' --bam_tumor 'nano-methyl/BAM/merged/barcode01_merged/aligned_sorted_barcode01.bam' --basecaller_cfg 'dna_r9.4.1_450bps_hac' --normal_min_coverage 0 --tumor_min_coverage 0 --out_dir . -profile 'singularity' --annotation false --classify_insert false --t 50

It stopped at 'snv:pileup_variants':

[5c/bd9b72] NOTE: Process `snv:pileup_variants (129)` terminated with an error exit status (255) -- Execution is retried (1)
[ac/f324f2] NOTE: Process `snv:pileup_variants (131)` terminated with an error exit status (255) -- Execution is retried (1)
[5b/6d1107] NOTE: Process `snv:pileup_variants (133)` terminated with an error exit status (255) -- Execution is retried (1)
[ff/8cece6] NOTE: Process `snv:pileup_variants (128)` terminated with an error exit status (255) -- Execution is retried (1)
[75/63c73a] NOTE: Process `snv:pileup_variants (127)` terminated with an error exit status (255) -- Execution is retried (1)
[92/1eebc4] NOTE: Process `snv:pileup_variants (126)` terminated with an error exit status (255) -- Execution is retried (1)
[01/22dbe1] NOTE: Process `snv:pileup_variants (504)` terminated with an error exit status (255) -- Execution is retried (1)
[98/e260e9] NOTE: Process `snv:pileup_variants (125)` terminated with an error exit status (255) -- Execution is retried (1)
[cd/0b9ac4] NOTE: Process `snv:pileup_variants (123)` terminated with an error exit status (255) -- Execution is retried (1)
[7c/1f3908] NOTE: Process `snv:pileup_variants (124)` terminated with an error exit status (255) -- Execution is retried (1)
[5e/092c94] NOTE: Process `snv:pileup_variants (122)` terminated with an error exit status (255) -- Execution is retried (1)
[06/120d2d] NOTE: Process `snv:pileup_variants (104)` terminated with an error exit status (255) -- Execution is retried (1)
[56/068551] NOTE: Process `snv:pileup_variants (115)` terminated with an error exit status (255) -- Execution is retried (1)
[81/8384ca] NOTE: Process `snv:pileup_variants (113)` terminated with an error exit status (255) -- Execution is retried (1)
[a6/76d912] NOTE: Process `snv:pileup_variants (119)` terminated with an error exit status (255) -- Execution is retried (1)
[78/fbdfd1] NOTE: Process `snv:pileup_variants (116)` terminated with an error exit status (255) -- Execution is retried (1)
[32/11683a] NOTE: Process `snv:pileup_variants (118)` terminated with an error exit status (255) -- Execution is retried (1)
ERROR ~ Error executing process > 'snv:pileup_variants (129)'

Caused by:
  Process `snv:pileup_variants (129)` terminated with an error exit status (255)

Command executed:

  python $(which clair3.py) CallVariantsFromCffi \
      --chkpnt_fn ${CLAIR_MODELS_PATH}/clair3_models/r941_prom_sup_g5014/pileup \
      --bam_fn aligned_sorted_barcode01.bam \
      --call_fn pileup_MYrealsample_tumor_chr4_20.vcf \
      --ref_fn GRCm39.genome.fa \
      --ctgName chr4 \
      --chunk_id 20 \
      --chunk_num 32 \
      --platform ont \
      --fast_mode False \
      --snp_min_af 0.08 \
      --indel_min_af 0.15 \
      --minMQ 5 \
      --minCoverage 4 \
      --call_snp_only False \
      --sampleName MYrealsample \
      --vcf_fn EMPTY \
      --enable_long_indel False \
      --bed_fn \
      --samtools samtools \
      --gvcf false \
      --temp_file_dir gvcf_tmp_path \
      --pileup \
      --cmd_fn CMD

Command exit status:
  255

Command output:
  (empty)

Command error:
  FATAL:   container creation failed: mount /proc/self/fd/4->/var/lib/singularity/mnt/session/rootfs error: while mounting image /proc/self/fd/4: failed to find loop device: could not attach image file to loop device: no loop devices available

Work dir:
  /home/DervisSalih/work/7e/02b6110799cea184ff1314a584c36b

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

I also checked the loop devices available via ls /dev/loop* and it seems fine.
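Note that `ls /dev/loop*` shows which loop device nodes exist, not whether any of them are free. A rough sketch for comparing the two, assuming `losetup` from util-linux is installed and that you are allowed to query it (the function name `count_loop_nodes` is made up for illustration):

```shell
#!/usr/bin/env bash
# Count loop device nodes under a directory (default /dev). Taking the
# directory as an argument only makes the function easy to try out.
count_loop_nodes() {
    dir="${1:-/dev}"
    n=0
    for node in "$dir"/loop[0-9]*; do
        # If nothing matches, the unexpanded pattern is not an existing path.
        if [ -e "$node" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

echo "loop device nodes: $(count_loop_nodes)"

# losetup can report which devices are attached and the first unused one.
if command -v losetup >/dev/null 2>&1; then
    echo "in use: $(losetup -a 2>/dev/null | wc -l)"
    echo "first free: $(losetup -f 2>/dev/null || echo none)"
fi
```

If the number in use is close to the number of nodes while the workflow is running, that would be consistent with the "no loop devices available" message, although it does not by itself identify the cause.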

Full log:

nextflow.txt

Thank you, Umran

RenzoTale88 commented 3 months ago

@umranyaman many apologies for the very delayed reply to this. Are you still experiencing this issue with the newer release of the workflow?

umranyaman commented 2 months ago

Hi @RenzoTale88, thanks for the follow-up. I no longer encounter the issues above after setting the Nextflow path as @SamStudio8 suggested. Mod runs smoothly so far. sv and snv were taking too long (or there might be a problem), so I am only running mod for now. I will test sv and snv soon, and I can open a separate issue for them if needed.

Thanks, Umran

RenzoTale88 commented 2 months ago

@umranyaman thanks for confirming this. Yes, if you come across further issues please do open a new ticket.