CFIA-NCFAD / nf-flu

Influenza genome analysis Nextflow workflow
MIT License

[BUG]: IRMA/FLU-minion ERROR: needed ~38765.72M to execute, but only 26447.94M available on disk #13

Closed · Codes1985 closed this issue 1 year ago

Codes1985 commented 1 year ago

Description of the Bug/Issue

Hello,

On rare occasions we have sequencing runs comprising relatively few samples, resulting in large input .fastq files (>2 GB compressed) that evidently cause IRMA to fail. I have attempted to mitigate this issue by modifying the "base.config" file to increase the resources for the "withLabel:process_high" selector (which governs the IRMA module) to >32 GB.

e.g.,

process_high

However, the IRMA command that gets executed doesn't appear to be influenced by changing parameters in base.config:

e.g.,

irma_job

Of course it is possible to down sample the reads, though it would be preferable if I didn't have to do that. I'm not sure if there are any other levers I can pull within the pipeline to overcome this issue. Any help would be appreciated.
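For context, a resource override of this sort would typically be supplied in a small custom config passed to Nextflow with `-c` rather than by editing base.config in place. A sketch with illustrative values only (and, as the discussion below shows, the underlying problem turned out to be disk space rather than memory):

```nextflow
// custom.config: illustrative values only.
// Supply with: nextflow run ... -c custom.config
process {
    withLabel: 'process_high' {
        cpus   = 16
        memory = 64.GB
    }
}
```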

Nextflow command-line

sbatch -c 2 --mem=4GB -p OutbreakResponse --wrap="nextflow ${WORKFLOW_DIR} --input ${INPUT_SHEET} --platform ${PLATFORM} ${DATABASE} --outdir ${OUTDIR} -profile singularity,slurm -resume"

Note: platform = nanopore, and in this particular case, no user-defined database was used.

Error Message

Oops... Pipeline execution stopped with the following message: Loading config file 'irma_config.sh'
[2023-05-15 10:16:54]   IRMA/FLU-minion started run 'GEN23-0018-neat'
[2023-05-15 10:16:54]   IRMA/FLU-minion ERROR: needed ~38765.72M to execute, but only 26447.94M available on disk
[2023-05-15 10:16:54]   IRMA/FLU-minion ABORTED run: GEN23-0018-neat
[f7/e4e21a] NOTE: Process `NF_FLU:NANOPORE:IRMA (GEN23-0018-neat)` terminated with an error exit status (1) -- Execution is retried (1)
[ed/a49ef6] NOTE: Process `NF_FLU:NANOPORE:IRMA (GEN23-0018-neat)` terminated with an error exit status (1) -- Execution is retried (2)
Error executing process > 'NF_FLU:NANOPORE:IRMA (GEN23-0018-neat)'

Caused by:
  Process `NF_FLU:NANOPORE:IRMA (GEN23-0018-neat)` terminated with an error exit status (1)

Command executed:

  touch irma_config.sh
  echo 'SINGLE_LOCAL_PROC=16' >> irma_config.sh
  echo 'DOUBLE_LOCAL_PROC=8' >> irma_config.sh
  if [ true ]; then
    echo 'DEL_TYPE="NNN"' >> irma_config.sh
    echo 'ALIGN_PROG="BLAT"' >> irma_config.sh
  fi

  IRMA FLU-minion GEN23-0018-neat.merged.fastq.gz GEN23-0018-neat

  if [ -d "GEN23-0018-neat/amended_consensus/" ]; then
    cat GEN23-0018-neat/amended_consensus/*.fa > GEN23-0018-neat.irma.consensus.fasta
  fi
  ln -s .command.log GEN23-0018-neat.irma.log
  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:NANOPORE:IRMA":
     IRMA: $(IRMA | head -n1 | sed -E 's/^Iter.*IRMA\), v(\S+) .*/\1/')
  END_VERSIONS

Command exit status:
  1

Command output:
  Loading config file 'irma_config.sh'
  [2023-05-15 10:16:54] IRMA/FLU-minion started run 'GEN23-0018-neat'
  [2023-05-15 10:16:54] IRMA/FLU-minion ERROR: needed ~38765.72M to execute, but only 26447.94M available on disk
  [2023-05-15 10:16:54] IRMA/FLU-minion ABORTED run: GEN23-0018-neat

Command wrapper:
  Loading config file 'irma_config.sh'
  [2023-05-15 10:16:54] IRMA/FLU-minion started run 'GEN23-0018-neat'
  [2023-05-15 10:16:54] IRMA/FLU-minion ERROR: needed ~38765.72M to execute, but only 26447.94M available on disk
  [2023-05-15 10:16:54] IRMA/FLU-minion ABORTED run: GEN23-0018-neat

Work dir:
 /path/to/workdir/IRVC20230417IHN_analysis/20230417_samples_nf-flu_results/work/a5/67d3b6c885164cd99afcf1598e54ef

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Workflow Version

3.1.2; revision: 9473cbaed9

Nextflow Executor

slurm

Nextflow Version

22.10.1

Java Version

openjdk version "17.0.3-internal" 2022-04-19
OpenJDK Runtime Environment (build 17.0.3-internal+0-adhoc..src)
OpenJDK 64-Bit Server VM (build 17.0.3-internal+0-adhoc..src, mixed mode, sharing)

Hardware

HPC Cluster

Operating System (OS)

Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core

Conda/Container Engine

Singularity

Additional context

No response

peterk87 commented 1 year ago

I think this issue is due to there not being enough space in the tmp directory for IRMA. I'll have to see if there's a way to have IRMA write to a user-specified temp dir.

Have you tried host subtraction with Kraken2 against a suitable host index (e.g. human), outputting the unclassified reads? This might help get the input file sizes down and could serve as a workaround for IRMA requiring a certain amount of space in /tmp.
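A host-subtraction step along these lines might look like the following sketch. The database path and output filenames are hypothetical, and a prebuilt human Kraken2 index would be needed:

```
# Hypothetical pre-filtering step: discard reads classified as human,
# keep unclassified (presumably viral) reads as input for nf-flu.
kraken2 \
  --db /path/to/human_kraken2_db \
  --threads 8 \
  --gzip-compressed \
  --unclassified-out GEN23-0018-neat.dehosted.fastq \
  --report GEN23-0018-neat.kraken2.report.txt \
  --output /dev/null \
  GEN23-0018-neat.merged.fastq.gz
gzip GEN23-0018-neat.dehosted.fastq
```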

Codes1985 commented 1 year ago

Interesting. I thought it might be a space issue based on the description of the error, but I dismissed it because I didn't think that could be a problem. That would explain why increasing the memory allocation didn't help. These particular samples are cultured, so I'd have to see if there is a suitable index to dehost the samples with first, but it would be nice not to have to. This was a bit of an edge case, so I wanted to see if there was a quick solution or something I overlooked. We can certainly decrease the input file sizes through other means to mitigate this particular error. Thank you for your help!

Codes1985 commented 1 year ago

After some troubleshooting, we were able to overcome this issue by adding the IRMA parameters ALLOW_TMP=1 and TMP=\$PWD to the irma.nf module as follows (note: $PWD is escaped):

  touch irma_config.sh
  echo 'SINGLE_LOCAL_PROC=${task.cpus}' >> irma_config.sh
  echo 'DOUBLE_LOCAL_PROC=${(task.cpus / 2).toInteger()}' >> irma_config.sh
  echo 'ALLOW_TMP=1' >> irma_config.sh
  echo 'TMP=\$PWD' >> irma_config.sh
  if [ ${params.keep_ref_deletions} ]; then
    echo 'DEL_TYPE="NNN"' >> irma_config.sh
    echo 'ALIGN_PROG="BLAT"' >> irma_config.sh
  fi
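With Nextflow's Groovy interpolation applied (assuming, for illustration, `task.cpus = 16` and `params.keep_ref_deletions = true`), the snippet above would render to plain shell like this:

```shell
# Rendered form of the irma.nf snippet above, assuming task.cpus = 16
# and params.keep_ref_deletions = true. The escaped \$PWD in the Nextflow
# script becomes a literal $PWD here; the single quotes keep it unexpanded
# in irma_config.sh, so IRMA expands it at runtime inside the work dir.
touch irma_config.sh
echo 'SINGLE_LOCAL_PROC=16' >> irma_config.sh
echo 'DOUBLE_LOCAL_PROC=8' >> irma_config.sh
echo 'ALLOW_TMP=1' >> irma_config.sh
echo 'TMP=$PWD' >> irma_config.sh
if [ true ]; then
  echo 'DEL_TYPE="NNN"' >> irma_config.sh
  echo 'ALIGN_PROG="BLAT"' >> irma_config.sh
fi
```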

Thanks again!

peterk87 commented 1 year ago

Thanks @Codes1985 for finding a fix to the issue you were encountering! I think it'd be good to have IRMA's tmp dir be in the Nextflow process work dir by default to prevent strange issues like this in the future. It'd also make it easier to debug issues with IRMA analysis. I'll make a PR to address this and reopen this issue until a new release fixing it has been created.

Codes1985 commented 1 year ago

Thanks @peterk87! Yeah, this seems like a relatively easy QOL fix that should increase general robustness. Like I said before, this was a bit of an edge case, but it's nice to know that under similar circumstances we won't need to do a bunch of finagling with the samples before processing them through the pipeline.

peterk87 commented 1 year ago

Incorporated IRMA tmp dir fix in #16 and release 3.1.5. Thanks @Codes1985 :+1:

https://github.com/CFIA-NCFAD/nf-flu/blob/f0eb199bd5d2ca0c0b8dc2786bbd39a7e44895cf/modules/local/irma.nf#L29-L32