Closed: @Codes1985 closed this issue 1 year ago
I think this issue is due to there not being enough space in the tmp directory for IRMA. I'll have to see if there's a way to have IRMA write to a user specified temp dir.
Have you tried host subtraction with Kraken2 against a suitable host index (e.g. human), outputting the unclassified reads? This might help get the input file sizes down and could serve as a workaround for IRMA requiring a certain amount of space in /tmp.
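For reference, a host-subtraction run with Kraken2 generally takes a shape like this. The database path, read file names, and thread count below are placeholders, not values from this thread; `--unclassified-out` writes the reads that did not match the host index (the `#` is replaced with 1/2 for each mate of a pair):

```
# Hypothetical paths; substitute your host (e.g. human) Kraken2 database
# and your actual paired-end read files.
kraken2 \
  --db /path/to/kraken2_human_db \
  --paired sample_R1.fastq.gz sample_R2.fastq.gz \
  --gzip-compressed \
  --threads 8 \
  --unclassified-out sample_dehosted#.fastq \
  --report sample.kraken2.report.txt \
  --output -
```

The dehosted `sample_dehosted_1.fastq`/`sample_dehosted_2.fastq` files could then be fed to the pipeline in place of the raw reads.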
Interesting. I thought it might be a space issue based on the description of the error, but I dismissed it because I didn't think that could be a problem. That would explain why increasing the memory allocation didn't help. These particular samples are cultured, so I'd have to see if there is a suitable index to dehost the samples with first, but it would be nice not to have to. This was a bit of an edge case, so I wanted to see if there was a quick solution, or something I overlooked. We can certainly decrease the input file sizes through other means to mitigate this particular error. Thank you for your help!
After some troubleshooting, we were able to overcome this issue by adding the IRMA parameters ALLOW_TMP=1 and TMP=\$PWD to the irma.nf
module as follows (note: $PWD is escaped):
touch irma_config.sh
echo 'SINGLE_LOCAL_PROC=${task.cpus}' >> irma_config.sh
echo 'DOUBLE_LOCAL_PROC=${(task.cpus / 2).toInteger()}' >> irma_config.sh
echo 'ALLOW_TMP=1' >> irma_config.sh
echo 'TMP=\$PWD' >> irma_config.sh
if [ ${params.keep_ref_deletions} ]; then
    echo 'DEL_TYPE="NNN"' >> irma_config.sh
    echo 'ALIGN_PROG="BLAT"' >> irma_config.sh
fi
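To illustrate what that script block produces at runtime, here is a standalone sketch with a hypothetical value of 8 standing in for Nextflow's ${task.cpus} (in the real module, Nextflow performs that substitution before bash runs):

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for Nextflow's ${task.cpus}
cpus=8

touch irma_config.sh
echo "SINGLE_LOCAL_PROC=${cpus}" >> irma_config.sh
echo "DOUBLE_LOCAL_PROC=$((cpus / 2))" >> irma_config.sh
echo 'ALLOW_TMP=1' >> irma_config.sh
# $PWD is escaped in the Nextflow script so bash, not Nextflow, expands it;
# at runtime it resolves to the process work dir
echo "TMP=$PWD" >> irma_config.sh

cat irma_config.sh
```

With `cpus=8`, the resulting irma_config.sh tells IRMA to use 8 single-threaded processes, 4 double-threaded processes, and the process work dir as its temp dir.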
Thanks again!
Thanks @Codes1985 for finding a fix to the issue you were encountering! I think it'd be good to have IRMA's tmp dir be in the Nextflow process work dir by default to prevent strange issues like this in the future. It'd also make it easier to debug issues with IRMA analysis. I'll make a PR to address this, and I'll reopen this issue until a new release with the fix has been created.
Thanks @peterk87! Yeah, this seems like a relatively easy QOL fix that should increase general robustness. Like I said before, this was a bit of an edge case, but it's nice to know that under similar circumstances we won't need to do a bunch of finagling with the samples prior to processing them through the pipeline.
Is there an existing issue for this?
Description of the Bug/Issue
Hello,
On rare occasions we have sequencing runs comprising relatively few samples, resulting in large input .fastq files (>2 GB compressed) that evidently cause IRMA to fail. I have attempted to mitigate this by modifying the "base.config" file to increase the memory allocated under "withLabel:process_high" (which governs the IRMA module) to >32 GB,
e.g.,
However, the IRMA command executed doesn't appear to be influenced by changing parameters in the base.config:
e.g.,
Of course it is possible to downsample the reads, though it would be preferable if I didn't have to do that. I'm not sure if there are any other levers I can pull within the pipeline to overcome this issue. Any help would be appreciated.
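For context, a resource override of the kind described above generally takes this shape in an nf-core-style base.config (the cpus/memory/time values here are illustrative, not the actual figures used on this run):

```
// Illustrative values only; actual limits depend on available hardware
process {
    withLabel: process_high {
        cpus   = 16
        memory = 64.GB
        time   = 16.h
    }
}
```

Note that such selectors change the resources Nextflow requests from the scheduler (e.g. SLURM), which is consistent with the observation that more memory did not change IRMA's behavior once /tmp filled up.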
Nextflow command-line
Error Message
Workflow Version
3.1.2; revision: 9473cbaed9
Nextflow Executor
slurm
Nextflow Version
22.10.1
Java Version
openjdk version "17.0.3-internal" 2022-04-19
OpenJDK Runtime Environment (build 17.0.3-internal+0-adhoc..src)
OpenJDK 64-Bit Server VM (build 17.0.3-internal+0-adhoc..src, mixed mode, sharing)
Hardware
HPC Cluster
Operating System (OS)
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core
Conda/Container Engine
Singularity
Additional context
No response