Closed JohnHadish closed 2 months ago
Hey @JohnHadish, can you provide more info? What version of Nextflow, Java, etc.? What mode, what parameters?
Alright, I think I may have fixed it. The issue appears to be that GEMmaker does not like how I specified memory for the fastq_merge process. The workflow was running as expected but failed after running out of memory, so I changed the fastq_merge directive in my nextflow.config to permit more memory usage:
```groovy
withName: fastq_merge {
    memory = { check_max( 6.GB * task.attempt, 'memory' ) }
}
```
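For reference, `check_max` is not a built-in Nextflow function; nf-core-style pipelines (GEMmaker included) define it themselves in the pipeline's `nextflow.config`, so a closure like the one above fails if the helper is not in scope when the profile is evaluated. A sketch of the memory branch, roughly as it appears in the nf-core pipeline template:

```groovy
// Sketch of the check_max helper from the nf-core pipeline template (GEMmaker
// ships a similar definition). It caps a requested resource at the limit given
// by --max_memory / --max_cpus / --max_time.
def check_max(obj, type) {
    if (type == 'memory') {
        try {
            if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
                return params.max_memory as nextflow.util.MemoryUnit
            else
                return obj
        } catch (all) {
            println "   ### ERROR ###   Max memory '${params.max_memory}' is not valid! Using default value: $obj"
            return obj
        }
    }
    // ... analogous branches for 'cpus' and 'time' ...
}
```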
My entire nextflow.config then looked like this:
```groovy
profiles {
    kamiak {
        process {
            executor = "slurm"
            errorStrategy = "retry"
            maxRetries = 3
            withName: retrieve_sra_metadata {
                memory = 32.GB
            }
            withName: create_gem {
                memory = { check_max( 8.GB * task.attempt, 'memory' ) }
            }
            withName: multiqc {
                memory = { check_max( 8.GB * task.attempt, 'memory' ) }
            }
            withName: fastq_merge {
                memory = { check_max( 6.GB * task.attempt, 'memory' ) }
            }
        }
        executor {
            queueSize = 120
        }
    }
}
```
I am not sure why this would have produced the above error, as I have specified other memory requirements in the same way, and I cannot find anything in the workflow that would make fastq_merge fall under different specifications.
Upon removing the new memory requirements, everything appears to be running.
Versions: `nextflow/21.10.6`, `java/1.8.0`, `singularity/3.8.0`
To replicate using the test data, run:
```shell
projectDir="/home/john.hadish/.nextflow/assets/systemsgenetics/gemmaker"

nextflow run systemsgenetics/gemmaker -r gem_fix \
    -profile kamiak,singularity \
    -resume \
    --pipeline kallisto \
    --sras "${projectDir}/assets/demo/SRA_IDs.txt" \
    --input "${projectDir}/assets/demo/*{1,2}.fastq" \
    --skip_samples "${projectDir}/assets/demo/samples2skip.txt" \
    --kallisto_index_path "${projectDir}/assets/demo/references/CORG.transcripts.Kallisto.indexed" \
    --max_cpus 80
```
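As an aside, when SLURM's cgroup enforcement kills a task for exceeding its memory limit, the `.exitcode` file in that task's Nextflow work directory usually records 137 (128 + SIGKILL). A quick sketch for locating such tasks after a failed run (the directory layout is the standard Nextflow `work` tree; nothing GEMmaker-specific is assumed):

```shell
# find_oom DIR: list task work directories under DIR whose .exitcode records
# 137, a common signature of a SLURM cgroup out-of-memory kill.
find_oom() {
    find "$1" -name .exitcode -exec grep -l '^137$' {} + 2>/dev/null |
    while read -r f; do
        echo "possible OOM: $(dirname "$f")"
    done
}

# Usage against a Nextflow run's work directory:
# find_oom work
```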
nextflow.config:
```groovy
profiles {
    kamiak {
        process {
            executor = "slurm"
            queue = "ficklin"
            clusterOptions = "--account=ficklin"
            errorStrategy = "retry"
            maxRetries = 5
            withName: retrieve_sra_metadata {
                memory = 32.GB
            }
            withName: create_gem {
                memory = { check_max( 8.GB * task.attempt, 'memory' ) }
            }
            withName: multiqc {
                memory = { check_max( 8.GB * task.attempt, 'memory' ) }
            }
            withName: fastq_merge {
                memory = { check_max( 6.GB * task.attempt, 'memory' ) }
            }
        }
        executor {
            queueSize = 90
        }
    }
}
```
The fastq_merge script has been replaced with a Bash script that uses system commands to merge files, so there shouldn't be a memory issue anymore.
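A streaming merge along those lines keeps memory usage constant regardless of FASTQ size, since `cat` never loads a whole file. A minimal sketch with hypothetical file names (two runs of the same sample, read 1):

```shell
# Hypothetical demo inputs: two FASTQ fragments for the same sample/read pair.
printf '@r1\nACGT\n+\nIIII\n' > SRX0000_run1_1.fastq
printf '@r2\nTTTT\n+\nIIII\n' > SRX0000_run2_1.fastq

# cat streams its inputs straight to the output file, so memory use stays
# constant no matter how large the FASTQ files are.
cat SRX0000_run1_1.fastq SRX0000_run2_1.fastq > SRX0000_1.fastq
```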
Description of the bug
Upon restart, GEMmaker will run for around 20 minutes and then throw the following message. The main node that launches all jobs continues to run without launching any new jobs. The output log file ends immediately without throwing an error or specifying a directory where GEMmaker failed.
The error file looks like this:
Command used and terminal output
No response
Relevant files
No response
System information
No response