epi2me-labs / wf-somatic-variation


Default memory and time allocations for processes are too low #7

Closed TBradley27 closed 10 months ago

TBradley27 commented 10 months ago

Operating System

CentOS 7

Other Linux

No response

Workflow Version

v0.3.0

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

    srun bash /scratchb/jblab/ampliconseq/nextflow run epi2me-labs/wf-somatic-variation \
        -resume \
        -c my_config.cfg \
        -profile 'singularity' \
        --snv \
        --sv \
        --mod \
        --sample_name 'XXXX' \
        --ref 'hsa.GRCh37_g1kp2.fa' \
        --bam_normal 'XXXXXXX.sorted.bam' \
        --bam_tumor 'XXXXXXX.sorted.bam'

Workflow Execution - CLI Execution Profile

custom

What happened?

Hello,

I was trying to run this workflow with an adapted version of the singularity profile, modified as follows so that the singularity containers are submitted through the SLURM job scheduler:

    // using singularity instead of docker
    singularity {
        process.executor = 'slurm'
        executor {
            queueSize = 50
            pollInterval = 30.sec
            jobName = { "'$task.name'" }
        }
        singularity {
            enabled = true
            autoMounts = true
        }
    }
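
If the cluster also requires a specific partition or accounting project, the same profile can carry those settings. The sketch below is illustrative only: 'long' and 'my_project' are placeholder names, not values from this issue.

    // sketch: 'long' and 'my_project' are placeholders for a SLURM partition
    // and accounting project -- substitute the values used on your cluster
    singularity {
        process.executor = 'slurm'
        process.queue = 'long'                          // SLURM partition (sbatch -p)
        process.clusterOptions = '--account=my_project' // any extra sbatch options
        executor {
            queueSize = 50
            pollInterval = 30.sec
            jobName = { "'$task.name'" }
        }
        singularity {
            enabled = true
            autoMounts = true
        }
    }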

The problem I encountered is that the default resource allocations for each job were too low: by default, each job was allocated only 1 GB of memory and 1 hour of wall-clock time. I have standard ~30X coverage for both my normal and tumour samples. Some jobs, such as somatic_sv:nanomonsv_get, routinely run for over four hours, and the default 1 GB allocation likely creates memory bottlenecks too; according to my report, somatic_sv:nanomonsv_get used 25 GB of virtual memory. snv:clairs_full_hap_filter also did not complete within one hour.

I resolved this by editing the process section of my config file to increase the memory and time allocations for each process:

    process {
        memory = '80 GB'
        time = '24h'
        container = "ontresearch/wf-somatic-variation:${params.wf.container_sha}"
        withLabel:wf_somatic_snv {
            container = "ontresearch/wf-somatic-snp:${params.wf.container_snp_sha}"
        }
        withLabel:wf_somatic_methyl {
            container = "ontresearch/wf-somatic-methyl:${params.wf.container_mod_sha}"
        }
        withLabel:wf_somatic_sv {
            container = "ontresearch/wf-somatic-sv:${params.wf.container_sv_sha}"
        }
        shell = ['/bin/bash', '-euo', 'pipefail']
    }
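
A blanket 80 GB / 24 h request also reserves far more than most of the smaller processes need. An alternative, sketched below with illustrative rather than benchmarked figures, is to keep modest defaults and raise the limits only for the heavy processes mentioned above via Nextflow's withName selectors (these can sit alongside the container settings already in the process scope):

    process {
        // modest defaults for the many small jobs (illustrative values)
        memory = '8 GB'
        time = '4h'
        // raise the limits only where heavy usage was observed in this issue
        withName: '.*:nanomonsv_get' {
            memory = '32 GB'   // the report showed ~25 GB of virtual memory
            time = '24h'       // observed run times of over four hours
        }
        withName: '.*:clairs_full_hap_filter' {
            memory = '16 GB'   // a guess; memory use was not reported here
            time = '8h'        // did not finish within the default 1 h
        }
    }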

Relevant log output

Command exit status:
  140

Command output:
  (empty)

Command error:
  08/24/2023 22:51:38 - nanomonsv.run - INFO - Clustering rearrangement type supporting reads for putative SVs
  08/24/2023 22:52:04 - nanomonsv.run - INFO - Clustering insertion type supporting reads for putative SVs
  08/24/2023 23:13:01 - nanomonsv.run - INFO - Clustering deletion type supporting reads for putative SVs
  08/24/2023 23:44:14 - nanomonsv.run - INFO - Clustering single breakend type supporting reads for putative SVs
  08/24/2023 23:46:56 - nanomonsv.run - INFO - Gathering sequences of supporting reads

Application activity log entry

No response

RenzoTale88 commented 10 months ago

@TBradley27 we deliberately avoid setting hard resource requirements on the processes to allow flexibility. However, most distributed systems have default resource limits that may be too low for the workflow to run successfully, and over which we have no control. In these cases you will need to discuss with your IT provider how to define a configuration file that suits your environment. You might also find it helpful to look at the custom configurations provided by nf-core, which may offer defaults that work well on your system.
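
For reference, many of those shared configs follow a pattern along the lines of the sketch below (purely illustrative, not an official epi2me-labs recommendation): start from moderate requests and let Nextflow retry a task with more memory and time when the scheduler kills it for exceeding its limits.

    process {
        // retry on exit codes typically returned when a scheduler kills a job
        // for exceeding its limits; 140 is the status seen in the log above
        // when the job hit its one-hour walltime
        errorStrategy = { task.exitStatus in [104, 134, 137, 139, 140, 143] ? 'retry' : 'finish' }
        maxRetries = 2
        // scale the request with each attempt (starting values are illustrative)
        memory = { 8.GB * task.attempt }
        time = { 4.h * task.attempt }
    }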

Thank you for the feedback; we will update the documentation to help users avoid these issues in the future :)