fmalmeida / MpGAP

Multi-platform genome assembly pipeline for Illumina, Nanopore and PacBio reads
https://mpgap.readthedocs.io/en/latest/
GNU General Public License v3.0

Add a simple parameter to handle starting memory settings #61

Closed fmalmeida closed 4 months ago

fmalmeida commented 5 months ago

This issue relates to issues #52 and #59, where users ran into memory errors and had to adapt the config so that the pipeline used more memory on the first attempt instead of waiting for a retry.

By default, the pipeline first tries with a small amount of resources, then retries with the full amount specified by the max parameters (params.max_cpus, params.max_memory, params.max_time):

    // Assemblies will first try to adjust themselves to a parallel execution
    // If it is not possible, then it waits to use all the resources allowed
    withLabel:process_assembly {
      cpus   = {  if (task.attempt == 1) { check_max( 6 * task.attempt, 'cpus'       ) } else { params.max_cpus   } }
      memory = {  if (task.attempt == 1) { check_max( 20.GB * task.attempt, 'memory' ) } else { params.max_memory } }
      time   = {  if (task.attempt == 1) { check_max( 24.h * task.attempt, 'time'    ) } else { params.max_time   } }

      // retry at least once to try it with full resources
      errorStrategy = { task.exitStatus in [1,21,143,137,104,134,139,247] ? 'retry' : 'finish' }
      maxRetries    = 1
      maxErrors     = '-1'
    }

    // Quast sometimes can take too long
    withName:quast {
      cpus   = {  if (task.attempt == 1) { check_max( 4 * task.attempt, 'cpus'       ) } else { params.max_cpus   } }
      memory = {  if (task.attempt == 1) { check_max( 10.GB * task.attempt, 'memory' ) } else { params.max_memory } }
      time   = {  if (task.attempt == 1) { check_max( 12.h * task.attempt, 'time'    ) } else { params.max_time   } }

      // retry at least once to try it with full resources
      errorStrategy = { task.exitStatus in [21,143,137,104,134,139,247] ? 'retry' : 'finish' }
      maxRetries    = 1
      maxErrors     = '-1'
    }

It would probably be good to also define parameters that configure the starting memory and number of threads used in the first attempt of these modules.

Maybe --start_asm_mem and --start_asm_cpus.
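
A minimal sketch of how the proposal could look in the config, assuming the two parameter names suggested above and defaults that mirror the current first-attempt values. The names start_asm_mem and start_asm_cpus are only the ones proposed in this issue, and check_max is assumed to be the same helper already used in the snippet above. This is not necessarily how the final fix was implemented.

```groovy
params {
    // Hypothetical parameters (names taken from the proposal in this issue).
    // Defaults reproduce the values currently hard-coded for the first attempt.
    start_asm_cpus = 6
    start_asm_mem  = '20.GB'
}

process {
    withLabel:process_assembly {
        // First attempt: user-configurable starting resources, still capped by check_max.
        // Second attempt: everything allowed by the max parameters.
        cpus   = { task.attempt == 1 ? check_max( params.start_asm_cpus as int, 'cpus' ) : params.max_cpus }
        memory = { task.attempt == 1 ? check_max( params.start_asm_mem as nextflow.util.MemoryUnit, 'memory' ) : params.max_memory }

        // Retry once so that the second attempt runs with full resources
        errorStrategy = { task.exitStatus in [1,21,143,137,104,134,139,247] ? 'retry' : 'finish' }
        maxRetries    = 1
        maxErrors     = '-1'
    }
}
```

Because Nextflow lets any entry in params be overridden from the command line with a double-dash option, a user hitting memory errors could then pass, for example, --start_asm_mem 40.GB instead of editing the config. The explicit casts are there because values given on the command line may arrive as strings.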

fmalmeida commented 4 months ago

Solved by #64