cambiotraining / bioinformatics-software-pipelines

Course materials for "Managing Bioinformatics Software and Pipelines"
https://cambiotraining.github.io/bioinformatics-software-pipelines/

Cromwell workflow engine #2

Open tavareshugo opened 1 year ago

tavareshugo commented 1 year ago

The Broad Institute's workflow engine Cromwell can be useful for running GATK workflows (written in the WDL language).

Resources for this:

Here is an attempt to run this on CSD3. I have set up the following config file:

```
backend {
  default = SLURM

  providers {
    SLURM {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
        Int runtime_minutes = 60
        Int cpus = 8
        Int requested_memory_mb_per_core = 3000
        String queue = "icelake,cclake,cclake-himem,icelake-himem"
        String account = "NAME-SL3-CPU"
        """

        submit = """
            sbatch -J ${job_name} -D ${cwd} -o ${out} -e ${err} -t ${runtime_minutes} -p ${queue} -A ${account} \
            ${"-c " + cpus} \
            --mem-per-cpu ${requested_memory_mb_per_core} \
            --wrap "/bin/bash ${script}"
        """
        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
        job-id-regex = "Submitted batch job (\\d+).*"
      }
    }
  }
}
```

Note that `account` is declared in `runtime-attributes`, so it should also be passed to `sbatch` (with `-A ${account}`), otherwise it is never used.

However, this always submits jobs with a fixed number of CPUs and a fixed runtime. I'm not sure whether these can be changed on a task-by-task basis from within the workflow itself (like with Nextflow's per-process resource directives).
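If I'm reading the Cromwell documentation correctly, the attributes declared in `runtime-attributes` only act as defaults: an individual WDL task should be able to override them in its `runtime` block, as long as the attribute names match the ones declared in the config. A hypothetical task (names and values made up for illustration) might look like:

```wdl
version 1.0

task align_reads {
  input {
    File fastq
  }
  command <<<
    echo "aligning ~{fastq}"
  >>>
  # These should override the backend defaults for this task only;
  # the names (cpus, runtime_minutes, ...) must match the config exactly.
  runtime {
    cpus: 16
    runtime_minutes: 120
    requested_memory_mb_per_core: 6000
    queue: "icelake-himem"
  }
}
```

This would need testing on CSD3 to confirm the per-task values are actually substituted into the `submit` command.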

The config file can be passed to Cromwell when running a workflow:

```
cromwell -Dconfig.file=csd3.conf run helloworld.wdl
```

(If running the jar directly rather than through a wrapper script, the equivalent is `java -Dconfig.file=csd3.conf -jar cromwell.jar run helloworld.wdl`.)