StaPH-B / titan

Public health bioinformatics repository for WDL workflows to analyze viral genomes.
GNU Affero General Public License v3.0

Titan PE Cromwell CLI workflow #2

Closed k-florek closed 2 years ago

k-florek commented 3 years ago

Currently, the Titan PE workflow is designed for integration into the Terra.bio platform. For use in the StaPH-B Toolkit, a new workflow must be created that references the current Titan PE workflow and wraps its inputs and outputs so that it can be run from the CLI via Cromwell. This new workflow must be compatible with a range of environments, including multiple HPC and cloud platforms, to match the StaPH-B Toolkit design philosophy.

The workflow must operate from the java -jar cromwell-<versionNumber>.jar run command.
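As a rough illustration of that requirement, a launcher could assemble the Cromwell command line along these lines. This is only a sketch: the function name and its parameters are hypothetical and not part of the toolkit, and the jar path is left to the caller. The `run` subcommand, `--inputs`, `--metadata-output`, and the `-Dconfig.file` system property are standard Cromwell options.

```python
def cromwell_run_command(jar_path, workflow_wdl, inputs_json,
                         config_file=None, metadata_json=None):
    """Build the argument list for `java -jar <cromwell jar> run ...`.

    All parameter names here are illustrative (hypothetical).
    """
    command = ["java"]
    if config_file:
        # A backend configuration (e.g. for an HPC scheduler) is passed as a
        # JVM system property, so it must come before -jar.
        command.append(f"-Dconfig.file={config_file}")
    command += ["-jar", str(jar_path), "run", str(workflow_wdl),
                "--inputs", str(inputs_json)]
    if metadata_json:
        # Ask Cromwell to write the run metadata (including outputs) to a file.
        command += ["--metadata-output", str(metadata_json)]
    return command
```

The returned list can be handed to subprocess.run(command, check=True), which avoids shell quoting problems with file paths.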

k-florek commented 3 years ago

Update:

I was able to create a wrapper workflow around Titan that looks something like this:

version 1.0

# Reuse the existing Terra-oriented workflow unchanged.
import "wf_titan_illumina_pe.wdl" as titan_illumina_pe

# One entry per sample in the inputs JSON.
struct InputJSON {
  File read1_raw
  File read2_raw
  String samplename
  File primer_bed
}

workflow cli_wrapper {
  input {
    Array[InputJSON] inputSamples
  }

  # Run Titan PE once per sample.
  scatter (sample in inputSamples) {
    call titan_illumina_pe.titan_illumina_pe {
      input:
        samplename = sample.samplename,
        seq_method = "Illumina paired-end",
        read1_raw = sample.read1_raw,
        read2_raw = sample.read2_raw,
        primer_bed = sample.primer_bed,
        pangolin_docker_image = "staphb/pangolin:2.3.2-pangolearn-2021-02-21"
    }
  }

}

Thanks to Danny Park for showing me that structs exist!! This approach seems to work well, and I've successfully run the Titan PE workflow locally using both Cromwell and miniwdl.
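For anyone wanting to try the wrapper, the inputs file is a JSON object whose cli_wrapper.inputSamples key holds one object per sample, with fields matching the InputJSON struct. A minimal sketch for generating it from a directory of reads follows. The function name and the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming convention are assumptions for illustration only, not something the workflow requires.

```python
import json
from pathlib import Path

R1_SUFFIX = "_R1.fastq.gz"  # assumed naming convention (hypothetical)
R2_SUFFIX = "_R2.fastq.gz"


def build_inputs(fastq_dir, primer_bed):
    """Pair R1/R2 files in fastq_dir and build the cli_wrapper inputs dict."""
    fastq_dir = Path(fastq_dir)
    samples = []
    for read1 in sorted(fastq_dir.glob("*" + R1_SUFFIX)):
        samplename = read1.name[: -len(R1_SUFFIX)]
        read2 = fastq_dir / (samplename + R2_SUFFIX)
        if not read2.exists():
            continue  # skip samples without a mate file
        # Field names must match the members of the InputJSON struct.
        samples.append({
            "read1_raw": str(read1),
            "read2_raw": str(read2),
            "samplename": samplename,
            "primer_bed": str(primer_bed),
        })
    # Input keys are namespaced by the workflow name.
    return {"cli_wrapper.inputSamples": samples}


def write_inputs(fastq_dir, primer_bed, json_path):
    """Write the inputs dict to json_path for use with the workflow runner."""
    with open(json_path, "w") as handle:
        json.dump(build_inputs(fastq_dir, primer_bed), handle, indent=2)
```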

Going forward, I need to decide between Cromwell and miniwdl. Cromwell offers more flexibility in backends, while miniwdl makes the user experience a bit better.

The next step will be developing a method to capture the output into folders and a summary CSV file.
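One possible shape for the summary step, sketched under assumptions: because the wrapper scatters over samples, each workflow output arrives as an array with one element per sample, in the same order as inputSamples. The helper below pivots such a mapping into one CSV row per sample. The function name is hypothetical, and the output keys in the usage example are placeholders rather than confirmed Titan output names.

```python
import csv


def write_summary(samplenames, outputs, csv_path):
    """Write one row per sample from scattered (array-valued) outputs.

    samplenames: sample names in the same order as the workflow inputs.
    outputs: dict mapping fully qualified output names to per-sample lists.
    """
    columns = sorted(outputs)
    # Use the last component of each qualified name as the column header.
    header = ["samplename"] + [name.rsplit(".", 1)[-1] for name in columns]
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for index, samplename in enumerate(samplenames):
            writer.writerow([samplename] + [outputs[name][index] for name in columns])
```

For example, with placeholder output names:

```python
write_summary(
    ["sampleA", "sampleB"],
    {
        # Hypothetical keys; the real ones depend on the Titan workflow outputs.
        "cli_wrapper.titan_illumina_pe.pango_lineage": ["B.1.1.7", "B.1.2"],
        "cli_wrapper.titan_illumina_pe.assembly_fasta": ["a.fasta", "b.fasta"],
    },
    "summary.csv",
)
```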

jvhagey commented 3 years ago

Hi all, I currently have version 1.0 of a WDL tutorial on my personal GitHub. Hopefully this is helpful; let me know if any issues come up. It does not yet explain how to use containers with Cromwell, but that will come soon.

k-florek commented 3 years ago

I've got Titan PE fully integrated into the staphb_toolkit, along with some code to handle downloading Cromwell at runtime. Thanks to @jvhagey, it works with both Docker and Singularity (and more if you have a configuration file). I still need to work out how the output files are collected, as that is still a bit of a mess. Once progress is made on this repository, I can merge it into the toolkit and pull my toolkit updates into the toolkit repo.

kevinlibuit commented 3 years ago

Pangolin v3 is now fully integrated into all of our Titan workflows for genomic characterization: https://github.com/theiagen/public_health_viral_genomics/releases/tag/v1.4.4