fulcrumgenomics / dagr

A scala based DSL and framework for writing and executing bioinformatics pipelines as Directed Acyclic GRaphs
MIT License

New Tasks #354

Open averagehat opened 5 years ago

averagehat commented 5 years ago

I'm working on a pipeline that uses the following tools, for which I have not yet found dagr Task definitions. I'm posting this to make sure I don't duplicate any work and that I contribute intelligently.

0. Fastq filtering based on quality, `N` content, and index quality (custom python code)
1. trimmomatic
2. BAM file read tagging
3. fqstats
4. LoFreq / custom base-caller
5. Plots

Notes on each item, by number:

0. Is there some existing support for this?
1. We use the quality-based trimming, which isn't in TrimFastq. I would add a wrapper for trimmomatic.
2. Read tags are required by a specific tool whose name escapes me. Is there a task that tags BAMs already?
3. I'd create a simple wrapper for fqstats.
4. I'd create a simple wrapper for lofreq.
5. These are quality and genome coverage plots, as well as some other simple graphics. They are currently done in matplotlib, and I am planning to either rewrite them in python or adapt some existing tool.
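
For reference, the filtering in item 0 could look roughly like the sketch below. It is not the actual pipeline code: the record type, the function names, and the thresholds (`min_mean_qual`, `max_n_fraction`, `min_index_qual`) are all hypothetical, and it assumes Phred+33 qualities and simple four-line FASTQ records.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    """Hypothetical minimal FASTQ record (not a dagr or fgbio type)."""
    name: str
    bases: str
    quals: str  # quality string, assumed Phred+33 encoded


def mean_quality(quals: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (0.0 for an empty string)."""
    if not quals:
        return 0.0
    return sum(ord(c) - offset for c in quals) / len(quals)


def passes_filter(
    rec: FastqRecord,
    index_quals: str = "",
    min_mean_qual: float = 20.0,   # assumed threshold
    max_n_fraction: float = 0.1,   # assumed threshold
    min_index_qual: float = 20.0,  # assumed threshold
) -> bool:
    """Return True if the read passes the N-content, quality, and index-quality checks."""
    if not rec.bases:
        return False
    n_fraction = rec.bases.upper().count("N") / len(rec.bases)
    if n_fraction > max_n_fraction:
        return False
    if mean_quality(rec.quals) < min_mean_qual:
        return False
    # The index check is skipped when no index qualities are supplied.
    if index_quals and mean_quality(index_quals) < min_index_qual:
        return False
    return True


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse four-line FASTQ records; an incomplete trailing record is dropped."""
    it = iter(lines)
    # Zipping the same iterator four times yields the lines in groups of four.
    for header, bases, _plus, quals in zip(it, it, it, it):
        yield FastqRecord(header.rstrip("\n")[1:], bases.rstrip("\n"), quals.rstrip("\n"))
```

Filtering a file is then a matter of keeping the records from `parse_fastq(open(path))` for which `passes_filter` returns true. A wrapper task would only need to call a script like this as an external process.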

nh13 commented 5 years ago

0-2. Check the list of tools in fgbio to see whether any of them can do this for you. For a custom Scala implementation, you could take a look at using TrimmingUtil in htsjdk. You can then add your other criteria to a custom tool, either by writing your own fgbio-like tool or by writing a small ammonite Scala script. If you have a few new tasks you'd like to contribute (e.g. wrappers around trimmomatic), I'd certainly welcome a PR. Ditto for tasks/wrappers around other commonly used tools. For custom tools or tools that are not widely used, I'd recommend creating your own custom tasks in your own project and using them there.