Add coverage plot creation to assembly analysis

dnanexus-archive / viral-ngs

Add coverage plot creation to assembly analysis #35

Closed tomkinsc closed 8 years ago

tomkinsc commented 8 years ago

It would be great if consensus-coverage plots were part of the assembly analysis output. For workflows using the viral-ngs v1.8.0 tarball, this should just be a matter of adding a call to reports.py plot_coverage in the analysis applet to generate a coverage plot from the *.mapped.bam file. plot_coverage infers the plot format (pdf, png, svg, etc.) from the file extension of the plot output file (the second positional argument of the command), but the format can also be given explicitly via the --plotFormat parameter. I'd suggest generating ${name}.coverage_plot.pdf, with a few extra arguments to create a letter-size plot:

reports.py plot_coverage ${name}.mapped.bam ${name}.coverage_plot.pdf --plotFormat pdf --plotWidth 1100 --plotHeight 850 --plotDPI 100
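For anyone wiring this into an applet from Python rather than a shell script, here is a minimal sketch of how the suggested command line could be assembled. The helper name `plot_coverage_command` is hypothetical; only the `reports.py plot_coverage` subcommand, the two positional arguments, and the four `--plot*` flags come from the comment above. If the width and height are in pixels (an assumption), 1100 x 850 at 100 DPI works out to 11 x 8.5 inches, i.e. landscape letter.

```python
import shlex


def plot_coverage_command(name, plot_format="pdf", width=1100, height=850, dpi=100):
    """Build the argv list for the plot_coverage call suggested in the issue.

    `name` is the sample prefix, standing in for ${name} in the shell form.
    This helper is hypothetical; it only formats the arguments and does not
    check that reports.py or the BAM file exists.
    """
    in_bam = f"{name}.mapped.bam"
    out_plot = f"{name}.coverage_plot.{plot_format}"
    return [
        "reports.py", "plot_coverage", in_bam, out_plot,
        "--plotFormat", plot_format,
        "--plotWidth", str(width),
        "--plotHeight", str(height),
        "--plotDPI", str(dpi),
    ]


# Render the command as a single shell-safe string for logging or review.
command_line = shlex.join(plot_coverage_command("sample"))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming `reports.py` is on the applet's PATH.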

yifei-men commented 8 years ago

@tomkinsc

Hey Chris,

I overhauled our pipeline to use the v1.10.1 release of upstream and moved our execution over to the easy-deploy script (although I think that particular deployment strategy may already be slightly outdated by now)... It's in the misnamed v1.8.0 changes branch.

We noticed that some interim figures of merit that we QC on in our CI tests (for example, subsampled_read_counts in trinity and alignment_base_count in the final analysis) have drifted. The good thing, though, is that the final assembly of the Ebola CI sample stays the same.

Just wanted to check in and make sure that these drifts we're seeing are expected:

v1.10.1 workflow

v1.7.1 workflow

tomkinsc commented 8 years ago

Nice!

Glad to see the metrics caught that change, but in this case it is expected. We overhauled the step that prepares input for de novo assembly so that it subsamples reads in bam space. Beyond being faster, this made it possible to reconsider how we were treating extra singleton reads after de-duplicating reads and trimming adapters and low-quality bases. Looks like mean coverage depth increases in v1.10.1, which is nice to see 👍. The metrics returned by the pre-Trinity subsampling are a little different, and they're described here. The read count parameter given to Trinity now specifies the number of individual reads to use rather than the number of pairs (though in reaching the threshold it includes paired reads first, and then fills in with singletons).
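To make the "pairs first, then singletons" counting concrete, here is a small illustrative sketch. It is not the viral-ngs implementation (which works on BAM files); the function name, the in-memory lists, and the use of uniform random sampling are all assumptions made for the example. The only behavior taken from the comment above is that the threshold counts individual reads, and that pairs are used before singletons fill the remainder.

```python
import random


def subsample_reads(pairs, singletons, max_reads, seed=0):
    """Pick at most `max_reads` individual reads, preferring paired reads.

    pairs: list of (mate1, mate2) tuples, each pair counting as two reads.
    singletons: list of unpaired reads, each counting as one read.
    Hypothetical helper; random sampling is an assumption of this sketch.
    """
    rng = random.Random(seed)

    # Take as many whole pairs as fit under the threshold.
    max_pairs = max_reads // 2
    if len(pairs) <= max_pairs:
        kept_pairs = list(pairs)
    else:
        kept_pairs = rng.sample(pairs, max_pairs)

    # Fill whatever budget is left with singleton reads.
    remaining = max_reads - 2 * len(kept_pairs)
    if len(singletons) <= remaining:
        kept_singletons = list(singletons)
    else:
        kept_singletons = rng.sample(singletons, remaining)

    return kept_pairs, kept_singletons
```

With 3 pairs and 4 singletons, a threshold of 8 reads keeps all 3 pairs (6 reads) plus 2 singletons, while a threshold of 4 keeps 2 pairs and no singletons.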

yifei-men commented 8 years ago

Cool 😄 😄 Thanks Chris for the explanation!

I'll go ahead and edit our expected metrics then!

yifei-men commented 8 years ago

Closed via #39