genome / analysis-workflows

Open workflow definitions for genomic analysis from MGI at WUSM.
MIT License

Fda metrics split on sequence #1077

Closed johnegarza closed 1 year ago

johnegarza commented 2 years ago

This PR introduces a CWL scatter over the individual sequence elements (a BAM or a pair of FASTQs) of the input array for each of the three unaligned inputs (normal DNA, tumor DNA, tumor RNA) when calculating the statistics reported to the FDA in clinical cases. Previously, some statistics were calculated from all of an input's sequence files at once, which is less accurate.
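For readers unfamiliar with CWL scatter, the general shape of a per-element step is sketched below. This is only an illustration of the technique, not the code in this PR: the input name `sequences`, the tool file `compute_metrics.cwl`, and the output name `metrics` are hypothetical, and the input is simplified to a plain `File[]` rather than a BAM-or-FASTQ-pair element.

```yaml
cwlVersion: v1.0
class: Workflow

requirements:
  # Required for any step that uses "scatter"
  - class: ScatterFeatureRequirement

inputs:
  # Hypothetical: one entry per sequence element of a single unaligned input
  sequences:
    type: File[]

outputs:
  # One metrics file per sequence element, in input order
  per_sequence_metrics:
    type: File[]
    outputSource: compute_metrics/metrics

steps:
  compute_metrics:
    run: compute_metrics.cwl   # hypothetical tool that takes a single File
    scatter: sequence          # run the step once per element of "sequences"
    in:
      sequence: sequences
    out: [metrics]
```

Because the step runs once per array element, each statistic is computed from a single sequence element rather than from the pooled files.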

gschang commented 2 years ago

Thank you for this work, John. These updates concern the FDA QC metrics reports, which John and our team recently added to the immuno workflow. We discussed making QC reporting more consistent when there are multiple instrument data, i.e. moving toward a more generalized interface for including them in the report.

As John suggests, I plan to run a test to double-check the key changes in the final FDA QC metrics report.

gschang commented 2 years ago

I just want to update this ticket: John, Jasreet, and I are working on this PR. John and our team have discussed further upgrades toward a generalized workflow concept, and these are now in progress. I updated the scripts as Tom advised above and finished testing.

This immuno workflow is a real challenge: it is the most complex workflow, producing tons of outputs in an integrative context. The plan is to release two immuno workflow versions, (1) with rnaseq input and (2) without rnaseq input, once John and our team work out this PR. For information, we have successfully tested the new immuno workflow version that doesn't require rnaseq inputs.

gschang commented 1 year ago

These new changes in the generate_fda_metrics.cwl subworkflow worked on my end. I double-checked every upgrade and fix.

We're going to release a new immuno workflow unless anyone else has comments. I have asked John for new immuno processing profiles for two versions: (1) immuno.cwl (currently running) and (2) immuno.cwl without rnaseq input.