broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Build automatic evaluation of gCNV pipeline and establish best practices. #4123

Open samuelklee opened 6 years ago

samuelklee commented 6 years ago

@mbabadi has some Python scripts that we can expand upon. To start, this should include:

See also #2881.

ldgauthier commented 6 years ago

The Talkowski lab also runs cn.mops and CNVnator for read depth calling. No idea how easy those are to run or if they run on exomes.

asmirnov239 commented 6 years ago

Here is a recap of what we discussed today during the CNV meeting:

For the first round of evaluations, we decided to run the Germline CNV pipeline on TCGA exomes over a range of key hyperparameters (namely psi-t-scale and p-alt) and to establish baseline performance metrics, using GenomeSTRiP output on matched WGS samples as ground truth.
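As a rough sketch of what "performance metrics against a truth set" could look like, the snippet below scores a call set by precision and recall. Everything in it is an assumption for illustration: the `Call` record, the reciprocal-overlap matching rule, and the 0.5 threshold are hypothetical choices, not what was agreed in the meeting or what any existing script does.

```python
from typing import List, NamedTuple, Tuple


class Call(NamedTuple):
    """Hypothetical minimal CNV record (not a GATK or GenomeSTRiP format)."""
    sample: str
    contig: str
    start: int  # 1-based, inclusive
    end: int    # inclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls are not comparable."""
    if (a.sample, a.contig, a.kind) != (b.sample, b.contig, b.kind):
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start + 1), shared / (b.end - b.start + 1))


def precision_recall(calls: List[Call], truth: List[Call],
                     min_overlap: float = 0.5) -> Tuple[float, float]:
    """Precision = matched calls / calls; recall = matched truth events / truth events."""
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth) for c in calls)
    matched_truth = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for c in calls) for t in truth)
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall
```

Exome calls and WGS truth events can differ a lot in resolution, so a reciprocal-overlap rule may be too strict in practice; an interval-count or per-target comparison would be an equally reasonable starting point.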

@mbabadi could you come up with a good range of hyperparameter values that you think should be cross-validated?
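For illustration only, a grid over the two hyperparameters could be enumerated as below. The numeric values are placeholders, not recommended ranges, and the dictionary keys simply mirror the names used in this comment rather than verified command-line arguments.

```python
from itertools import product
from typing import Dict, List, Sequence


def hyperparameter_grid(psi_t_scales: Sequence[float],
                        p_alts: Sequence[float]) -> List[Dict[str, float]]:
    """Return one settings dict per (psi-t-scale, p-alt) combination."""
    return [{"psi-t-scale": s, "p-alt": p}
            for s, p in product(psi_t_scales, p_alts)]


# Placeholder values chosen only to show the shape of the grid.
grid = hyperparameter_grid([1e-4, 1e-3, 1e-2], [1e-7, 1e-6, 1e-5])
```

Each entry would correspond to one pipeline run whose calls are then scored against the truth set.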

In particular we need to:

For the next round of evaluations we need to:

samuelklee commented 6 years ago

Don't forget cn.mops (which will also be included in the somatic evaluations) and ModelSegments. We can also throw CNVnator in the mix. We'll run each tool on WES, WGS, or both as appropriate.

samuelklee commented 6 years ago

Also, let's use WDLs and Dockers from other groups, where available. In particular, I would hope that these are available for XHMM, GenomeSTRiP, and the tools that the Talkowski lab runs. We can discuss with @cwhelan today, but @asmirnov239 and @ldgauthier could you do some digging to see what the MacArthur lab has?

ldgauthier commented 6 years ago

I'm pretty sure the MacArthur lab doesn't have any WDLs or Dockers. Somewhere I saw some shell scripts Menachem used when he ran XHMM on ExAC, but they're probably so old that they were written for LSF.

samuelklee commented 5 years ago

@asmirnov239 and Jack Fu are currently developing tests using Talkowski-SV truth that will ultimately cover #5633. These should be adapted to fit into whatever framework arises from #4630.

samuelklee commented 5 years ago

@asmirnov239 it might be worth summarizing the current status, for future reference. TODOs might also be useful.

asmirnov239 commented 5 years ago

The following work has been done:

A few issues were encountered along the way:

Ongoing work is currently focused on the following:

The following items still need to be done for automatic evaluation: