Open samuelklee opened 6 years ago
The Talkowski lab also runs cn.mops and CNVnator for read depth calling. No idea how easy those are to run or if they run on exomes.
Here is a recap of what we discussed today during the CNV meeting:
For the first round of evaluations we decided to run the Germline CNV pipeline on TCGA exomes over a range of key hyperparameters (namely psi-t-scale and p-alt) and establish baseline performance metrics using the output of GenomeSTRiP on matched WGS samples as ground truth.
@mbabadi could you come up with a good range of hyperparameter values that you think should be cross-validated?
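As a starting point for that discussion, the cross-validation grid could be enumerated along the lines of the sketch below. The candidate values are placeholders rather than recommendations, and the dictionary keys `psi_t_scale` and `p_alt` are hypothetical names that would still need to be mapped onto the actual pipeline arguments.

```python
from itertools import product

# Placeholder candidate values (assumptions, not recommended settings).
PSI_T_SCALES = [1e-3, 1e-2, 1e-1]
P_ALTS = [1e-6, 1e-4, 1e-2]


def hyperparameter_grid(psi_t_scales, p_alts):
    """Return one dict per (psi-t-scale, p-alt) combination."""
    return [
        {"psi_t_scale": psi, "p_alt": p}
        for psi, p in product(psi_t_scales, p_alts)
    ]


grid = hyperparameter_grid(PSI_T_SCALES, P_ALTS)
for index, params in enumerate(grid):
    # Each entry would correspond to one pipeline run on the TCGA exomes.
    print(index, params)
```

A full Cartesian grid keeps the runs independent, so they can be launched in parallel and compared directly against the same ground truth.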
In particular we need to:
For the next round of evaluations we need to:
Don't forget cn.mops (which will also be included in the somatic evaluations) and ModelSegments. We can also throw CNVnator in the mix. We'll run each tool on WES, WGS, or both as appropriate.
Also, let's use WDLs and Dockers from other groups, where available. In particular, I would hope that these are available for XHMM, GenomeSTRiP, and the tools that the Talkowski lab runs. We can discuss with @cwhelan today, but @asmirnov239 and @ldgauthier could you do some digging to see what the MacArthur lab has?
I'm pretty sure the MacArthur lab doesn't have any WDLs or Dockers. Somewhere I saw some shell scripts Menachem used when he ran XHMM on ExAC, but they're probably so old they're for LSF.
@asmirnov239 and Jack Fu are currently developing tests using Talkowski-SV truth that will ultimately cover #5633. These should be adapted to fit into whatever framework arises from #4630.
@asmirnov239 it might be worth summarizing the current status, for future reference. TODOs might also be useful.
The following work has been done:
psi_t parameter.
A few issues were encountered along the way:
Currently the ongoing work is focused on the following:
The following items need to be done for automatic evaluation:
@mbabadi has some python scripts that we can expand upon. To start, this should include:
See also #2881.
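For the automatic evaluation against a truth set discussed above, one possible building block is a precision/recall calculation based on reciprocal overlap. This is only a sketch under stated assumptions: the `Cnv` record, the 50% reciprocal-overlap threshold, and the function names are all hypothetical and are not taken from the existing scripts. For exome calls, a criterion based on shared targets may turn out to be more appropriate than breakpoint overlap.

```python
from typing import List, NamedTuple, Tuple


class Cnv(NamedTuple):
    """Hypothetical minimal CNV record (coordinates are 1-based, inclusive)."""
    contig: str
    start: int
    end: int
    kind: str  # "DEL" or "DUP"


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Smaller of the two overlap fractions; 0.0 if the events are disjoint."""
    if a.contig != b.contig:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start + 1), overlap / (b.end - b.start + 1))


def matches(a: Cnv, b: Cnv, min_ro: float = 0.5) -> bool:
    """Events match if they share a type and overlap reciprocally by min_ro."""
    return a.kind == b.kind and reciprocal_overlap(a, b) >= min_ro


def precision_recall(
    calls: List[Cnv], truth: List[Cnv], min_ro: float = 0.5
) -> Tuple[float, float]:
    """Precision over calls and recall over truth events for one sample."""
    matched_calls = sum(any(matches(c, t, min_ro) for t in truth) for c in calls)
    matched_truth = sum(any(matches(t, c, min_ro) for c in calls) for t in truth)
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Toy data for illustration only.
truth = [Cnv("1", 1000, 2000, "DEL"), Cnv("2", 5000, 9000, "DUP")]
calls = [Cnv("1", 1100, 2100, "DEL"), Cnv("3", 100, 200, "DEL")]
print(precision_recall(calls, truth))
```

Counting matches separately from the call side and the truth side avoids inflating either metric when one truth event is fragmented into several calls.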