broadinstitute / gatk-protected

Obsolete/Legacy GATK repository -- go to https://github.com/broadinstitute/gatk instead
BSD 3-Clause "New" or "Revised" License

ACNV evaluation #245

Closed by samuelklee 8 years ago

samuelklee commented 8 years ago

Iteration evaluation results and TODOs for alpha:

samuelklee commented 8 years ago

@davidbenjamin @LeeTL1220 feel free to edit and expand.

samuelklee commented 8 years ago

/dsde/working/slee/acs-eval/scripts/plotting.py might be useful for generating plots, at least until the R/Java plotting is in master. Be sure to use the Python-3.4 dotkit if you want to run it on the server.

EDIT: Made plots for about 40 samples in /dsde/working/slee/acs-eval/out_case_stad_pd250_acs/acs_plots if you guys want to take a look.

samuelklee commented 8 years ago

Ran the STAD normal-normal sanity check in /dsde/working/slee/acs-eval/out_case_stad_pd250_acs_normal_normal

LeeTL1220 commented 8 years ago

Regarding Ignat et al.'s data: the HAPSEG comparison is probably the most valuable and the easiest to defend.

LeeTL1220 commented 8 years ago

@samuelklee Looks like the purity series BAM files were removed. Going to try to get those back.

samuelklee commented 8 years ago

PRAD concordance plots/histograms are now in /dsde/working/slee/acnv-eval. ACNV shows better RMSE concordance with HAPSEG than ACS does (suggesting that we are less susceptible to outliers/oversegmentation); ACS shows better MAE concordance than ACNV, but both values are < 0.01. I think a case can be made to segment only on SNPs and to ignore the noisy coverage input from CNV.
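For anyone wondering why RMSE and MAE can rank the two callers differently, here is a minimal sketch. The numbers are made up for illustration (they are not PRAD results), and the function and variable names are hypothetical, not taken from the evaluation scripts. It shows that two sets of estimates with the same MAE can have different RMSE, because RMSE penalizes a single large miss more heavily than many small ones.

```python
import math

def rmse(estimates, reference):
    """Root-mean-square error: squares each difference, so large misses dominate."""
    return math.sqrt(
        sum((e - r) ** 2 for e, r in zip(estimates, reference)) / len(reference)
    )

def mae(estimates, reference):
    """Mean absolute error: every miss contributes in proportion to its size."""
    return sum(abs(e - r) for e, r in zip(estimates, reference)) / len(reference)

# Hypothetical per-segment values from a reference caller (illustrative only).
reference = [0.50, 0.48, 0.30, 0.25, 0.50]

# Caller A: off by 0.01 on every segment.
small_errors = [0.51, 0.47, 0.31, 0.24, 0.51]

# Caller B: exact everywhere except one segment that is off by 0.05.
one_outlier = [0.50, 0.48, 0.30, 0.25, 0.45]

for name, est in [("small_errors", small_errors), ("one_outlier", one_outlier)]:
    print(name, "MAE=%.4f" % mae(est, reference), "RMSE=%.4f" % rmse(est, reference))
```

Both hypothetical callers have the same MAE here, but the one with a single outlier has the larger RMSE, which is the sense in which a lower RMSE hints at fewer outlier segments.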

LeeTL1220 commented 8 years ago

Moved CN LOH evaluation into a separate issue.

samuelklee commented 8 years ago

Experimented with merging CR segments that were called copy-neutral on PRAD. Note that oversegmentation remains (both in largely neutral regions that are still broken up by short amps/deletions, and within amps and dels), but concordance with HAPSEG improves slightly for the most part. However, runtime does increase because of similar-segment MCMC iterations. I think that using sdundo could help here. Going to experiment with LUAD now.
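To make the merging step concrete, here is a rough sketch of the idea, not the ACNV code. It assumes segments are sorted by contig and position and carry a call label; the `Segment` type, the `"0"`/`"+"`/`"-"` labels, and `merge_neutral` are all hypothetical names chosen for illustration. Adjacent segments on the same contig are joined only when both are called copy-neutral.

```python
from typing import List, NamedTuple

class Segment(NamedTuple):
    contig: str
    start: int
    end: int
    call: str  # hypothetical labels: "0" neutral, "+" amp, "-" del

def merge_neutral(segments: List[Segment]) -> List[Segment]:
    """Join runs of adjacent copy-neutral segments on the same contig.

    Assumes the input is already sorted by contig and start position.
    """
    merged: List[Segment] = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.contig == seg.contig
            and prev.call == "0"
            and seg.call == "0"
        ):
            # Extend the previous neutral segment instead of adding a new one.
            merged[-1] = prev._replace(end=seg.end)
        else:
            merged.append(seg)
    return merged

# Made-up segments for illustration only.
segments = [
    Segment("1", 1, 100, "0"),
    Segment("1", 101, 200, "0"),
    Segment("1", 201, 250, "+"),  # short amp that splits a neutral region
    Segment("1", 251, 400, "0"),
    Segment("2", 1, 300, "0"),
]
result = merge_neutral(segments)
for seg in result:
    print(seg)
```

In this toy input the first two neutral segments are joined, but the short amp still separates the neutral region on contig 1 into two pieces, which is the kind of residual oversegmentation described above.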

samuelklee commented 8 years ago

Did not get to CLL samples and left ABSOLUTE spot checking to CGA, but I think we are generally satisfied with ACNV performance. I will continue running some minor evaluations before alpha but will go ahead and close this issue.