Figure 3 - Githubissues

erscott commented 10 years ago

Figure 3: Imputation filters compared

A. Imputation schematic using plink2/SHAPEIT, then joining; use only SNPs; remove NA12878 pedigree variants from the imputation reference file

B. Final results (filter + imputation) for NA12878: descriptive stats

erscott commented 10 years ago

Figure 3: Benchmarked results against NA19240 and NA12877

A. Comparison of CG with the Golden Set (GenomeComb) results for NA19240; comparison of CG with the Illumina Phased Pedigree (Real Time Genomics) results for NA12877; number of SNPs, indels, and MNPs; average read depth; Ts/Tv; whole genome vs. exome; base-change stats

B. Training results: out-of-bag scores (feature importances in an appendix table); test results with NA12877 and NA19240 using three different feature sets

C. Time to train the model; size of the model, uncompressed vs. compressed; complexity of models (20k, 100k, 1 million, and 3.8 million variants)
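The Ts/Tv and base-change stats in panel A reduce to simple counts over single-nucleotide variants. Below is a minimal sketch, assuming the SNVs are already available as (REF, ALT) pairs of single uppercase bases. VCF parsing, multi-allelic sites, and indels/MNPs are left out, and `ts_tv` and `base_change_counts` are hypothetical helpers, not part of cg-classifier.

```python
from collections import Counter

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base change is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def base_change_counts(snvs):
    """Count REF>ALT changes (e.g. 'A>G') over (ref, alt) SNV pairs."""
    return Counter(f"{ref}>{alt}" for ref, alt in snvs if ref != alt)


def ts_tv(snvs):
    """Return the transition/transversion ratio for (ref, alt) SNV pairs.

    Returns inf if there are transitions but no transversions,
    and NaN if there are no variant pairs at all.
    """
    ts = tv = 0
    for ref, alt in snvs:
        if ref == alt:
            continue  # not a base change; ignore
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        return float("inf") if ts else float("nan")
    return ts / tv


if __name__ == "__main__":
    # Hypothetical example calls: 3 transitions and 2 transversions.
    calls = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")]
    print(ts_tv(calls))               # 1.5
    print(base_change_counts(calls))
```

Transitions are usually more common than transversions, so the ratio is commonly reported near 2.0 to 2.1 for whole genomes and around 3 for exomes. That gap is one reason to report the stats separately for whole genome vs. exome.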

erickramer commented 10 years ago

For C, we should also add time to predict with the models. That's probably more important for users.
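Panel C and the prediction-time suggestion could share one benchmarking harness. Below is a minimal sketch, assuming a model with scikit-learn-style `fit`/`predict` methods. It uses `pickle` plus `gzip` as a stand-in for however the models are actually serialized and compressed. `benchmark` and `NearestCentroid` are hypothetical names, and the toy model exists only so the snippet runs without third-party packages.

```python
import gzip
import pickle
import time


def benchmark(make_model, X_train, y_train, X_test):
    """Time fit and predict, and measure the pickled model size in bytes.

    `make_model` is a zero-argument callable returning an object with
    fit(X, y) and predict(X) methods (an assumed interface).
    """
    model = make_model()

    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    model.predict(X_test)
    predict_seconds = time.perf_counter() - start

    raw = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)
    return {
        "train_seconds": train_seconds,
        "predict_seconds": predict_seconds,
        "uncompressed_bytes": len(raw),
        "compressed_bytes": len(gzip.compress(raw)),
    }


class NearestCentroid:
    """Hypothetical stand-in model: predicts the label with the nearest mean."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # Per-label mean feature vector.
        self.centroids_ = {k: [v / counts[k] for v in s] for k, s in sums.items()}
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids_, key=lambda k: dist2(row, self.centroids_[k]))
                for row in X]
```

To cover the model-complexity series, call `benchmark` once per training-set size (20k, 100k, 1 million, and 3.8 million variants) and tabulate the four numbers per run.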

On Thu, Jul 31, 2014 at 11:24 AM, Erick Scott notifications@github.com wrote:


Reply to this email directly or view it on GitHub https://github.com/Schork-Lab/cg-classifier/issues/14#issuecomment-50798520 .

E. Ransom Kramer Torkamani Lab The Scripps Research Institute

erscott commented 10 years ago

ok

On Jul 31, 2014, at 11:50 AM, Eric Kramer notifications@github.com wrote:


Schork-Lab / cg-classifier

Figure 3 #14