Schork-Lab / cg-classifier

Complete Genomics Classifier

Figure 2 #13

Status: open. erscott opened this issue 10 years ago.

erscott commented 10 years ago

Figure 2: Machine learning filters compared

A. Pipeline schematic for masterVar formatting and random forest (RF) training (Python/ipynb script)
    Join coverage data to each variant (averaged across bases for multi-base variants)
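A minimal sketch of the coverage join. It assumes per-base read depth is available as a dictionary keyed by `(chromosome, position)` and that variants carry 0-based, half-open `begin`/`end` coordinates (assumed here to match the masterVar convention). The names `mean_coverage` and `join_coverage` are hypothetical and do not come from the project's script.

```python
from statistics import mean


def mean_coverage(variant, coverage):
    """Return the average read depth over the bases a variant spans.

    `variant` is a hypothetical (chromosome, begin, end) tuple with 0-based,
    half-open coordinates. Multi-base variants are averaged over every base
    in [begin, end). Zero-length variants (insertions, begin == end) fall
    back to the depth at `begin`, an assumption made for this sketch.
    Positions missing from `coverage` count as depth 0.
    """
    chrom, begin, end = variant
    positions = range(begin, end) if end > begin else [begin]
    return mean(coverage.get((chrom, pos), 0) for pos in positions)


def join_coverage(variants, coverage):
    """Append the mean coverage to each variant tuple."""
    return [(*v, mean_coverage(v, coverage)) for v in variants]
```

Averaging over `[begin, end)` gives multi-base variants a single depth value; the insertion fallback is only one of several reasonable choices.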

B. Training results
    Out-of-bag scores (feature importances in an appendix table)
    Test results on NA12878 using different feature sets
    Time to train each model
    Model size: uncompressed vs. compressed
    Model complexity for training sets of 20k, 100k, 1 million, and 3.8 million variants
    Hyperparameter grid search

Comparison of misclassified variants for each algorithm
    Venn diagram: false-positive (FP) variants remaining vs. FP variants filtered
    Venn diagram: true-positive (TP) variants remaining vs. TP variants filtered
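Each Venn diagram reduces to set arithmetic over variant keys. This minimal sketch assumes each variant is identified by a hashable key such as `(chromosome, begin, end, allele)` and compares two classifiers, for example the RF and GBC models. `venn_counts` is a hypothetical name. Passing the FP set as `universe` gives the FP diagram, and passing the TP set gives the TP diagram.

```python
def venn_counts(universe, filtered_a, filtered_b):
    """Count the regions of a two-way Venn diagram restricted to `universe`.

    `universe`: keys of the variants under study (e.g. all FP variants).
    `filtered_a`, `filtered_b`: keys removed by classifier A and classifier B.
    """
    a = universe & filtered_a  # variants in the universe that A filtered
    b = universe & filtered_b  # variants in the universe that B filtered
    return {
        "both": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "neither": len(universe - a - b),  # still remaining after both filters
    }
```

The `neither` region is what remains after both filters. The other three regions split the filtered variants by which algorithm removed them, and the four counts always sum to the size of `universe`.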

C. Comparison using GCAT

erscott commented 10 years ago

Figure 2: Training data

A. Pipeline schematic for masterVar formatting and RF and gradient boosting classifier (GBC) training (Python/ipynb script)

B. We will benchmark three feature sets:
    Set A: All features
    Set B: Read depth, allele depth, features derived from read depth, and genotype information (variant type, zygosity, allelic imbalance LR)
    Set C: GL, GQ, and HQ scores from Complete Genomics
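One way to keep the three benchmarks consistent is to define each feature set once and build every training matrix from that definition. The column names below are hypothetical placeholders, including the derived `allele_fraction`, which stands in for one possible read-depth derivative. The actual masterVar-derived table may name these columns differently.

```python
# Hypothetical column names; the real table may label these differently.
FEATURE_SETS = {
    "A": None,  # None = every available feature column
    "B": ["read_depth", "allele_depth", "allele_fraction",
          "variant_type", "zygosity", "allelic_imbalance_lr"],
    "C": ["GL", "GQ", "HQ"],
}


def add_allele_fraction(row):
    """Add allele_fraction = allele_depth / read_depth in place (0.0 at zero depth)."""
    depth = row["read_depth"]
    row["allele_fraction"] = row["allele_depth"] / depth if depth else 0.0
    return row


def feature_matrix(rows, set_name, all_columns):
    """Return (columns, matrix) for one feature set; `rows` are dicts."""
    columns = list(FEATURE_SETS[set_name] or all_columns)
    missing = [c for c in columns if c not in all_columns]
    if missing:
        raise KeyError(f"feature set {set_name!r} needs missing columns: {missing}")
    return columns, [[row[c] for c in columns] for row in rows]
```

Categorical columns such as `variant_type` and `zygosity` would still need numeric encoding before they are passed to a scikit-learn model.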