Schork-Lab / cg-classifier

Complete Genomics Classifier

Figure 1 #12

Open erscott opened 10 years ago

erscott commented 10 years ago

Figure 1: Training & Testing Data

A. Pipeline schematic for identifying the truth variant set

B. Comparison of CG with Illumina Phased Pedigree results for NA12877 vs GIAB NA12877
Just use the tp and fp vcf files to generate vcfstats:
- number of snps, indels, mnps
- average read depth
- ts/tv
- whole genome vs exome
- basechange stats

C. Comparison of CG with Illumina Phased Pedigree results for NA12878 vs GIAB NA12878
- number of snps, indels, mnps
- average read depth
- ts/tv
- whole genome vs exome
- basechange stats
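The variant-type counts and ts/tv listed for panels B and C could be cross-checked with a few lines of Python. This is only a minimal sketch, not the vcfstats tool itself: it assumes each variant has already been reduced to a single biallelic (REF, ALT) pair with REF different from ALT, and the function names `classify_variant`, `is_transition`, and `summarize` are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_variant(ref, alt):
    """Return 'snp', 'mnp', or 'indel' for one biallelic REF/ALT pair."""
    if len(ref) == len(alt):
        return "snp" if len(ref) == 1 else "mnp"
    return "indel"


def is_transition(ref, alt):
    """True for purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def summarize(variants):
    """Count variant classes and compute ts/tv over the SNPs only."""
    counts = Counter()
    ts = tv = 0
    for ref, alt in variants:
        kind = classify_variant(ref, alt)
        counts[kind] += 1
        if kind == "snp":
            if is_transition(ref, alt):
                ts += 1
            else:
                tv += 1
    # ts/tv is undefined when there are no transversions.
    ratio = ts / tv if tv else None
    return counts, ts, tv, ratio


counts, ts, tv, ratio = summarize(
    [("A", "G"), ("C", "T"), ("A", "C"), ("AT", "A"), ("AC", "GT")]
)
print(dict(counts), ts, tv, ratio)
```

Running the same summary separately on the tp and fp files would give the side-by-side numbers for the figure.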

kunalbhutani commented 10 years ago

I can do B, C

erscott commented 10 years ago

Awesome. The comparison for GIAB requires a bedtools intersection, and you can't use the primitives file.
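For reference, the effect of intersecting variants with a BED file can be sketched in plain Python. This is an illustration of the idea, not a replacement for bedtools: it assumes the BED regions are given as (chrom, start, end) tuples, it tests only the variant's start position rather than full feature overlap, and `build_index` and `in_regions` are hypothetical names.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(bed_intervals):
    """Group (chrom, start, end) BED intervals by chromosome, sorted and merged."""
    by_chrom = defaultdict(list)
    for chrom, start, end in bed_intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlapping or touching intervals collapse into one.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        starts = [s for s, _ in merged]
        index[chrom] = (starts, merged)
    return index


def in_regions(index, chrom, pos):
    """True if 1-based VCF position `pos` lies in a 0-based, half-open BED interval."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    zero_based = pos - 1  # VCF is 1-based, BED is 0-based
    i = bisect_right(starts, zero_based) - 1
    return i >= 0 and merged[i][0] <= zero_based < merged[i][1]


index = build_index([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 10)])
print(in_regions(index, "chr1", 101), in_regions(index, "chr1", 100))
```

The off-by-one between VCF and BED coordinates is the usual place this kind of comparison goes wrong, so it is handled explicitly in `in_regions`.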

erscott commented 10 years ago

Figure 1: Training & Testing Data

A. Pipeline schematic for identifying the truth variant set - https://www.dropbox.com/sh/s3fk4r1ghdbry6y/AAAH5kSJPHklI11XOu6enoNXa

B. Comparison of CG with NIST/GIAB 12878 - WE ARE USING HER FOR TRAINING NOW

EXCLUDES HALF-CALLS
Just use the tp and fp vcf files to generate vcfstats:
- number of indels
- average read depth
- indel length histogram
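The indel length histogram for panel B could be tallied roughly like this. A minimal sketch, assuming biallelic (REF, ALT) pairs have already been pulled from the tp and fp files; the signed-length convention (positive for insertions, negative for deletions) and the function names are assumptions for illustration.

```python
from collections import Counter


def indel_length(ref, alt):
    """Signed length: positive for insertions, negative for deletions, 0 otherwise."""
    return len(alt) - len(ref)


def indel_length_histogram(variants):
    """Map signed indel length -> count, skipping equal-length (SNP/MNP) records."""
    hist = Counter()
    for ref, alt in variants:
        length = indel_length(ref, alt)
        if length != 0:
            hist[length] += 1
    return dict(sorted(hist.items()))


print(indel_length_histogram(
    [("A", "AT"), ("A", "ATT"), ("AT", "A"), ("A", "G"), ("C", "CG")]
))
```

Keeping the sign makes it easy to plot insertions and deletions on opposite sides of zero.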

C. - PROBABLY MOVE THIS TO A SEPARATE FIGURE
Half-call converted into REF or converted into ALT (when alt present):
- number of snps, indels, mnps
- average read depth
- ts/tv
- whole genome vs exome
- basechange stats

erscott commented 10 years ago

Make a schematic of the impute_ref and impute_alt strategy

def imputed_ref_GT(half_GT):
    '''
    This function imputes a reference call
    at the missing half-call site
    '''
    half_GT = half_GT.split(":")
    half_GT[0] = half_GT[0].replace(".", "0")
    return ":".join(half_GT)

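def imputed_alt_GT(half_GT):
    '''
    This function imputes an alternate call
    at the missing half-call site, unless the
    called allele is reference, in which case
    a reference call is imputed
    '''
    half_GT = half_GT.split(":")
    # Called allele is REF: there is no alt present, so fill with REF
    if "0" in half_GT[0]:
        half_GT[0] = half_GT[0].replace(".", "0")
        return ":".join(half_GT)
    # Called allele is ALT: fill the missing allele with ALT
    half_GT[0] = half_GT[0].replace(".", "1")
    return ":".join(half_GT)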
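As a starting point for the schematic, here is how the two strategies behave on a few half-calls. This is a self-contained sketch: `impute_half_call` is a hypothetical helper that folds both strategies into one function, and the sample fields (GT followed by one extra colon-separated value) are made-up examples.

```python
def impute_half_call(sample_field, strategy):
    """Fill the '.' allele in the GT subfield of a colon-separated sample field.

    strategy='ref' always fills with '0'.
    strategy='alt' fills with '1', unless a reference allele was called,
    in which case it fills with '0'.
    """
    if strategy not in ("ref", "alt"):
        raise ValueError("strategy must be 'ref' or 'alt'")
    fields = sample_field.split(":")
    gt = fields[0]
    # The alt strategy only fills with "1" when no reference allele was called.
    fill = "1" if strategy == "alt" and "0" not in gt else "0"
    fields[0] = gt.replace(".", fill)
    return ":".join(fields)


for half_call in ["1|.:35", ".|0:35", "./1:12"]:
    print(
        half_call,
        impute_half_call(half_call, "ref"),
        impute_half_call(half_call, "alt"),
    )
```

One caveat worth noting in the schematic: the `"0" in gt` test is a substring check, so an allele index such as 10 would also be treated as a reference call.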