Open erscott opened 10 years ago
I can do B, C
Awesome. The comparison for GIAB requires a bedtools intersection. And you can't use the primitive's file
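A rough sketch of how the bedtools intersection could be scripted. The file names are hypothetical, and it assumes the intersection is against a BED file of GIAB confident regions, which is not stated above. `-header` keeps the VCF header and `-u` reports each variant once if it overlaps any region.

```python
import subprocess


def build_intersect_cmd(vcf_path, bed_path):
    """Build a bedtools command that keeps variants overlapping the BED regions.

    Assumption: bed_path holds the regions to compare within
    (e.g. a GIAB confident-regions BED); the paths are placeholders.
    """
    return ["bedtools", "intersect", "-header", "-u", "-a", vcf_path, "-b", bed_path]


def restrict_to_regions(vcf_path, bed_path, out_path):
    """Run the intersection and write the filtered VCF to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(build_intersect_cmd(vcf_path, bed_path), stdout=out, check=True)
```

Calling `restrict_to_regions("calls.vcf", "regions.bed", "calls.in_regions.vcf")` needs bedtools on the PATH.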
Figure 1: Training & Testing Data
A. Pipeline Schematic for identifying truth variant set - https://www.dropbox.com/sh/s3fk4r1ghdbry6y/AAAH5kSJPHklI11XOu6enoNXa
B. Comparison of CG with NIST/GIAB 12878 - WE ARE USING HER FOR TRAINING NOW
   EXCLUDES HALF-CALLS
   just use tp and fp vcf files to generate vcfstats
   - number of indels
   - average read depth
   - indel length histogram
C. - PROBABLY MOVE THIS TO A SEPARATE FIGURE
   Half-call converted into REF or converted into ALT (when alt present)
   - number of snps, indels, mnps
   - average read depth
   - ts/tv
   - whole genome vs exome
   - basechange stats
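For the indel length histogram in panel B, a minimal sketch of the counting step. It assumes the REF/ALT alleles have already been pulled out of the tp and fp VCF files as pairs of strings; the function name is made up for illustration.

```python
from collections import Counter


def indel_length_histogram(alleles):
    """Count indels by signed length: positive = insertion, negative = deletion.

    alleles: iterable of (ref, alt) strings, one pair per variant record.
    Same-length records (SNPs, MNPs) are skipped.
    """
    hist = Counter()
    for ref, alt in alleles:
        delta = len(alt) - len(ref)
        if delta != 0:
            hist[delta] += 1
    return hist


example = [("A", "AT"), ("ATG", "A"), ("A", "G"), ("C", "CA")]
hist = indel_length_histogram(example)
```

Here `hist` counts two 1 bp insertions and one 2 bp deletion, and the SNP is ignored.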
Make a schematic of the impute_ref and impute_alt strategy
    def imputed_ref_GT(half_GT):
        '''
        This function imputes a reference call at the missing half-call site
        '''
        half_GT = half_GT.split(":")
        # Only the GT subfield (first entry) is modified
        half_GT[0] = half_GT[0].replace(".", "0")
        return ":".join(half_GT)
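    def imputed_alt_GT(half_GT):
        '''
        This function imputes an alternate call at the missing half-call site
        when the called allele is ALT; if the called allele is REF (0), it
        imputes a reference call instead
        '''
        half_GT = half_GT.split(":")
        # Called allele is REF: fill the missing allele with REF
        if "0" in half_GT[0]:
            half_GT[0] = half_GT[0].replace(".", "0")
            return ":".join(half_GT)
        # Called allele is ALT: fill the missing allele with ALT
        half_GT[0] = half_GT[0].replace(".", "1")
        return ":".join(half_GT)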
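To make the impute_ref and impute_alt strategy easier to draw as a schematic, here is a small self-contained sketch that mirrors the two functions above on a few example sample fields. The combined helper `impute_half_call` and the example values are hypothetical. Like the code above, it fills with allele "1", so it assumes biallelic sites.

```python
def impute_half_call(sample_field, strategy):
    """Fill the missing allele of a half-call in a VCF sample field.

    strategy "ref": always fill with REF (0).
    strategy "alt": fill with ALT (1) unless the called allele is REF.
    Assumes biallelic sites, as in the functions posted above.
    """
    if strategy not in ("ref", "alt"):
        raise ValueError("strategy must be 'ref' or 'alt'")
    fields = sample_field.split(":")
    gt = fields[0]
    fill = "0" if strategy == "ref" or "0" in gt else "1"
    fields[0] = gt.replace(".", fill)
    return ":".join(fields)


for field in ["0/.:35:12", "./1:35:12", "1|.:35:12"]:
    print(field, impute_half_call(field, "ref"), impute_half_call(field, "alt"))
```

The two strategies only differ when the called allele is ALT: `./1` becomes `0/1` under ref and `1/1` under alt, while `0/.` becomes `0/0` under both.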
Figure 1: Training & Testing Data
A. Pipeline Schematic for identifying truth variant set
B. Comparison of CG with Illumina Phased Pedigree results for NA12877 vs GIAB NA12877
   just use tp and fp vcf files to generate vcfstats
   - number of snps, indels, mnps
   - average read depth
   - ts/tv
   - whole genome vs exome
   - basechange stats
C. Comparison of CG with Illumina Phased Pedigree results for NA12878 vs GIAB NA12878
   - number of snps, indels, mnps
   - average read depth
   - ts/tv
   - whole genome vs exome
   - basechange stats
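For the snp/indel/mnp counts and ts/tv in panels B and C, a minimal sketch of the underlying definitions, in case we want to sanity check the vcfstats numbers. It assumes uppercase REF/ALT strings with one ALT per record; all names here are made up for illustration.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref, alt):
    """Classify a record as snp, mnp, or indel from allele lengths."""
    if len(ref) != len(alt):
        return "indel"
    return "snp" if len(ref) == 1 else "mnp"


def is_transition(ref, alt):
    """Transition: purine<->purine or pyrimidine<->pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def summarize(alleles):
    """Return (counts by variant type, ts/tv ratio over SNPs)."""
    alleles = list(alleles)
    counts = Counter(classify(ref, alt) for ref, alt in alleles)
    snps = [(r, a) for r, a in alleles if classify(r, a) == "snp"]
    ts = sum(1 for r, a in snps if is_transition(r, a))
    tv = len(snps) - ts
    ratio = ts / tv if tv else float("nan")
    return counts, ratio


counts, ratio = summarize([("A", "G"), ("C", "T"), ("A", "C"), ("AT", "GC"), ("A", "ATT")])
```

On this toy input there are three SNPs (two transitions, one transversion), one MNP, and one indel.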