Illumina / hap.py

Haplotype VCF comparison tools

WES TI TV ratio #167

Closed solivehong closed 9 months ago

solivehong commented 1 year ago

Hi team,

Background

I am benchmarking germline variant calling.

I used the GIAB HG001 exome sample, but it is only available for hg19, so I converted it from hg19 to hg38.

The Ti/Tv ratio reported by GATK CollectVariantCallingMetrics is 2.972691 (the DBSNP_TITV column):

## METRICS CLASS        picard.vcf.CollectVariantCallingMetrics$VariantCallingSummaryMetrics
TOTAL_SNPS      NUM_IN_DB_SNP   NOVEL_SNPS      FILTERED_SNPS   PCT_DBSNP       DBSNP_TITV      NOVEL_TITV      TOTAL_INDELS    NOVEL_INDELS    FILTERED_INDELS PCT_DBSNP_INDELS        NUM_IN_DB_SNP_INDELS    DBSNP_INS_DEL_RATIO     NOVEL_INS_DEL_RATIO     TOTAL_MULTIALLELIC_SNPS NUM_IN_DB_SNP_MULTIALLELIC      TOTAL_COMPLEX_INDELS    NUM_IN_DB_SNP_COMPLEX_INDELS    SNP_REFERENCE_BIAS      NUM_SINGLETONS
22840   21675   1165    0       0.948993        **2.972691**        0.941667        584     168     0       0.712329        416     0.740586        0.714286        23      21      25      19      0.571143        15161
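
Note that 2.972691 is the DBSNP_TITV value, which covers only SNPs found in dbSNP, while NOVEL_TITV is reported separately. A rough way to estimate a single overall Ti/Tv from the two rows is sketched below. It assumes every counted SNP is either a transition or a transversion, so the counts can be recovered from the ratio; the helper name `split_ti_tv` is made up for this illustration and is not part of Picard or hap.py.

```python
def split_ti_tv(n_snps, titv):
    """Recover approximate transition/transversion counts from a SNP count
    and its Ti/Tv ratio, assuming n_snps = Ti + Tv."""
    tv = n_snps / (1.0 + titv)
    ti = n_snps - tv
    return ti, tv

# Values copied from the CollectVariantCallingMetrics row above.
known_ti, known_tv = split_ti_tv(21675, 2.972691)   # NUM_IN_DB_SNP, DBSNP_TITV
novel_ti, novel_tv = split_ti_tv(1165, 0.941667)    # NOVEL_SNPS, NOVEL_TITV

# Pool the counts to get one ratio across known and novel SNPs.
overall_titv = (known_ti + novel_ti) / (known_tv + novel_tv)
print(round(overall_titv, 2))
```

Under these assumptions the pooled ratio comes out at roughly 2.77, so part of the gap to the hap.py figure may simply come from comparing a dbSNP-only ratio with an all-SNP ratio.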

However, hap.py reports a query Ti/Tv ratio of 2.276673 (QUERY.TOTAL.TiTv_ratio):

  Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
 INDEL    ALL         7136      4855      2281        12520       849       6812     70    204       0.680353          0.851261        0.544089         0.756272                     NaN                     NaN                   2.395189                   2.958848
 INDEL   PASS         7136      4855      2281        12520       849       6812     70    204       0.680353          0.851261        0.544089         0.756272                     NaN                     NaN                   2.395189                   2.958848
   SNP    ALL        48602     33059     15543        57830      1433      23337    119     86       0.680198          0.958455        0.403545         0.795702                2.410382                2.276673                   1.691596                   2.884075
   SNP   PASS        48602     33059     15543        57830      1433      23337    119     86       0.680198          0.958455        0.403545         0.795702                2.410382                **2.276673**                   1.691596                   2.884075
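
For reference, the Ti/Tv ratio itself can be sketched as follows. This is a minimal illustration of the usual definition (transitions are A<->G and C<->T, everything else is a transversion), not the code that hap.py or Picard actually run; the function names and the example SNP list are hypothetical, and tools may differ in which records they count.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)

def titv_ratio(snps):
    """Ti/Tv over an iterable of (ref, alt) single-base substitutions."""
    ti = tv = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # not a substitution, skip it
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("nan")

# Hypothetical toy input: 4 transitions and 2 transversions.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
ratio = titv_ratio(example)
print(ratio)
```

Because the ratio depends entirely on which SNPs are included, restricting the same call set to different regions can change it noticeably.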

I have tried several experiments, e.g.

  1. Hard filtering (did not work)
  2. Intersecting the run BED with the benchmark BED using bedtools
  3. Using only the variants inside the BED region
  4. Using the GIAB HG001 exome hg19 VCF converted to hg38, with the overlapping BED (bedtools intersection)

All of these attempts still give a Ti/Tv ratio of about 2.2.
I need some help. Thank you so much!

solivehong commented 1 year ago

I answered my own question. I found a newer, more suitable version of the HG001 Agilent exome sample and used the GIAB VCF as the truth set. I then used bedtools to intersect the Agilent BED with the benchmark BED and kept only the overlap. The results are great and meet my expectations.
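
The overlap step can be pictured with the small sketch below. It only illustrates what intersecting two BED files means and is not a replacement for bedtools; the interval lists are made-up toy data, and it assumes each list contains no self-overlapping intervals (for example, after merging).

```python
def intersect_intervals(a, b):
    """Intersect two lists of (chrom, start, end) half-open intervals.
    Assumes intervals within each list do not overlap each other."""
    a, b = sorted(a), sorted(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is behind in chromosome order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            out.append((chrom_a, start, end))
        # Drop the interval that ends first; it cannot overlap anything later.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return out

# Hypothetical capture (e.g. exome kit) and benchmark regions.
capture_bed = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150)]
benchmark_bed = [("chr1", 150, 350), ("chr2", 0, 60)]
overlap = intersect_intervals(capture_bed, benchmark_bed)
print(overlap)
```

Only variants falling inside the resulting overlap are then evaluated against the truth set, which keeps the comparison to regions that are both captured and confidently benchmarked.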

solivehong commented 1 year ago

Is there any problem with doing it this way?