Illumina / hap.py

Haplotype VCF comparison tools

hap.py comparison with query SNP.vcf #115

Open junyanzho opened 4 years ago

junyanzho commented 4 years ago

Dear developer, I compared the truth set file NA12877.vcf.gz (which includes SNP and INDEL variants) with a query file that contains only SNP variants. For the INDEL rows in summary.csv, the TRUTH.TP and TRUTH.FN columns are non-zero, while the QUERY columns are all zero. The summary table is below:

Type   Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt
INDEL  ALL     523711       2648      521063    0            0         0          0
INDEL  PASS    523711       2646      521065    0            0         0          0
SNP    ALL     3519056      3484008   35048     3815273      3372      326544     2394
SNP    PASS    3519056      3479369   39687     3781822      3005      298100     2199

Lenbok commented 4 years ago

I guess your query set has no indels in it, which is why the QUERY columns are zero on the INDEL rows. As for why the INDEL TRUTH.TP is greater than zero even though your query set does not contain indels, see the RTG documentation on benchmarking SNPs versus indels: https://cdn.rawgit.com/RealTimeGenomics/rtg-core/master/installer/resources/core/RTGOperationsManual/rtg_command_reference.html#benchmarking-performance-for-snps-versus-indels
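
As a quick sanity check on the numbers posted in this thread, here is a minimal Python sketch that confirms TRUTH.TP + TRUTH.FN equals TRUTH.TOTAL on every row and computes recall as TRUTH.TP / TRUTH.TOTAL. The values are copied by hand from the table in the question. The space-separated layout and the helper name `recall_by_row` are assumptions made for illustration only; a real summary.csv is comma-separated and has more columns.

```python
import csv
import io

# Values copied from the summary table posted in this thread.
# Assumption: space-separated here for readability; the real file is a CSV.
SUMMARY = """\
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt
INDEL ALL 523711 2648 521063 0 0 0 0
INDEL PASS 523711 2646 521065 0 0 0 0
SNP ALL 3519056 3484008 35048 3815273 3372 326544 2394
SNP PASS 3519056 3479369 39687 3781822 3005 298100 2199
"""


def recall_by_row(text):
    """Return {(Type, Filter): recall} for each row of the table (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=" ")
    result = {}
    for row in reader:
        total = int(row["TRUTH.TOTAL"])
        tp = int(row["TRUTH.TP"])
        fn = int(row["TRUTH.FN"])
        # Every truth variant is counted as either a true positive or a false negative.
        assert tp + fn == total, (row["Type"], row["Filter"])
        result[(row["Type"], row["Filter"])] = tp / total if total else float("nan")
    return result


RECALL = recall_by_row(SUMMARY)
for (variant_type, filter_name), value in RECALL.items():
    print(variant_type, filter_name, f"{value:.4f}")
```

With these numbers, INDEL recall comes out at roughly 0.5% and SNP recall at roughly 99%, so the non-zero INDEL TRUTH.TP affects only a small fraction of the truth indels.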