Illumina / PlatinumGenomes

The Platinum Genomes Truthset
https://illumina.github.io/PlatinumGenomes

NA12877 VCF file issue #8

Closed by rakesh4osdd 6 years ago

rakesh4osdd commented 6 years ago

Hi, I am using the NA12877 VCF file for my project and found that it contains a duplicate entry for one chromosome position. As far as I know, a VCF file should not need such a duplicate entry. Can you please explain the reason for the duplicate position in your VCF? Also, why is there no chrY entry in the VCF file for the male individual NA12877?

The file and the entry are given below: /ussd-ftp.illumina.com/2016-1.0/hg38/small_variants/NA12877/NA12877.vcf.gz

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12877

chr12 32192430 . T TTAAA 0 PASS KM=11.9;KFP=0;KFF=0;MTD=isaac_strelka GT 0|1
chr12 32192430 . T TTAAA 0 PASS MTD=isaac_strelka GT 0|1

Thanks.

blmoore commented 6 years ago

Thanks for reporting the duplicate VCF record. It looks like a bug (the duplicate can be removed with bcftools norm -D).
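For readers who want to see what the de-duplication amounts to, here is a minimal Python sketch. It is not how bcftools works internally and not part of the Platinum Genomes tooling. It assumes a position-sorted, tab-separated VCF and treats two records as duplicates when CHROM, POS, REF and ALT all match, keeping the first one. The function name `drop_duplicate_records` is hypothetical.

```python
def drop_duplicate_records(lines):
    """Yield VCF lines, skipping repeated (CHROM, POS, REF, ALT) records.

    Assumes the input is sorted by position, so duplicates of a site are
    adjacent and only the alleles seen at the current site are remembered.
    Header lines (starting with '#') pass through unchanged.
    """
    current_site = None
    seen_alleles = set()
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        site = (fields[0], fields[1])          # CHROM, POS
        if site != current_site:
            current_site = site
            seen_alleles = set()
        alleles = (fields[3], fields[4])       # REF, ALT
        if alleles in seen_alleles:
            continue                           # later duplicate: drop it
        seen_alleles.add(alleles)
        yield line


# The two chr12 records reported in this thread, written with tabs.
example = [
    "chr12\t32192430\t.\tT\tTTAAA\t0\tPASS\t"
    "KM=11.9;KFP=0;KFF=0;MTD=isaac_strelka\tGT\t0|1\n",
    "chr12\t32192430\t.\tT\tTTAAA\t0\tPASS\tMTD=isaac_strelka\tGT\t0|1\n",
]
kept = list(drop_duplicate_records(example))
```

With the two records above, only the first one (the record that carries the KM, KFP and KFF annotations) is kept. Whether the first or the second copy is the one worth keeping is a judgment call that this sketch does not settle.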

The Platinum Genomes truthset does not cover chrY yet.
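One way to confirm which chromosomes a given VCF actually covers is to tally records per contig. The sketch below is illustrative only. It assumes a tab-separated VCF read as text lines, and the function name `count_records_per_contig` is hypothetical.

```python
from collections import Counter


def count_records_per_contig(lines):
    """Return a Counter mapping each CHROM value to its number of records."""
    counts = Counter()
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


# Tiny in-memory example; a real file could be read with
# gzip.open("NA12877.vcf.gz", "rt") and passed in the same way.
sample = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12877\n",
    "chr12\t32192430\t.\tT\tTTAAA\t0\tPASS\tMTD=isaac_strelka\tGT\t0|1\n",
    "chr2\t118112315\t.\tAAAAT\tA\t0\tPASS\tMTD=isaac_strelka\tGT\t0|1\n",
]
counts = count_records_per_contig(sample)
has_chrY = "chrY" in counts
```

A contig with no records simply never appears as a key, so `has_chrY` is `False` for a file that does not cover chrY.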

rakesh4osdd commented 6 years ago

Thanks for the reply. The same issue occurs in the NA12878.vcf file, which also contains a duplicate entry:

chr2 118112315 . AAAAT A 0 PASS KM=2.72;KFP=8;KFF=1;MTD=isaac_strelka GT 0|1
chr2 118112315 . AAAAT A 0 PASS MTD=isaac_strelka GT 0|1

Thanks.