brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License

make-gnotate #62

Closed josephhalstead closed 4 years ago

josephhalstead commented 4 years ago

Hi,

Thanks for this software. I've been experimenting with the Slivar gnotate function to speed up annotation. I've been putting the latest SpliceAI (1.3) VCF files into the gnotate zip format and came across the following output for each chromosome when doing so:

[slivar] writing 626834914 encoded and 0 long values for chromosome 10
[slivar] removed 12271564 duplicated positions by using the value and chromosome: 10

Just wondering what this means - is it removing certain variants from the annotation file? I am pretty sure it does not have duplicates.

brentp commented 4 years ago

Hi, SpliceAI does have duplicates. Some regions of the chromosome overlap in the file. I was surprised too, but I saw the same thing with SpliceAI and tracked down entire regions that are duplicated.
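
For anyone who wants to confirm this on their own copy, here is a minimal Python sketch that counts repeated records per chromosome. It is not how slivar detects or resolves duplicates; the function name `count_duplicate_variants` and the choice of (chrom, pos, ref, alt) as the duplicate key are my assumptions for illustration. It keeps every key in memory, so it is best run on one chromosome or a smaller region rather than the whole file at once.

```python
from collections import Counter


def count_duplicate_variants(lines):
    """Count VCF records whose (chrom, pos, ref, alt) key was already seen.

    `lines` is any iterable of VCF text lines (header lines are skipped).
    Returns a Counter mapping chromosome -> number of repeated records.
    Illustrative only: this is an assumed duplicate definition, not slivar's.
    """
    seen = set()
    dups = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and VCF header/meta lines.
        if not line or line.startswith("#"):
            continue
        # Only the first five columns are needed: CHROM POS ID REF ALT.
        chrom, pos, _id, ref, alt = line.split("\t", 5)[:5]
        key = (chrom, int(pos), ref, alt)
        if key in seen:
            dups[chrom] += 1
        else:
            seen.add(key)
    return dups
```

To run it on a bgzipped VCF, open the file with `gzip.open(path, "rt")` and pass the file handle to the function. A non-empty result means the same variant appears more than once in the input.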