fritzsedlazeck / SURVIVOR_ant

A framework to annotate SVs with previously known SVs (VCF file) and/or with genomic features (GFF and/or BED files)
MIT License

Unexpected behaviour with gzipped gff #7

Closed by wdecoster 6 years ago

wdecoster commented 6 years ago

Hi Fritz,

I found another tool of yours that looks useful... so here I am :)

When I annotate using a gzipped gff I get (quickly) the following output:

INITIALIZE VCF: 
Parse VCF: 0
Parse BED: 0
Parse GFF: 1
PRint: 
start printing
print entries 
Successfully finished
total entries: 651
overlapping vcf entries: 0
overlapping annotations: 649

But the resulting vcf only contains total_Annotations=0 for every line.

Using the same gff, but uncompressed, takes a bit longer and gives the following output:

INITIALIZE VCF: 
Parse VCF: 0
Parse BED: 0
Parse GFF: 1
PRint: 
start printing
print entries 
Successfully finished
total entries: 651
overlapping vcf entries: 0
overlapping annotations: 360

And this time the resulting vcf looks as expected, e.g.: ;total_Annotations=16;overlapped_Annotations=gene38311,rna81866,id909513,id909514,cds63851,cds63851,rna81870,

So this was surprising.

Am I right that the "ID" attribute is what gets taken from the gff?

Cheers, Wouter

fritzsedlazeck commented 6 years ago

Haha, so this method is not really mature yet. So yes, it only reads uncompressed files.
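Since only uncompressed files are read, one workaround is to write a plain-text copy of the gzipped GFF before annotating. The following is a minimal sketch of that step, not part of SURVIVOR_ant itself; the helper name `decompress_gff` and the example file names are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gff(gz_path, out_path=None):
    """Write an uncompressed copy of a gzipped GFF and return its path.

    Hypothetical helper: it only prepares the input file and does not
    call SURVIVOR_ant.
    """
    gz_path = Path(gz_path)
    # Default output name drops the last suffix, which is assumed to be
    # ".gz" (annotation.gff.gz -> annotation.gff).
    out_path = Path(out_path) if out_path else gz_path.with_suffix("")
    # Stream the data so large annotation files are not loaded into memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, `decompress_gff("annotation.gff.gz")` would write `annotation.gff` next to the compressed file, and that path can then be passed to the annotation step.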

Also, gff/gtf files vary widely in format, so it is not obvious which name should be taken. You are right that the method tries to take only the name from the ID. I haven't found a good workaround...
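To illustrate what "taking the name from the ID" involves, here is a minimal sketch that pulls the `ID` attribute out of the ninth column of a GFF3 feature line. It is an illustration under the assumption of GFF3-style `key=value` attributes, not SURVIVOR_ant's actual parser, and the function name `gff_feature_id` is hypothetical.

```python
def gff_feature_id(line):
    """Return the ID attribute of one GFF3 feature line, or None.

    Assumes GFF3-style attributes (key=value pairs separated by ';').
    """
    # Skip blank lines and comment/directive lines.
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    # A feature line has nine tab-separated columns; attributes are the last.
    if len(fields) < 9:
        return None
    for attribute in fields[8].split(";"):
        key, sep, value = attribute.strip().partition("=")
        if sep and key == "ID":
            return value
    return None
```

GTF files write attributes as `gene_id "X";` rather than `ID=X`, so this sketch returns `None` for them, which is one reason a single naming rule does not work across formats.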

Thanks Fritz

wdecoster commented 6 years ago

Can you suggest alternatives for annotation? I can try VEP/SnpEff/ANNOVAR, but I'm not sure whether those are optimal for SVs.

fritzsedlazeck commented 6 years ago

Check out vcfanno. That should work.

wdecoster commented 6 years ago

Thanks!