brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License
357 stars 55 forks source link

Fix always falling back to non-bgzip reader && allow .bgz extension #109

Closed zamaudio closed 5 years ago

zamaudio commented 5 years ago

Previously, files with .gz extension were assumed to be bgzip files. Now, .gz is assumed gzip and .bgz is assumed bgzip.

I think it makes more sense this way, and hope the author agrees.

EDIT: This is not an adequate assumption, but I discovered that bgzip reader was never being used and is fixed below.

brentp commented 5 years ago

I think all that's required to get the desired effect is to change the single line:

    if len(config.Annotation) < runtime.GOMAXPROCS(0) && strings.HasSuffix(queryFile, ".gz") {  

to

    if len(config.Annotation) < runtime.GOMAXPROCS(0) && (strings.HasSuffix(queryFile, ".gz") 
      || strings.HasSuffix(queryFile, ".bgz")) {    

this will make it so the query file can be .bgz, and bix has been updated to support ".bgz" extension.

zamaudio commented 5 years ago

Hi @brentp, the line you posted above has slightly different behaviour than my PR, it will try to open both .gz and .bgz extensions as block gzipped input. But my PR allows regular gzip to be opened as well. With the current code, if you try to open a regular gzipped vcf file that has not been bgzipped bgzip file with a .gz extension, sometimes vcfanno crashes. ~~Ideally vcfanno would detect which kind of compression it is, but my go skills are not quite up to doing that. Do you know of a way to detect if a file with .gz extension is block gzipped or regular gzipped that is less naive than reading the file extension? I think that would solve the issue more robustly and not change existing behaviour of your program.~~

brentp commented 5 years ago

the current code allows regular gzip to be opened. (that's why it sets qrdr = nil when there's an error with bgzf reader). can you write a test that fails with master and passes with your code so I can understand what you are adding?

I still think that all that's needed is the line I posted above. If I misunderstand, then a test will help to clarify.

zamaudio commented 5 years ago

@brentp I have identified that the bgzip reader is never being used in the current code. I have provided a brand new fix above that is basically your suggested change plus a couple of else clauses.

brentp commented 5 years ago

great! thank you for noticing and fixing this. I'll make a new release soon.