Bioconductor / VariantAnnotation

Annotation of Genetic Variants
https://bioconductor.org/packages/VariantAnnotation

Import Error for bcftools Compressed Files #32

Open DarioS opened 5 years ago

DarioS commented 5 years ago

VCF files that are filtered with bcftools and compressed by passing -O z as part of the command can't be imported into R; the import fails with the error "scanBcfHeader(bf) : [internal] _hts_rewind() failed". Decompressing the file and re-compressing it with the R function bgzip allows it to be imported successfully, but this is an inefficient workaround and it's unclear why it's required. This issue was previously reported for an older version of VariantAnnotation (#22).

LTLA commented 8 months ago

Just ran into this problem, which is quite inconvenient.

Some investigative work suggests that VariantAnnotation already has a lot of the machinery required to decompress and read Gzipped (but not BGZF-compressed) files, based on the VariantAnnotation:::.vcf_scan_connection function.

It seems that the real blocker is that every readVcf method eventually calls Rsamtools::scanBcfHeader, which refuses to play nice with Gzip-compressed VCFs. If a connection method were added for that in Rsamtools, then everything in VariantAnnotation might just work. At least we'd be a step closer.