Bioconductor / VariantAnnotation

Annotation of Genetic Variants
https://bioconductor.org/packages/VariantAnnotation

API to change output suffix to .gz #35

Open d-cameron opened 4 years ago

d-cameron commented 4 years ago

The majority of VCF-handling bioinformatics libraries use a .vcf.gz suffix, even for block-gzipped output. writeVcf() with index=TRUE does not support this and forcibly sets the suffix to .bgz.

The following commands do exactly the same thing:

writeVcf(vcf, "example.vcf", index=TRUE)
writeVcf(vcf, "example.vcf.bgz", index=TRUE)
writeVcf(vcf, "example.vcf.gz", index=TRUE)

Desired behaviour: specifying a .vcf.gz path as the output file actually writes to that file, instead of silently changing the suffix of the output file to .vcf.bgz.
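Until something like this is supported, one possible workaround is to rename the files after writing. The following is only a minimal sketch in base R, assuming (as described above) that the output lands at a path ending in .vcf.bgz and that any tabix index sits next to it with .tbi appended. `rename_bgz_to_gz` is a hypothetical helper name, not part of VariantAnnotation or Rsamtools.

```r
# Hypothetical helper (not part of VariantAnnotation): rename a
# block-gzipped VCF and its tabix index from .bgz to .gz.
rename_bgz_to_gz <- function(bgz_path) {
    stopifnot(grepl("\\.bgz$", bgz_path), file.exists(bgz_path))
    gz_path <- sub("\\.bgz$", ".gz", bgz_path)

    # Rename the data file first.
    if (!file.rename(bgz_path, gz_path)) {
        stop("could not rename ", bgz_path)
    }

    # Assumption: the index, if present, sits next to the data file
    # with a .tbi suffix appended.
    tbi_old <- paste0(bgz_path, ".tbi")
    if (file.exists(tbi_old)) {
        file.rename(tbi_old, paste0(gz_path, ".tbi"))
    }

    # Return the new path invisibly so it can be captured if needed.
    invisible(gz_path)
}
```

For example, calling `rename_bgz_to_gz("example.vcf.bgz")` after `writeVcf(vcf, "example.vcf", index=TRUE)` would leave example.vcf.gz and, if an index was written, example.vcf.gz.tbi. The file contents are untouched and stay block-gzipped; only the names change.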

d-cameron commented 4 years ago

https://github.com/PapenfussLab/gridss/issues/269

bschilder commented 2 years ago

I agree, this is very unexpected. Could a warning at least be generated when writeVcf changes the path name?
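Such a warning could be as simple as comparing the requested path with the path that was actually written. A minimal sketch in base R follows. `warn_if_path_changed` is a hypothetical name, and how writeVcf would obtain the final path internally is an assumption not shown here.

```r
# Hypothetical helper: warn when the file actually written differs
# from the one the caller asked for.
warn_if_path_changed <- function(requested, written) {
    if (!identical(requested, written)) {
        warning("output written to '", written,
                "' instead of '", requested, "'", call. = FALSE)
    }
    # Return the path that was really used, invisibly.
    invisible(written)
}
```

With this, `warn_if_path_changed("example.vcf.gz", "example.vcf.bgz")` raises a warning naming both paths, while identical paths pass through silently.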

vjcitn commented 2 years ago

I suppose a message could be produced. The man page for bgzip in Rsamtools shows why this is happening. If you have time to make a PR, the code of interest is at https://github.com/Bioconductor/VariantAnnotation/blob/e966a1b3cc3cede22c4d65fcada24be05206a65a/R/methods-writeVcf.R#L257

bschilder commented 2 years ago

In the interest of bettering VariantAnnotation, while also being mindful I'm not able to devote too much time to projects I'm not a maintainer or author of, I propose the following divvying of work:

@vjcitn

@bschilder

Does this sound fair to you? I can get started once item #1 on your list is completed (readVcf). That way the rest of the changes I plan to make will be optimised for the updated version of VariantAnnotation.

Best, Brian

hpages commented 1 year ago

@vjcitn @bschilder Are you guys planning to follow up on this?

vjcitn commented 1 year ago

@bschilder Did you make the PR that you mentioned? I cannot work on the other piece for some time.

bschilder commented 1 year ago

@hpages I didn't make this PR because @vjcitn determined it was beyond the scope of VariantAnnotation to include these functionalities (or at least some of them). So instead, I added them to our lab's package MungeSumstats. If you've changed your mind about this @vjcitn I'd be happy to share my existing code.