Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

Option to write as VCF #18

Closed bschilder closed 3 years ago

bschilder commented 3 years ago

Implemented using:

        VariantAnnotation::writeVcf(vr, filename = check_save_out$save_path)

bschilder commented 3 years ago

Controlled by the user via the write_vcf argument of format_sumstats.

Al-Murphy commented 3 years ago

@bschilder this is something @NathanSkene and I discussed. Since you are essentially exporting to VCF without any metadata, we didn't find it that useful an option (my understanding was that things like stats on the sequencing and info on the patients are a key part of VCF). Do you see a use case for having it in VCF format without this information included?

bschilder commented 3 years ago

Fair point, might be worth putting this info back into the header before saving.

The reason I wanted to leave this as an option is that certain software requires VCF format. For example, DeepSEA is one of the most commonly used deep learning variant effect predictor models because you can easily upload a VCF to their web server.

I don't think we need to provide output for every format, but VCF is a big one, right up there with tabular data. Exporting to VCF also means we're providing a tool that potentially makes it easier for people to submit their sum stats to databases like Open GWAS.

Ultimately up to you, and I certainly agree about keeping tabular format as the default, but I think there are some substantial benefits, and it's easy enough to do.

Al-Murphy commented 3 years ago

Okay, let's go for it; that sounds like a fair point. Definitely worth documenting and adding a comment to the code to warn that no metadata will be included in the VCF. Thanks for the suggestions.

NathanSkene commented 3 years ago

I'm in favour of Brian's suggestion as well. Maybe we could require that arguments be provided for some metadata, e.g. which PMID is it associated with? Which trait? Allowing this should increase the user base for the package and help encourage uptake of VCF.
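
To make the metadata idea concrete, here is a minimal base-R sketch of how required metadata arguments could be turned into VCF meta-information lines. It is an illustration only, not how MungeSumstats does it: the helper name `make_vcf_meta_lines`, the `pmid` and `trait` key names, and the example values are all assumptions. The only part taken from the VCF format itself is that header lines have the form `##key=value` and that `##fileformat` comes first.

```r
# Hypothetical helper (not part of MungeSumstats): turn user-supplied
# metadata into VCF meta-information lines of the form "##key=value".
make_vcf_meta_lines <- function(pmid, trait) {
  # Both fields are mandatory here, mirroring the suggestion that
  # metadata arguments be required before a VCF is written.
  stopifnot(
    is.character(pmid), length(pmid) == 1, nzchar(pmid),
    is.character(trait), length(trait) == 1, nzchar(trait)
  )
  c(
    "##fileformat=VCFv4.2",     # must be the first line of a VCF file
    paste0("##pmid=", pmid),    # assumed key name, for illustration only
    paste0("##trait=", trait)   # assumed key name, for illustration only
  )
}

# Placeholder values, not a real study.
meta_lines <- make_vcf_meta_lines(pmid = "12345678", trait = "example trait")
writeLines(meta_lines)
```

Lines like these would sit above the column header row of the exported file; how they would be attached to the object passed to writeVcf is left open here.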

bschilder commented 3 years ago

For datasets imported from Open GWAS via import_sumstats, I can easily include this info because it's accessible via find_sumstats.

bschilder commented 3 years ago

Opening a new Issue for this here: #25