Closed bschilder closed 3 years ago
Controlled by the user with the write_vcf= flag in format_sumstats.
@bschilder this is something @NathanSkene and I discussed. Since you are essentially exporting to VCF without any metadata, we didn't find it that useful an option (my understanding was that things like stats on the sequencing and info on the patients are a key part of VCF). Do you see a use case for having it in VCF format without this information included?
Fair point, might be worth putting this info back into the header before saving.
The reason I wanted to leave this as an option is that certain software requires VCF format. For example, DeepSEA is one of the most commonly used deep learning variant effect predictor models because you can easily upload a VCF to their web server.
I don't think we need to support every output format, but VCF is a big one, right up there with tabular data. Exporting to VCF also means we're providing a tool that potentially makes it easier for people to submit their summary statistics to databases like Open GWAS.
Ultimately it's up to you, and I certainly agree about keeping tabular format as the default, but I think there are some substantial benefits, and it's easy enough to do.
Okay, let's go for it, that sounds like a fair point. Definitely worth documenting and adding a comment to the code to warn that no metadata will be included in the VCF. Thanks for the suggestions.
I'm in favour of Brian's suggestion as well. Maybe we require that arguments be provided for some meta-data, e.g. which PMID is it associated with? Which trait? Should increase the user base for the package if we allow for this & help encourage takeup of VCF.
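To make the metadata idea concrete, here is a rough sketch of how study-level info such as trait and PMID could be carried in a VCF header as `##key=value` meta-information lines. This is not how MungeSumstats does it (the package is R; Python is used here only for illustration). The function name, the `trait`/`PMID` keys, the `BETA`/`P` INFO fields, and all values are hypothetical placeholders.

```python
import io

# Column header for a sites-only VCF (no per-sample genotype columns).
VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def write_vcf_with_metadata(rows, metadata, handle):
    """Write summary-statistic rows as a minimal VCF with study metadata in the header."""
    handle.write("##fileformat=VCFv4.2\n")
    # Hypothetical study-level keys (e.g. trait, PMID) written as ##key=value lines.
    for key, value in metadata.items():
        handle.write(f"##{key}={value}\n")
    # Declare the INFO fields used in the data lines below (assumed names).
    handle.write('##INFO=<ID=BETA,Number=1,Type=Float,Description="Effect size">\n')
    handle.write('##INFO=<ID=P,Number=1,Type=Float,Description="P-value">\n')
    handle.write("\t".join(VCF_COLUMNS) + "\n")
    for row in rows:
        info = f"BETA={row['beta']};P={row['p']}"
        fields = [str(row["chrom"]), str(row["pos"]), row["snp"],
                  row["ref"], row["alt"], ".", ".", info]
        handle.write("\t".join(fields) + "\n")


# Made-up placeholder values, for illustration only.
example_rows = [{"chrom": 1, "pos": 12345, "snp": "rs0000001",
                 "ref": "A", "alt": "G", "beta": 0.01, "p": 0.5}]
example_meta = {"trait": "example_trait", "PMID": "00000000"}

buffer = io.StringIO()
write_vcf_with_metadata(example_rows, example_meta, buffer)
print(buffer.getvalue())
```

Requiring the metadata as an explicit argument, as suggested above, would mean the header is never silently empty. Which keys to require is a design decision for the package.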
For datasets imported from Open GWAS via import_sumstats, I can easily enough include this info because it's accessible via find_sumstats.
Opening a new Issue for this here: #25
Implemented using :