brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

support for GWAS summary TSV files #142

Open darked89 opened 3 years ago

darked89 commented 3 years ago

Hello,

The majority of summary stats from various studies & biobanks are available as bgzip-ed TSVs. To convert these to VCF one can use gwas2vcf (https://github.com/MRCIEU/gwas2vcf), but at this point it supports only a fixed set of input columns. Extending it beyond that does not look like a trivial task, at least to me.

Adding these "extra" columns from the TSV to the VCF produced by gwas2vcf can be done with bcftools annotate, but this looks like a much less flexible process than vcfanno. I am positive that sooner rather than later I will have to not just copy a value from the TSV and "paste" it into the VCF, but also modify it on the fly.

Hence my questions:

  1. Would it be possible to enhance vcfanno to handle at least the "well behaved" GWAS TSV files as an annotation source? For example, the TSV format described here: https://finngen.gitbook.io/documentation/data-description

  2. In the meantime, can vcfanno use a BED/VCF-like format derived from the above FinnGen TSV, with the canonical first 3 BED columns plus REF & ALT:

    22      100000          100000      A     T 

followed by either all the remaining columns from the TSV input or just the "extras" not already present in the VCF produced by gwas2vcf?

REF and ALT are needed because the input TSV has several rows sharing the same position, such as:

22      100000          A     T       bunch_of_columns_here
22      100000          A     G      bunch_of_columns_here
22      100000          A     CG
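
To illustrate the point with the three rows above, here is a minimal Python sketch. The helper name `group_by` and the tuple layout are made up for this example and have nothing to do with vcfanno internals. Keying on chromosome and position alone collapses the three variants into one record, while adding REF and ALT keeps them apart:

```python
from collections import defaultdict

# The three example rows: (chrom, pos, ref, alt). Extra columns omitted.
rows = [
    ("22", 100000, "A", "T"),
    ("22", 100000, "A", "G"),
    ("22", 100000, "A", "CG"),
]


def group_by(records, key):
    """Group records by the value returned by key(record)."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return dict(groups)


# Position-only key: all three rows fall into the same group.
by_position = group_by(rows, lambda r: (r[0], r[1]))

# Position plus REF and ALT: every row gets its own group.
by_allele = group_by(rows, lambda r: r)

print(len(by_position), len(by_allele))
```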

Thank you,

Darek Kedra

brentp commented 3 years ago

Yes, this is already possible. As long as you can bgzip and tabix the file, vcfanno can use it. You will also need a header that names the "ref" and "alt" columns, e.g. #chrom\tstart\tstop\tref\talt\t.. for your example above.
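
A rough sketch of that conversion in Python, for anyone landing here later. This is not part of vcfanno; the function name `tsv_to_bed_like` is invented for the example, and the input column names (`#chrom`, `pos`, `ref`, `alt`) are an assumption about the FinnGen layout, so adjust them to your file. It also assumes BED-style 0-based, half-open coordinates (start = pos - 1, stop = start + len(ref)), which differs from the start == stop row shown in the question, and that the input is already sorted by chromosome and position.

```python
import csv
import io


def tsv_to_bed_like(tsv_text, chrom_col="#chrom", pos_col="pos",
                    ref_col="ref", alt_col="alt"):
    """Turn a GWAS summary TSV into BED-like text with a ref/alt header.

    Column names are assumptions; pass your own if the file differs.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    core = (chrom_col, pos_col, ref_col, alt_col)
    # Everything that is not chrom/pos/ref/alt is carried over unchanged.
    extras = [c for c in reader.fieldnames if c not in core]

    lines = ["\t".join(["#chrom", "start", "stop", "ref", "alt"] + extras)]
    for row in reader:
        start = int(row[pos_col]) - 1          # 1-based pos -> 0-based start
        stop = start + len(row[ref_col])       # half-open end spans the REF allele
        fields = [row[chrom_col], str(start), str(stop),
                  row[ref_col], row[alt_col]]
        fields += [row[c] for c in extras]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


demo = (
    "#chrom\tpos\tref\talt\tpval\n"
    "22\t100000\tA\tT\t0.01\n"
    "22\t100000\tA\tCG\t0.5\n"
)
bed_text = tsv_to_bed_like(demo)
print(bed_text, end="")
```

The output would then be compressed with bgzip and indexed with tabix (`tabix -p bed`), after which it can be listed as an annotation file in the vcfanno config, picking the extra columns by number.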

Let me know if this answers your question. -Brent