brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

How to extract fields that are not ##INFO from gnomAD vcf? #124

Open francoiskroll opened 4 years ago

francoiskroll commented 4 years ago

This is probably my mistake, but I can't work out how to pull some specific fields from a gnomAD VCF.

gnomAD vcf file header:

##fileformat=VCFv4.2
##FILTER=
##hailversion=0.2.24-9cd88d97bedd
##FILTER=
##FILTER=
##FILTER=
##INFO=

...

Example of entry from gnomAD vcf file:

chr20 4694655 rs143783853 C G 1.51257e+06 PASS AC=1490 ...

Say I want to extract all of the 8 fields above. I'm starting with rs ID / FILTER / AC as a test.

My conf.toml file is:

[[annotation]]
file="gnomad_chr20.vcf.gz"
fields = ["ID", "FILTER", "AC"]
ops=["self", "self", "self"]
names=["ID", "gnomad_FILTER", "AC"]

And my command:

vcfanno -p 4 conf.toml variants.vcf > annotated.vcf

rs ID and AC work great, but I can't seem to get the FILTER out. I have also tried with fields = ["PASS"].

Example of an entry from my own VCF, before annotation (first line) and after annotation (second line):

chr20 4694655 . C G 59835.5 PASS BaseCalledReadsWithVariant=4037;BaseCalledFraction=0.401173;TotalReads=8792;AlleleCount=1;SupportFraction=0.487189;SupportFractionByBase=0.047,0.483,0.443,0.027 GT 0/1

chr20 4694655 . C G 59835.5 PASS BaseCalledReadsWithVariant=4037;BaseCalledFraction=0.401173;TotalReads=8792;AlleleCount=1;SupportFraction=0.487189;SupportFractionByBase=0.047,0.483,0.443,0.027;ID=rs143783853;AC=1490 GT 0/1

Can you help?

brentp commented 4 years ago

I think that the FILTER only gets added if it is not PASS (or .).

francoiskroll commented 4 years ago

I see. Seems like the field is empty in the gnomAD vcf if not PASS, but from what I can see the filter AC0 is always present if PASS is absent? So that should be okay.

Example

from gnomAD vcf:

chr20 4694192 . A T 182 AC0;AS_VQSR AC=0 ...

vcfanno adds annotation:

chr20 4694192 . A T 31.5 PASS ... ;gnomadv3_FILTER=AC0,AS_VQSR;gnomadv3_AC=0 GT 1/1
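For downstream filtering, the behaviour described above (no FILTER annotation when the gnomAD record is PASS or ".") can be turned into an explicit status. The following is only a rough sketch, not part of vcfanno: the function names are hypothetical, the key names `gnomadv3_FILTER` and `gnomadv3_AC` are taken from the conf.toml used in this thread, and it assumes that AC is annotated for every variant that overlaps a gnomAD record.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags (keys without '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def gnomad_filter_status(info, filter_key="gnomadv3_FILTER", presence_key="gnomadv3_AC"):
    """Infer the gnomAD FILTER status from an annotated INFO string.

    Returns None if no gnomAD annotation is present, a list of filter names
    if vcfanno added the FILTER annotation, and ["PASS"] otherwise.
    Assumption: a missing FILTER annotation means gnomAD had PASS (or ".").
    """
    fields = parse_info(info)
    if presence_key not in fields:
        return None  # variant was not annotated from gnomAD at all
    value = fields.get(filter_key)
    if isinstance(value, str):
        # vcfanno writes multiple filters comma-separated, e.g. AC0,AS_VQSR
        return value.split(",")
    return ["PASS"]


# Illustrative INFO strings modelled on the entries in this thread
print(gnomad_filter_status("TotalReads=8792;gnomadv3_FILTER=AC0,AS_VQSR;gnomadv3_AC=0"))
print(gnomad_filter_status("TotalReads=8792;gnomadv3_AC=86"))
```

The first call should report the two gnomAD filters and the second should fall back to PASS. If AC is not guaranteed to be present in your gnomAD file, a different always-present field would have to serve as `presence_key`.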


If useful for a future reader – I'm also able to pull the lcr (low complexity region) filter; for example:

My conf.toml:

[[annotation]]
file="gnomad_chr20_4680000to4705000.vcf.gz"
fields = ["FILTER", "lcr", "AC"]
ops=["self", "self", "self"]
names=["gnomadv3_FILTER", "gnomadv3_lcr", "gnomadv3_AC"]

Annotation looks like (this variant is PASS in both my vcf & on gnomAD):

chr20 4694861 . C A 31.5 PASS ... ;gnomadv3_lcr;gnomadv3_AC=86 GT 1/1
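Note that `gnomadv3_lcr` appears as a bare key with no `=value`, which is how a flag shows up in the INFO column. As a small illustration of reading such a line back, here is a hedged sketch (the helper name and the `gnomadv3_` prefix are assumptions based on the conf.toml above, not anything vcfanno provides) that collects the gnomAD annotations and treats bare keys as flags:

```python
def gnomad_annotations(info, prefix="gnomadv3_"):
    """Collect INFO entries that start with `prefix`.

    Keys are returned without the prefix. Entries with '=' keep their value
    as a string; bare keys (flags such as lcr) are stored as True.
    """
    found = {}
    for item in info.split(";"):
        if not item.startswith(prefix):
            continue
        key, sep, value = item[len(prefix):].partition("=")
        found[key] = value if sep else True
    return found


# Illustrative INFO string modelled on the annotated entry above
ann = gnomad_annotations("TotalReads=8792;AlleleCount=1;gnomadv3_lcr;gnomadv3_AC=86")
print(ann)                    # {'lcr': True, 'AC': '86'}
print(ann.get("lcr", False))  # True
```

Values are left as strings here; converting AC to an integer is left to the caller, since some INFO fields can hold several comma-separated values.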

Thanks for your help!