brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License
357 stars 55 forks source link

Adding a custom value for missing annotation #116

Closed edg1983 closed 4 years ago

edg1983 commented 4 years ago

Hi bred!

I have a suggestion for your already amazing vcfanno.

When annotating AF from an external file (gnomAD for example) it would be more convenient to have the annotation field set to zero for variants not found in gnomAD VCF. In this way, one can simply filter for rare variant doing gnomAD_AF < 0.01 for example

In general, it would be great to have the possibility to specify in the toml configuration file how to annotate variants when they are not found in the annotation source, so one can decide to leave them un-annotated (as it is right now) or adding a value (like zero for AF annotation or some custom value for other kinds of annotations).

wm75 commented 4 years ago

hmm, that reminds me of a question I always forget asking: if I'm not mistaken, gnomAD has 0 values for AF already. What's the difference between an actual AF=0 and a variant without information. Are the zero ones rounded down from really small AFs or is it something else?

brentp commented 4 years ago

Hi, I think you can do this with a [[postannotation]] that takes your annotated value and returns -1 if it's undefined and the value if it is defined.

edg1983 commented 4 years ago

I will try with the post annotation as you suggested. Thanks!

brentp commented 4 years ago

@wm75 so variants were discovered in the cohort, but then filtered/QC-ed somehow so they have an AF of 0. there is no rounding.

wm75 commented 4 years ago

@brentp Thanks for the clarification!

edg1983 commented 4 years ago

Hi brent, I'm still trying to implement this correctly, but it seems I can not find the right recipe. Can you give me an example of how to configure a lua script and post-annotation to fill in missing annotations with a custom value as you suggested before?

I'm performing annotation with some exome AF. When a variant in my VCF is not present in the cohort_exome.vcf I want to set -1 for exome_af. So I have configured annotations as follows

[[annotation]]
file="cohort_exome.vcf.gz"
fields=["AF","nhomalt"]
ops=["self","self"]
names=["exome_af","exome_nhomalts"]

[[postannotation]]
fields=["exome_af"]
op="lua:checkaf(exome_af)"
name="exome_af"
type="Float"

Then I've prepared a custom.lua file as follows

function checkaf(v)
  local output = v
  if (v == nil) then output = -1 end
  return output
end

However, vcfanno gives me warning that the exome_af field do not exist in INFO and the -1 value is not inserted.

brentp commented 4 years ago

well, that should have worked, but it didn't. I just pushed a fix for this and added something like your example as a test-case. thanks for the clear example. here is a linux binary with the fix. vcfanno_dev.gz

edg1983 commented 4 years ago

Hi brent,

Thanks for the quick fix! It works perfectly now. Just in case someone wants to replicate my approach, I've noticed that the previous version of my lua script replace missing values, but causes values already present to be outputted between square brackets like this

exome_af=[0.323]

So to obtain the desired result the lua script I've used is:

function checkaf(v)
  if (v == nil) then 
    output = -1
  else
    output = v[1]
  end  
  return output
end
brentp commented 4 years ago

you can also use table.concat if you want to join the values into a string.