brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License
365 stars 56 forks source link

Max operation #25

Closed snashraf closed 8 years ago

snashraf commented 8 years ago

[[postannotation]] fields=["af_exac_all", "af_exac_afr", "af_exac_amr", "af_exac_eas", "af_exac_nfe", "af_exac_oth", "af_exac_sas"] op="max" name="max_aaf_all" type="Float"

I am trying to do similar operation with 1K Frequency like file="ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz" [[postannotation]] fields=["EAS_AF", "AMR_AF", "AFR_AF", "EUR_AF", "SAS_AF"] name= "1G_MA" op="max" type="Float

But some how I am getting any results . :vcfanno.go:156: Info Error: SAS_AF not found in header >> this error may occur many times. reporting once here...

Seems like for records SAS_AF is not available and hence not reporting any data. Could you please help on this.

brentp commented 8 years ago

In order to use the fields from 1kg, you first need to annotate with them so thaty are in your VCF. Something like:

[[annotation]]
file="ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz"
fields=["EAS_AF", "AMR_AF", "AFR_AF", "EUR_AF", "SAS_AF"]
names= ["1kgEAS_AF", "1kgAMR_AF", "1kgAFR_AF", "1kgEUR_AF", "1kgSAS_AF"]
ops=["self", "self", "self", "self", "self"]

[[postannotation]]
fields= ["1kgEAS_AF", "1kgAMR_AF", "1kgAFR_AF", "1kgEUR_AF", "1kgSAS_AF"]
name= "1G_MA"
op="max"
type="Float
snashraf commented 8 years ago

[[annotation]] file="ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz" fields=["EAS_AF", "AMR_AF", "AFR_AF", "EUR_AF", "SAS_AF"] names= ["1kgEAS_AF", "1kgAMR_AF", "1kgAFR_AF", "1kgEUR_AF", "1kgSAS_AF"] ops=[ "self", "self", "self", "self", "self"]

[[postannotation]] fields =["1kgEAS_AF", "1kgAMR_AF", "1kgAFR_AF", "1kgEUR_AF", "1kgSAS_AF"] name= "1G_MA" op="max" type="Float"


and I am getting this error . :( :(

vcfanno.go:156: Info Error: 1kgSAS_AF not found in row >> this error may occur many times. reporting once here... panic: interface conversion: interface is []float32, not float32

goroutine 3265 [running]: panic(0x7cf260, 0xc829325d00)

brentp commented 8 years ago

can you post the command-line that you're using, and the full conf file

snashraf commented 8 years ago

[[annotation]] file="/gpfs/data_jrnas1/ref_data/Hsapiens/GRCh37/variation/ExAC.r0.3.1.sites.vep.vcf.gz" fields=["AF"] names =["ExAC_AF"] ops=["self"]

[[annotation]] file="whole_genome_SNVs.tsv.gz" columns=[6] names= ["CADD"] ops=["mean"]

[[annotation]] file="/gpfs/data_jrnas1/ref_data/Hsapiens/1000g/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz" fields=["EAS_AF", "AMR_AF", "AFR_AF", "EUR_AF", "SAS_AF"] names= ["1kgEAS_AF", "1kgAMR_AF", "1kgAFR_AF", "1kgEUR_AF", "1kgSAS_AF"] ops=[ "self", "self", "self", "self", "self"]

[[postannotation]] fields =["1kgEAS_AF", "1kgAMR_AF", "1kgAFR_AF", "1kgEUR_AF", "1kgSAS_AF"] name= "1G_MA" op="max" type="Float"

Here is commond line: vcfanno -p 4 conf.toml 01.vcf.gz

I am able to annotate with others filed from other sources and only getting problem with this max operation.

Thanks

brentp commented 8 years ago

can you show the full error message that you are seeing? I am able to run this without problems.

can you also do:

zgrep -m1 SAS_AF /gpfs/data_jrnas1/ref_data/Hsapiens/1000g/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz

and let me know the results. Maybe your genotypes VCF doesn't have SAS_AF?

snashraf commented 8 years ago

No, I have SAS_AF in my genotype VCF as well. [nsyed@hpclogin1 work]$ zgrep -m1 "SAS_AF" /gpfs/data_jrnas1/ref_data/Hsapiens/1000g/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz

INFO=

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AB082422_S5_L005 ABC09005_S2_L002 ABM090010_S7_L007

vcfanno.go:156: Info Error: 1kgAF not found in row >> this error may occur many times. reporting once here... panic: interface conversion: interface is []float32, not float32

goroutine 3267 [running]: panic(0x7cf260, 0xc825772800) /usr/local/src/go-git/src/runtime/panic.go:500 +0x18c github.com/brentp/vcfanno/api.asfloat32(0x7a3ac0, 0xc82bb7e000, 0x4f0000c82012f690) /usr/local/src/gocode/src/github.com/brentp/vcfanno/api/reducers.go:75 +0xac github.com/brentp/vcfanno/api.max(0xc8244ebfc0, 0x1, 0x2, 0x3, 0xc8201451a0) /usr/local/src/gocode/src/github.com/brentp/vcfanno/api/reducers.go:32 +0x55 github.com/brentp/vcfanno/api.(_Annotator).PostAnnotate(0xc8201271c0, 0xc8236084ce, 0x1, 0x3ba9, 0x3baa, 0xa0f240, 0xc82360a460, 0x0, 0x0, 0x0, ...) /usr/local/src/gocode/src/github.com/brentp/vcfanno/api/api.go:482 +0x1584 github.com/brentp/vcfanno/api.(_Annotator).AnnotateEnds(0xc8201271c0, 0xa0f8a0, 0xc8203fbf80, 0x0, 0x0, 0x0, 0x0) /usr/local/src/gocode/src/github.com/brentp/vcfanno/api/api.go:583 +0xf33 main.main.func1(0xa0f8a0, 0xc8203fbf80) /usr/local/src/gocode/src/github.com/brentp/vcfanno/vcfanno.go:151 +0x61 github.com/brentp/irelate.PIRelate.func1.1(0xc821110000, 0x190, 0x190, 0xc82221a2d0, 0xc825c76120) /usr/local/src/gocode/src/github.com/brentp/irelate/parallel.go:201 +0x4c created by github.com/brentp/irelate.PIRelate.func1 /usr/local/src/gocode/src/github.com/brentp/irelate/parallel.go:205 +0x7b

brentp commented 8 years ago

so this is a new error, yes? where is "1kgAF" coming from? I don't see that in the conf above.

snashraf commented 8 years ago

Here you go,

vcfanno.go:156: Info Error: 1kgSAS_AF not found in row >> this error may occur many times. reporting once here... panic: interface conversion: interface is []float32, not float32

goroutine 3319 [running]: panic(0x7cf260, 0xc8257928c0) /usr/local/src/go-git/src/runtime/panic.go:500 +0x18c github.com/brentp/vcfanno/api.asfloat32(0x7a3ac0, 0xc8217016e0, 0x4c0000c8201436b0) /usr/local/src/gocode/src/github.com/brentp/vcfanno/api/reducers.go:75 +0xac github.com/brentp/vcfanno/api.max(0xc82502b000, 0x5, 0x8, 0x3, 0xc82014f340) /usr/local/src/gocode/src/github.com/brentp/vcfanno/api/reducers.go:32 +0x55 github.com/brentp/vcfanno/api.(_Annotator).PostAnnotate(0xc8201392c0, 0xc82315d9ec, 0x1, 0x2a0ae0, 0x2a0ae1, 0xa0f240, 0xc823494180, 0x0, 0x0, 0x0, ...) /usr/local/src/gocode/src/github.com/brentp/vcfanno/api/api.go:482 +0x1584 github.com/brentp/vcfanno/api.(_Annotator).AnnotateEnds(0xc8201392c0, 0xa0f8a0, 0xc8230f5890, 0x0, 0x0, 0x0, 0x0) /usr/local/src/gocode/src/github.com/brentp/vcfanno/api/api.go:583 +0xf33 main.main.func1(0xa0f8a0, 0xc8230f5890) /usr/local/src/gocode/src/github.com/brentp/vcfanno/vcfanno.go:151 +0x61 github.com/brentp/irelate.PIRelate.func1.1(0xc824378000, 0x190, 0x190, 0xc821190750, 0xc826c93c80) /usr/local/src/gocode/src/github.com/brentp/irelate/parallel.go:201 +0x4c created by github.com/brentp/irelate.PIRelate.func1 /usr/local/src/gocode/src/github.com/brentp/irelate/parallel.go:205 +0x7b

I am getting same error for max operations on other values as well.

brentp commented 8 years ago

This is because there are sites with multiple alternate alleles so then it looks like:

SAS_AF=0.3497,0.6472

you can use vt decompose to get around this for now. I'm not sure how best to handle that inside of vcfanno.

You can also write your own lua function that checks if it got an array ( a lua table) or just a value.

brentp commented 8 years ago

actually, I'm working on a fix for this so that for the case of "max", we can collapse cases like this...

brentp commented 8 years ago

I have made a change so that for max() and min() it will do the expected thing even if there are multiple values for one of the overlapping sites. If you're on linux, you can try the new binary here: http://home.chpc.utah.edu/~u6000771/vcfanno_0.0.11_beta4

you may have to chmod +x it. Thanks for reporting.

snashraf commented 8 years ago

Let me try now and will let you know. Thanks

snashraf commented 8 years ago

Its working now . Many Thanks !!

snashraf commented 2 years ago

Hi Brent !!

I am performing the max operation as below.

[[annotation]] file="/gpfs/data_jrnas1/public_data/genomeAsia100k/100kAsian.annot.cont_withmaf.vcf.gz" fields=["AF","AF_OCE","AF_NEA", "AF_SEA", "AF_SAS", "AF_AFR", "AF_AMR", "AF_WER"] names=["1000k_asia_AF","1000k_asia_AF_OCE", "1000k_asia_AF_NEA","1000k_asia_AF_SEA","1000k_asia_AF_SAS","1000k_asia_AF_AFR","1000k_asia_AF_AMR","1000k_asia_AF_WER"] ops=["self","self", "self", "self", "self", "self","self", "self"]

[[postannotation]] fields= ["1000k_asia_AF","1000k_asia_AF_OCE", "1000k_asia_AF_NEA","1000k_asia_AF_SEA","1000k_asia_AF_SAS","1000k_asia_AF_AFR","1000k_asia_AF_AMR","1000k_asia_AF_WER"] name= "1000k_asia_MAX" op="max" type="String"

I am getting output file as :

INFO=

I want "Number=A" not the "Number=. " in the VCF header?

thanks for your help.

Regards, Najeeb

brentp commented 2 years ago

I think for string it defaults to ".". Why not make it a Float? you can do this by using name="1000k_asia_MAX_float".