brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

custom lua error- reciprocal overlap #74

Closed maggie-fu closed 6 years ago

maggie-fu commented 6 years ago

Hi @brentp ,

Thank you for creating vcfanno! It has worked really well for us because we work mostly with SVs. I am interested in calculating the reciprocal overlap between sample variants and reference variants. I tried to create a filter that ensures the two variants overlap, which seemed to work. I then tried to calculate the percentage of overlap, and that is where I am stuck. I only recently learned about the tool and had no coding experience before this, so I am sorry if my code contains many mistakes.

The command I used was:

./vcfanno_linux64 -lua custom_RO_min.lua conf_1000G_min.toml NA12878.LUMPY_sorted_filtered.vcf.gz > annotated_test.vcf

And I got

panic: toml: cannot load TOML value of type []interface {} into a Go string

goroutine 1 [running]:
main.main()
    /home/brentp/go/src/github.com/brentp/vcfanno/vcfanno.go:85 +0x19db

My query, annotation, conf.toml and custom.lua files are attached below: RO.zip

brentp commented 6 years ago

in postannotation, you have:

op=["lua:overlap_bp(ss, se)"]

but in postannotation, there's just one possible op, so it should be:

op="lua:overlap_bp(ss, se)"

I didn't check anything else, but that should get you further.

maggie-fu commented 6 years ago

Thank you for your help. I later realized the filter was completely unnecessary. I have another question, @brentp

The command, query and annotation files were the same.

My conf.toml is this:

[[annotation]]
file="hg19_ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes_DEL_freqadded.rare.bed.gz"
names=["ss_int", "se_int"]
columns=[2, 3]
ops=["self", "self"]

[[postannotation]]
name="overlap_bp"
fields=["ss", "se"]
op="lua:overlap_bp(ss, se)"
type="Float"

custom.lua was this:

function overlap_bp(ss, se)
        result = math.min(se, stop) - math.max(ss, start)
        return string.format("%.9f", result)
end

The error message was a little long. I attached the whole error, but it appeared to be a repeat of this:

vcfanno.go:187: Info Error: se not found in INFO >> this error/warning may occur many times. reporting once here...
vcfanno.go:187: strconv.Atoi: parsing "1645946,1682577,1681048": invalid syntax >> this error/warning may occur many times. reporting once here...
vcfanno.go:187: strconv.Atoi: parsing "1645946,1682577,1681048,1650523,1650651": invalid syntax >> this error/warning may occur many times. reporting once here...
vcfanno.go:187: strconv.Atoi: parsing "1682577,1681048,1650523,1650651": invalid syntax >> this error/warning may occur many times. reporting once here...
vcfanno.go:187: strconv.Atoi: parsing "12919194,12933272,12948772,12928826,13049137": invalid syntax >> this error/warning may occur many times. reporting once here...
vcfanno.go:187: strconv.Atoi: parsing "12948772,13049137,13187474": invalid syntax >> this error/warning may occur many times. reporting once here...
vcfanno.go:187: strconv.Atoi: parsing "22327106,22332441": invalid syntax >> this error/warning may occur many times. reporting once here...
vcfanno.go:187: strconv.Atoi: parsing "120158462,120154464": invalid syntax >> this error/warning may occur many times. reporting once here...

annotated_test_error.txt

It seems the error occurs because "ss" and "se" each hold more than one value, while the function can only compute on a single value. How can I modify the lua code so that the calculation is performed for every value?

Thank you very much again.

brentp commented 6 years ago

is this still an issue? as you can see, the data is comma-delimited so it can't be used as an int unless it has Number=A and op="self"

maggie-fu commented 6 years ago

I think you might have misunderstood the problem, so let me clarify. With the current [[annotation]], the output is something like:

ss=1601719,1626665,1649047,1649620;se=1682577,1681048,1650523,1650651

So the annotation is added successfully, but the error message comes from the postannotation step. When I change the ops of the annotation to ["first", "first"], the output looks like:

ss=1601719;se=1682577;overlap_bp=79451

In that case the postannotation is added successfully and there is no error message.

My question is: is there any way to handle annotation output with multiple values so that downstream postannotation still works?

brentp commented 6 years ago

yeah. that's a bit of a pain, but you can check if there's a comma in the string in lua, split it, and return a comma-delimited string.
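
A minimal sketch of that suggestion, not a tested vcfanno recipe. It assumes that ss and se reach the lua function either as a single number or as a comma-delimited string, and that start and stop are available as globals, as the original overlap_bp relied on. The helper names split_numbers and overlap_list are hypothetical, introduced only for illustration. The values are formatted as integers here rather than with "%.9f".

```lua
-- Split a comma-delimited value into a list of numbers.
-- tostring() lets the same code accept a single number or a string.
local function split_numbers(value)
    local out = {}
    for token in string.gmatch(tostring(value), "[^,]+") do
        out[#out + 1] = tonumber(token)
    end
    return out
end

-- Hypothetical helper: overlap in bp between the query interval
-- (qstart, qstop) and each (ss[i], se[i]) pair, in input order.
-- Returns a comma-delimited string with one value per pair.
function overlap_list(ss, se, qstart, qstop)
    local starts, ends = split_numbers(ss), split_numbers(se)
    local results = {}
    for i = 1, math.min(#starts, #ends) do
        local bp = math.min(ends[i], qstop) - math.max(starts[i], qstart)
        results[#results + 1] = string.format("%d", bp)
    end
    return table.concat(results, ",")
end

-- Entry point named in the [[postannotation]] block.
-- ASSUMPTION: start and stop are globals describing the query variant.
function overlap_bp(ss, se)
    return overlap_list(ss, se, start, stop)
end
```

Because the result can now contain commas, the postannotation type="Float" would probably need to change to a string type; that is an assumption worth checking against the vcfanno documentation. The sketch also does not guard against non-numeric tokens or against ss and se lists of different lengths beyond stopping at the shorter one.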