Closed sigven closed 9 years ago
I'll have a look at 1. It's difficult to decide what to do there in general, but I agree that concat should keep the commas if possible.
For 2, there should be new header lines for each annotation that you add. Did you have something else in mind, or is that not working for you?
Thanks. With respect to 2) I am afraid that did not work for me (vcfanno version 0.0.7 [built with go1.5beta3]), I see only my query header lines in the output, not the annotation headers.
ok. 2) is fixed on master (but it's currently not buildable without the dev branch of some dependencies)
for 1), I'm looking into special-casing concat for multiple-value (comma-separated) fields. For stuff like mean/max etc., it would have to pull the numbers out. But I'm still thinking about the best way to do this.
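A minimal sketch of the distinction discussed here, written in Python rather than vcfanno's Go and not taken from its code: for a multi-valued (Number=.) field, a concat can keep the comma as separator, while numeric reductions such as mean or max first have to split each value on commas and parse the numbers. The function names and sample values are hypothetical.

```python
from statistics import mean


def concat_values(raw_values):
    """Join INFO values from overlapping records, keeping commas as the separator."""
    parts = []
    for raw in raw_values:
        # A Number=. value such as "a,b" already holds several items.
        parts.extend(item.strip() for item in raw.split(",") if item.strip())
    return ",".join(parts)


def numeric_values(raw_values):
    """Pull every number out of comma-separated INFO values."""
    return [
        float(item)
        for raw in raw_values
        for item in raw.split(",")
        if item.strip()
    ]


# Two overlapping annotation records, one of them multi-valued (made-up data).
diseases = ["chronic_myeloid_leukemia,acute_myeloid_leukemia", "melanoma"]
print(concat_values(diseases))

allele_freqs = ["0.1,0.3", "0.2"]
numbers = numeric_values(allele_freqs)
print(max(numbers), mean(numbers))
```

Flattening before joining means a single-valued and a multi-valued record end up in the same comma-separated form, which is what a Number=. field in the output header would imply.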
@sigven this should be resolved. I'm chasing down a few more things before 0.8 release, but if you're on 64 bit linux and could give this a try, that would be very helpful; here is the executable:
@brentp I made a test now, used ClinVar as a query VCF, and ran annotation against ExAC and DoCM.
A couple of notes:
1) I receive an error when I get to chromosome 8: index: no reference. I can't tell whether this is an error on my side or not.
2) I would really like the INFO headers coming from my annotations to be kept as they are; currently I lose the informative Description, as it is changed to "calculated by concat of overlapping values in field" etc.
3) INFO tags with multiple values (i.e. Number=.) are comma-separated in the original annotation VCFs. In the output produced by vcfanno_08, they appear in brackets separated by spaces, e.g.
in annotation source VCF: DOCM_DISEASE=chronic_myeloid_leukemia,acute_myeloid_leukemia;DOCM_PMIDS=23634996, 23656643
in output VCF from vcfanno_08: DOCM_DISEASE=[chronic_myeloid_leukemia acute_myeloid_leukemia];DOCM_PMIDS=[23634996 23656643]
I'm trying to write a test for this. Can you share your conf file?
OK. I found what you mean. I've made that part of the code less fragile and fixed the issue you describe.
For the header, what you're requesting is different than what I had in mind, but I'll have a look.
An executable that fixes the multiple values problem is here: https://www.chpc.utah.edu/~u6000771/vcfanno_08a1
OK, just tested vcfanno_08a1. The use of comma instead of brackets now works. Thanks!
The index error still puzzles me, though. Do you believe that's an error wrt. my query VCF? Variant that fails is chr8:g.1712049C>T (first variant on chromosome 8 in my query VCF, which from what I can judge is a valid variant).
I understand that the header issue is not as straightforward as I imagined, taking all the various operations you offer into consideration. I can probably make a workaround so that it suits my needs.
I temporarily overlooked your index error. It could be that chr8 does not exist in one of your annotation files. Obviously, it should fail on that. I'll have a fix in a few hours.
... I mean "should not fail on that" ...
updated binary here: http://home.chpc.utah.edu/~u6000771/vcfanno_08a2
that fixes the index not found problem, still thinking about the header.
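To illustrate the behaviour described for the index fix, here is a small Python sketch (an assumption about the intent, not vcfanno's implementation): when the query chromosome is absent from an annotation source, the lookup returns no overlaps instead of raising an error. The in-memory index, the function names, and the coordinates are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Group (chrom, start, end, value) annotation records by chromosome."""
    index = defaultdict(list)
    for chrom, start, end, value in records:
        index[chrom].append((start, end, value))
    return dict(index)


def overlapping(index, chrom, start, end):
    """Return values overlapping [start, end); an absent chromosome gives []."""
    hits = []
    # .get() with a default avoids failing when the annotation has no such chromosome.
    for a_start, a_end, value in index.get(chrom, []):
        if a_start < end and start < a_end:
            hits.append(value)
    return hits


# The annotation source only covers chromosome 7; the query is on chromosome 8.
annotations = build_index([("7", 100, 101, "x")])
print(overlapping(annotations, "8", 1712048, 1712049))  # prints []
```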
did you have a chance to check http://home.chpc.utah.edu/~u6000771/vcfanno_08a2 ? I'd like to release 0.8, but it has a lot of new changes so it'd be good to get your feedback.
Hmm.. after the header gets printed, I am getting an error which is not too informative: 2015/10/21 08:33:21 gzip: invalid header
I suspect one of my VCF files has a problem, but it's hard to assess what is wrong.
with the new version of vcfanno, everything has to be bgzipped and tabixed. I'll look into the message.
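When it is unclear which input is not bgzipped, the first bytes of each file can be inspected. The following Python sketch relies on the BGZF layout (gzip magic bytes, the FEXTRA flag, and a "BC" extra subfield of length 2); the function names are hypothetical and this is not how vcfanno itself checks its inputs.

```python
import struct


def classify_compression(header: bytes) -> str:
    """Classify the first bytes of a file as 'bgzf', 'gzip', or 'not gzip'."""
    if len(header) < 4 or header[:2] != b"\x1f\x8b":
        return "not gzip"
    flags = header[3]
    if not flags & 0x04 or len(header) < 12:  # FEXTRA bit not set
        return "gzip"
    (xlen,) = struct.unpack("<H", header[10:12])
    extra = header[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
        if (si1, si2, slen) == (66, 67, 2):  # 'B', 'C' subfield marks BGZF
            return "bgzf"
        pos += 4 + slen
    return "gzip"


def classify_file(path):
    """Read enough of the file to cover the gzip header and extra field."""
    with open(path, "rb") as handle:
        return classify_compression(handle.read(64))
```

A plain-text VCF is reported as 'not gzip', which is the situation that would produce a "gzip: invalid header" message from a gzip reader; a file compressed with ordinary gzip is reported as 'gzip' and would still need to be recompressed with bgzip before tabix can index it.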
I just tagged a new release here: https://github.com/brentp/vcfanno/releases/tag/v0.0.8 that has a better error message for the case you describe.
great work @brentp ! Works very well now.
Now I am only having trouble with one VCF file (runtime error). I suspect that it has to do with the size of an INFO tag value (these often exceed 100 characters). Is there a limitation for this in your VCF reader?
panic: runtime error: index out of range
goroutine 25 [running]:
github.com/brentp/vcfgo.(*Reader).Parse(0x1985ac20, 0x1bc49500, 0x7, 0xa, 0x0, 0x0)
	/usr/local/src/gocode/src/github.com/brentp/vcfgo/reader.go:202 +0x6aa
can you send the full traceback?
By the line number, that should only happen if your vcf has too few fields for a given line.
I put a binary here: http://home.chpc.utah.edu/~u6000771/vcfanno_081 that will output a more informative error message for the line that's causing the error.
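A quick way to locate such lines before running the annotation is to count tab-separated fields per data line. This Python sketch assumes the eight fixed VCF columns (CHROM through INFO) as the minimum; the function name and the sample records are hypothetical.

```python
MIN_VCF_FIELDS = 8  # CHROM POS ID REF ALT QUAL FILTER INFO


def find_short_lines(lines):
    """Yield (line_number, field_count) for data lines with fewer than 8 fields."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < MIN_VCF_FIELDS:
            yield number, len(fields)


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "8\t1712049\t.\tC\tT\t.\t.\tX=1\n",
    "8\t1712050\t.\tC\tT\t.\t.\n",  # INFO column missing
]
print(list(find_short_lines(example)))  # prints [(4, 7)]
```

For a compressed file, the same function can be fed the lines of a text-mode handle such as gzip.open(path, "rt"), since bgzipped files are readable as gzip.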
My bad. One of my annotation VCF files had inherent format errors.
no problem. I want it to have informative messages even when it borks... I'll close for now. Let me know of any other issues.
Hi,
I just played with your tool, great work:) Looking at the result from a test I did, annotating ~ 100,000 variants against 6-7 other VCF files, there were a few things that caught my attention:
1) If my annotation file had an INFO field with multiple values (i.e. "Number=.", in which multiple values are comma-separated for each variant), I could not figure out which operation was best for retrieving the complete set of values. I tried 'uniq' and 'concat', but either way it seems vcfanno concatenates the values with the pipe character ('|'). Would it be possible to get the same comma-separated values as are present in the annotation VCF file?
2) Would you consider adding the meta-information lines concerning the INFO fields of interest that you specify in the configuration file in the result VCF?
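As a sketch of the workaround mentioned earlier in the thread for keeping the original Description, the ##INFO line from the annotation VCF can be copied with only its ID replaced by the name used in the output. This is Python, not part of vcfanno, and the function name and the sample header line are hypothetical rather than taken from the DoCM file.

```python
import re

# Matches the leading '##INFO=<ID=' and the ID value that follows it.
INFO_ID = re.compile(r"^(##INFO=<ID=)([^,>]+)")


def carry_over_info_header(source_line, new_id):
    """Return the source ##INFO line with its ID replaced by new_id."""
    if not INFO_ID.match(source_line):
        raise ValueError("not an ##INFO meta-information line")
    # Only the ID changes; Number, Type and Description are kept verbatim.
    return INFO_ID.sub(lambda m: m.group(1) + new_id, source_line, count=1)


source = '##INFO=<ID=DISEASE,Number=.,Type=String,Description="Associated disease">'
print(carry_over_info_header(source, "DOCM_DISEASE"))
```

This only makes sense when the operation leaves the values unchanged (for example a plain concat of a Number=. field); for reductions such as mean or max the Number and Type of the source line would no longer describe the output.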