Closed Thatguy027 closed 5 years ago
Hi, can you show an example of a variant in your VCF that should have been annotated with an entry in ANC.bed.gz (along with that entry)?
It looks like your ANC.bed.gz file has always REF == ALT, so it probably does not match anything in your VCF for that reason.
Hi,
Thanks for the quick response.
I originally set up the ANC bed file to only have four columns, the fourth column as a diploid genotype (e.g. "T/T"). I thought this might be causing the issues so I split out the diploid genotype to two columns and only used the fourth for annotations. I was under the impression that vcfanno did not attempt to match alleles when annotating with a bed file, is that not the case?
Here are the VCF sites corresponding to the test set:
bcftools query --regions I:3731-28123 -f '%CHROM\t%POS\t%REF\t%ALT\n' input_files/330_TEST.vcf.gz
I 3731 T A
I 5317 T A
I 11075 G A
I 14398 A G
I 15196 C T
I 15629 G A
I 22510 G A
I 24269 A T
I 27584 A T
I 28123 C G
And here is ancestor at corresponding sites
bcftools query --regions I:3731-28123 --samples XZ1516 -f '%CHROM\t%POS\t%END\t[%TGT\t%GT]\n' input_files/330_TEST.vcf.gz
I 3731 3731 T/T 0/0
I 5317 5317 T/T 0/0
I 11075 11075 G/G 0/0
I 14398 14398 A/A 0/0
I 15196 15196 C/C 0/0
I 15629 15629 G/G 0/0
I 22510 22510 G/G 0/0
I 24269 24269 A/A 0/0
I 27584 27584 A/A 0/0
I 28123 28123 C/C 0/0
S
oh, I think you'll have to add 1 to the end (or subtract 0 from the start, not sure if it is 0 or 1 -based) as vcfanno does not handle 0 length intervals. I did not notice this the first time around
I thought that might be the case, but was worried about MNPs throwing off the annotations. The VCF is 1-based, does that mean I should add 1 to the end pos?
that depends what bcftools outputs in your case. I suspect you should subtract 1 from the start and then it will be proper bed format.
Great thanks. I'll look into it to make sure.
Dear @brentp ,
It is common to have this issue. Usually I annotate from several files, and constantly one or another annotation fail, despite the fact that the field gets added to the header. Always struggle to understand this behavior, it fails both with general tabular files (properly tabixed) and bed files.
Please find a simple example, where I try to annotate a VCF with a bed file. Conf file:
[[annotation]]
file="/mnt/nfs/lobo/MCFONSECA-NFS/pedro.barbosa/kipoi/kipoisplice/1_kipoisplice4_to_annotate.bed.gz"
columns=[6]
names=["kipoisplice_4"]
ops=["self"]
I get the following log:
=============================================
vcfanno version 0.3.1 [built with go1.11]
see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 1 sources from 1 files
vcfanno.go:143: using 2 worker threads to decompress query file
api.go:804: WARNING: using op 'self' when with Number='1' for '' from '/mnt/nfs/lobo/MCFONSECA-NFS/pedro.barbosa/kipoi/kipoisplice/1_kipoisplice4_to_annotate.bed.gz' can result in out-of-order values when the query is multi-allelic
api.go:805: : this is not an issue if the query has been decomposed.
vcfanno.go:241: annotated 8 variants in 0.00 seconds (3748.2 / second)
Can you reproduce the error? Best, Pedro
hi, I just added this in a branch and everything seems fine to me. this is just a warning. and the variants are annotated correctly. since you have no header on the bed file (1_kipoisplice4_to_annotate.bed.gz) vcfanno can't know that you have ref and alt fields in there. if you add a header with "ref" and "alt" for those columns, then it can do allele-specific annotation.
hi, which version are you using ?
my output vcf lacks that annotation, only adds information in the VCF header. Even though i have no header in the bed file, it should do position-based annotation (all alleles split by ',' ), right ?
Indeed, when I add the header in the bed, it works, but I don't understand these exceptions. For instance, this following file gets properly annotated, and the bed file follows the same format (with no header). 1_deltaLogitPSI_to_annotateVCF.bed.gz
maybe you need to re-index your bed file? i am using version 0.3.2 but i doubt this is broken in any release.
dear @brentp ,
It doesn't seem to be related with the index, or there is an issue with my own tabix version (1.2.1)
I transformed the same bed file into tab delimited with a header. Indexed it with tabix -s1 -b2
and tried to annotate the 5th column.. I get empty VCF both at 0.3.1 and 0.3.2 versions. Could you test with my tab delimited and index files (changed tbi to txt to allow the upload)?
1_kipoisplice4_to_annotate.tsv.gz
1_kipoisplice4_to_annotate.tsv.gz.txt
Thank you in advance, Pedro
so, before I look into this new set. you agree that your first test-case works in vcfanno as expected? I want to make sure we're looking at the same problem.
I agree, with the header added it works great
ok. so now, the problem is that you've changed to .tsv, indexed with tabix and it doesn't annotate as expected? can you make sure that you use tabix --zero-based
if your annotations have bed-like (0-based, half-open) intervals? I think that with anything but .bed.gz, tabix will assume you have 1-based start positions.
This 'tsv' was the original file (1-based), i changed it to bed to find a way to get it working. Nevertheless, by indexing with --zero-based
, even though i know the tab file is 1-based, it doesn't produce any annotation.
I think i found the problem. I wasn't setting the -e
in tabix. Apparently i need to tabix -s1 -b2 -e2
in my tab delimited files.
glad to hear it. i was out of ideas.
Hi there,
I've added all the requested information and files below.
I am not getting an error, but one of the annotations I am trying to add to my VCF is not getting appended to the INFO filed. I see "AA" appear in the VCF header, which leads me to believe that vcfanno is recognizing the input:
The actual configuration file contains more annotations, all of which are successfully being appended to the INFO filed of the VCF.
I'm not exactly sure why this particular bed file is not working. I am trying to append an ancestor allele to my VCF. The bed file I am using to annotate the VCF has a line for every position in the VCF file.
Thanks!
If you have encountered an error, please include:
] the command you used to invoke vcfanno
[
vcfanno version 0.2.8 [built with go1.8]
see: https://github.com/brentp/vcfanno
vcfanno.go:115: found 11 sources from 9 files bix.go:200: chromosome MtDNA not found in /projects/b1059/projects/Stefan/CePopGen-nf/input_files/annotations/gquarted.bed.gz bix.go:200: chromosome MtDNA not found in /projects/b1059/projects/Stefan/CePopGen-nf/input_files/annotations/TF_bindingsite_paper.bed.gz bix.go:200: chromosome MtDNA not found in /projects/b1059/projects/Stefan/CePopGen-nf/input_files/annotations/promoter.bed.gz bix.go:200: chromosome MtDNA not found in /projects/b1059/projects/Stefan/CePopGen-nf/input_files/annotations/histone_binding.bed.gz vcfanno.go:241: annotated 29141 variants in 17.30 seconds (1684.0 / second)
] the full error message