Closed maggie-fu closed 7 years ago
your vcf file has no header.
I also tried a vcf with header, but it had the same problem.
./vcfanno_linux64 conf_new.toml NA12878.LUMPY_sorted_filtered.vcf.gz > annotated_test.vcf
conf:
[[annotation]]
file="annotation/ALL.wgs.mergedSV.v8.20130502.svs.genotypes_sort.vcf.gz"
fields=["AF", "AFR_AF", "AMR_AF", "EAS_AF", "EUR_AF", "SAS_AF"]
names=["1000G_AF", "1000G_AFR_AF", "1000G_AMR_AF", "1000G_EAS_AF", "1000G_EUR_AF", "1000G_SAS_AF"]
ops=["self", "self", "self", "self", "self", "self"]
Annotation vcf: file.zip
vcfanno matches on REF and ALT. by default. your vcfs have different REF and ALT values.
if you want to match on position and not on REF and ALT exact match then you can use vcfanno -permissive-overlap
@brentp, thank you for your help. I understand now.
However, when I tried to annotate with permissive overlap with op = 'by_alt' and Number = A, all the annotations looks like this 1000GAF_float=.;AFR_AF_float=.;AMR_AF_float=.;EAS_AF_float=.;EUR_AF_float=.;SAS_AF_float=.
I also tried to set op to self but yielded the same result. It worked when I set op to first or max. Is permissive overlap incompatible with multi-allelic annotations?
by_alt, as the name and docs should indicate matches on alt. that's the entire point. if I can make the docs clearer on this, please indicate how. permissive overlap and by_alt won't make sense when used together.
What I understood from reading the docs was that permissive-overlap flag eliminate the requirement of shared REF and ALT, when the by_alt op present the annotation in either comma or pipe- delimited way depending on whether it is multiple annos or alt. I thought the prior is a method for matching variant when the latter is a method for data extraction and annotation. They did not seem contradictory to me. I have limited knowledge in coding and that is probably why I did not know that their functions are linked.
As I mentioned before, I also tried the op "self" and still nothing was annotated. Since the annotation file like this:
REF | ALT | QUAL | FILTER | INFO
G | <CN2> | . | PASS | AC=64;AF=0.0127795;AFR_AF=0.0015;AMR_AF=0;AN=5008;CIEND=-150,150;CIPOS=-150,150;CS=DUP_delly;EAS_AF=0.0595;END=850204;EUR_AF=0.001;IMPRECISE;NS=2504;SAS_AF=0.001;SITEPOST=1;SVTYPE=DUP
C | <CN0>,<CN2> | . | PASS | AC=3,206;AF=0.00059904,0.0411342;AFR_AF=0,0.0303;AMR_AF=0.0014,0.0259;AN=5008;CS=DUP_gs;EAS_AF=0.001,0.0615;END=755966;EUR_AF=0.001,0.0417;NS=2504;SAS_AF=0,0.045;SITEPOST=1;SVTYPE=CNV
For a query variant that overlap with both variants, I was hoping to get an annotation like:
SVTYPE=DUP | CNV; AF=0.0127795 | 0.00059904, 0.0411342
I am trying to annotate structural variants, copy-number variants specifically in this case. REF is not important, so I guess I need to use permissive-overlap for sure. I just want to know how to annotate my query here?
If vcfanno does not have this function yet, can I request this feature? a feature that allows annotations for multiple ALT and multiple annotations? I am sorry that I do not have an idea as to how though.
Additionally, I have another feature request. After running vcfanno, the report shows how many variants are annotated:
vcfanno.go:241: annotated 6667 variants in 2.56 seconds (2607.5 / second)
But it is not very helpful when the wrong parameters are set and nothing is annotated. Can you let the report include how many annotations are actually made?
with permissive overlap, just use ops=["concat"] or uniq.
it's not feasible given the current architecture to report how many variants were modified.
Thank you very much. That worked perfectly!
Understood. Okay.
Hi @brentp,
I am having problem annotating my query with a vcf from 1000 genomes. No annotation have been added when there should be many overlapping variants. There was no error message pertain to this problem.
The command was:
./vcfanno_linux64 conf_new.toml NA12878.LUMPY_sorted_filtered.vcf.gz > annotated_test.vcf
The conf was:
The query and annotation files are attached here file.zip
The full error message was: