Closed wdecoster closed 6 years ago
file 0 is your file so there must be something odd about it. can you tabix it? otherwise, yeah, if you could email it, that'd be great.
Thanks for the quick reply. I can tabix it. I'll send you an email with the file.
I checked whether sorting the VCF with either vcf-sort or bcftools sort would make a difference, and both still produce the error. It does change the number of "chromosome not found" warnings printed to the terminal, though.
you could also zgrep for 4294967295 or 4294967296
Negative on both. I don't think those are real coordinates either; it looks like -1.
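For reference, 4294967295 is 2^32 - 1, which is the value -1 takes when reinterpreted as an unsigned 32-bit integer. The following is a minimal Python sketch of that wraparound only. That the coordinate is held in 32 bits is an assumption here, and `as_uint32` is a hypothetical helper, not part of vcfanno or irelate.

```python
# Assumption: coordinates are stored in 32 bits; the thread does not confirm this.
MASK_32 = 0xFFFFFFFF

def as_uint32(value: int) -> int:
    """Return value modulo 2**32, i.e. its unsigned 32-bit representation."""
    return value & MASK_32

print(as_uint32(-1))  # 4294967295
print(2**32 - 1)      # 4294967295
```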
...
I have SVs starting on position 0 of a chromosome (chrEBV and chrM).
here's an example of a line that's causing a problem:
chr2 132008747 43638 N <TRA> . PASS PRECISE;SVMETHOD=Snifflesv1.0.9;CHR2=chr17;END=21290407;STD_quant_start=0.408248;STD_quant_stop=1.080123;Kurtosis_quant_start=1.763276;Kurtosis_quant_stop=-0.901645;SVTYPE=TRA;SUPTYPE=SR;SVLEN=2131120800;STRANDS=--;RE=25;REF_strand=10,4;AF=0.641026 GT:DR:DV 0/1:14:25
it has an SVLEN of 2.1 billion. That gets added to the start and gives an overflow. This is incorrect VCF as far as I understand. Even if vcfanno preferred END over SVLEN (it currently prefers SVLEN over END), it would still give the wrong result (though not an error), as END points to a position on another chromosome.
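The arithmetic for the example record can be checked directly. This is a rough sketch, assuming the end coordinate is computed as POS + SVLEN and held in a 32-bit integer; the thread states the addition and the overflow but not the exact integer type, and `implied_end` is a hypothetical name.

```python
INT32_MAX = 2**31 - 1   # largest signed 32-bit value
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit value

def implied_end(pos: int, svlen: int) -> int:
    """End coordinate implied by adding SVLEN to the start position."""
    return pos + svlen

# POS and SVLEN taken from the example record above.
end = implied_end(132008747, 2131120800)
print(end)               # 2263129547
print(end > INT32_MAX)   # True: does not fit in a signed 32-bit integer
print(end > UINT32_MAX)  # False
```

The implied end is also larger than END=21290407 by two orders of magnitude, which is another sign that SVLEN is not meaningful for this record.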
I'm closing this as there's no reasonable way that vcfanno can know about this type of problem.
Right, makes sense. I'll raise this issue with the developer of Sniffles. Thanks!
Hey guys, what should be put in there for the SVLen? Or just not report it? Thanks Fritz
1
SVLen =1 ? Are you sure? thx Fritz
SVLEN=1 or if there's a chunk of DNA of 120 bases being moved then SVLEN=120
ok i will put that in. thx Fritz
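Until the caller itself is changed, the SVLEN=1 suggestion could be applied to an existing file as a post-processing step on the INFO column. This is a minimal sketch, assuming semicolon-separated INFO fields as in the example record; `fix_tra_svlen` is a hypothetical helper, not part of Sniffles or vcfanno, and it leaves END untouched.

```python
def fix_tra_svlen(info: str, svlen: int = 1) -> str:
    """Set SVLEN to `svlen` when the INFO string has SVTYPE=TRA; otherwise return it unchanged."""
    fields = info.split(";")
    if "SVTYPE=TRA" not in fields:
        return info
    return ";".join(
        f"SVLEN={svlen}" if field.startswith("SVLEN=") else field
        for field in fields
    )

print(fix_tra_svlen("PRECISE;CHR2=chr17;END=21290407;SVTYPE=TRA;SVLEN=2131120800;RE=25"))
# PRECISE;CHR2=chr17;END=21290407;SVTYPE=TRA;SVLEN=1;RE=25
```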
Hi there,
I am getting the following error:
panic: intervals out of order within file: starts at: 4294967295 and 2943 from source: 0
I have removed any variants larger than 100k, so I hope that is not causing the issue. There are quite a few structural variants though (203,521). I can also tabix the file. I was wondering if you had any suggestions for how to debug. This is also a VCF file coming from SURVIVOR merge, so there is a common link. Here is my config file:
[[annotation]]
file="anno.genes.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_genes"]
type="String"
[[annotation]]
file="anno.exons.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_exons"]
type="String"
[[annotation]]
file="anno.5utr.2000.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_5utr_2k"]
type="String"
[[annotation]]
file="anno.5utr.5000.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_5utr_5k"]
type="String"
[[annotation]]
file="anno.3utr.2000.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_3utr_2k"]
type="String"
[[annotation]]
file="anno.3utr.5000.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_3utr_5k"]
type="String"
[[annotation]]
file="anno.meristem.atac.srt.bed.gz"
columns=[7]
ops = ["uniq"]
names = ["meristem_ATAC_open_chromatin"]
type="String"
And here is the full output:
vcfanno version 0.3.1 [built with go1.11]
see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 7 sources from 7 files
panic: intervals out of order within file: starts at: 4294967295 and 2943 from source: 0
goroutine 1081 [running]:
github.com/brentp/irelate.(*merger).Next(0xc0001151f0, 0x6, 0x400, 0x2, 0xc01618c000)
/home/brentp/go/src/github.com/brentp/irelate/irelate.go:278 +0xba4
github.com/brentp/irelate.(*irelate).Next(0xc0192a4c80, 0x190, 0x190, 0xc00366f300, 0x190)
/home/brentp/go/src/github.com/brentp/irelate/irelate.go:159 +0xc1
github.com/brentp/irelate.PIRelate.func3.1(0xc0088b8540, 0x7df120, 0xc00000ecc0, 0xc023a65f80, 0x8, 0x8)
/home/brentp/go/src/github.com/brentp/irelate/parallel.go:250 +0xf5
created by github.com/brentp/irelate.PIRelate.func3
/home/brentp/go/src/github.com/brentp/irelate/parallel.go:242 +0x10f
vcfanno can only handle chromosomes of up to 4.2 billion bases. does your organism have chromosomes longer than this?
if not, perhaps you have some software that's annotating in genome coordinates?
If you do have chromosomes longer than 4.2 Gb, you probably won't be able to use vcfanno.
The chromosomes are much smaller than your limit so that should not be a problem. May I ask what you mean by your second point?
Thanks
in the example above, there was a variant with SVLEN=2131120800, which gets added to the start and causes an overflow.
You have a variant in your query file that starts at 2943 and one near it that is causing the overflow.
Thanks @brentp,
I am looking at the file now for anything funky. Is it just SVLEN that could be causing this, or should I be looking for other problem signs as well?
SVLEN or END if SVLEN is not present.
Looks like I had a 0-based index problem. One of my variants started at 0. I'll have to figure out why that happened.
Thanks!
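The problem signs mentioned in this thread (a start position of 0, a huge SVLEN, an END that lies before POS) can be searched for before running vcfanno. The following is a sketch under stated assumptions: the function names and the `max_span` threshold are hypothetical, multi-valued or non-numeric INFO values are ignored, and the checks are heuristics rather than a reproduction of vcfanno's own logic.

```python
import gzip

def info_int(info: str, key: str):
    """Return the integer value of `key` in a VCF INFO string, or None."""
    for field in info.split(";"):
        name, _, value = field.partition("=")
        if name == key:
            try:
                return int(value)
            except ValueError:
                return None
    return None

def suspicious(line: str, max_span: int = 100_000_000):
    """Return the reasons a VCF data line looks problematic (empty list if none)."""
    cols = line.rstrip("\n").split("\t")
    pos, info = int(cols[1]), cols[7]
    reasons = []
    if pos < 1:
        reasons.append("POS < 1")
    svlen = info_int(info, "SVLEN")
    end = info_int(info, "END")
    # max_span is an arbitrary, hypothetical cutoff; adjust it to the genome.
    if svlen is not None and abs(svlen) > max_span:
        reasons.append("very large SVLEN")
    if end is not None and end < pos:
        reasons.append("END < POS")
    return reasons

def scan(path: str):
    """Yield (chrom, pos, reasons) for each flagged record in a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            reasons = suspicious(line)
            if reasons:
                chrom, pos = line.split("\t", 2)[:2]
                yield chrom, int(pos), reasons
```

Calling `list(scan("SVs.vcf"))` would then list candidate records to inspect by hand.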
I got the same message, that is why I landed on this issue page! The VCF file I was working with only contained SNPs and indels. For me, using bcftools sort fixed the problem.
Another observation is that even after sorting, the tabix command failed.
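For a file of SNPs and indels, a quick way to see whether ordering is the issue is to check that start positions never decrease within a chromosome and that each chromosome forms one contiguous block. This is a minimal sketch with hypothetical helper names; it looks at start positions only and does not reproduce the interval check that vcfanno performs.

```python
def vcf_positions(lines):
    """Yield (chrom, pos) for each data line of a VCF, skipping headers and blanks."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        yield chrom, int(pos)

def first_unsorted(records):
    """Return the first (chrom, pos) that breaks the ordering, or None if sorted.

    `records` is an iterable of (chrom, pos) tuples in file order.
    """
    seen = set()
    last_chrom, last_pos = None, None
    for chrom, pos in records:
        if chrom != last_chrom:
            if chrom in seen:
                return chrom, pos  # chromosome reappears after another one
            seen.add(chrom)
            last_chrom = chrom
        elif pos < last_pos:
            return chrom, pos      # position decreased within a chromosome
        last_pos = pos
    return None

lines = ["##fileformat=VCFv4.2\n", "chr1\t100\t.\tA\tT\n", "chr1\t50\t.\tG\tC\n"]
print(first_unsorted(vcf_positions(lines)))  # ('chr1', 50)
```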
Hi Brent,
I am annotating SVs using vcfanno and receive an error
panic: intervals out of order within file: starts at: 4294967295 and 0 from source: 0
(full error below). Given that 4294967295 = 2^32 - 1, this might be an overflow or something? The 4 annotation BED files I use have been used before and did not throw an error then. You can find them here: https://mega.nz/#!jNUHHQiC!zU8fxxFUu_0WR98eXYfJuDQhBLVhBTHFu-_C4nzB7Ok
My toml annotation file looks like (paths modified):
Command:
vcfanno vcfanno_conf.toml SVs.vcf > SVs_annotated.vcf
I cannot share my vcf here publicly but can email it to you if that helps.
Output: