brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

panic: intervals out of order within file: starts at: 4294967295 and 0 #95

Closed: wdecoster closed this issue 6 years ago

wdecoster commented 6 years ago

Hi Brent,

I am annotating SVs with vcfanno and get the error "panic: intervals out of order within file: starts at: 4294967295 and 0 from source: 0" (full output below). Given that 4294967295 = 2^32 - 1, could this be an overflow?

The four annotation BED files I use have worked before without throwing an error. You can find them here: https://mega.nz/#!jNUHHQiC!zU8fxxFUu_0WR98eXYfJuDQhBLVhBTHFu-_C4nzB7Ok

My TOML annotation file looks like this (paths modified):

[[annotation]]
file="GRCh38_full_annotation.bed.gz"
columns = [4]
ops = ["uniq"]
names = ["GENE"]

[[annotation]]
file="GRCh38_exons.bed.gz"
columns = [4]
ops = ["uniq"]
names = ["CODING"]

[[annotation]]
file="GRCh38_genomicSuperDups.bed.bgz"
columns = [4]
ops = ["uniq"]
names = ["SEGDUP"]

[[annotation]]
file="dgv_GRCh38.bed.bgz"
columns = [4]
ops = ["uniq"]
names = ["DGV"]

Command: vcfanno vcfanno_conf.toml SVs.vcf > SVs_annotated.vcf

I cannot share my VCF publicly here, but I can email it to you if that helps.

Output:

=============================================
vcfanno version 0.3.0 [built with go1.11]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 4 sources from 4 files
bix.go:221: chromosome chr14_GL000194v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr14_GL000225v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr14_KI270722v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr14_KI270725v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr14_KI270726v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr17_GL000205v2_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr17_KI270730v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr17_KI270730v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr17_KI270730v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr17_KI270729v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr17_KI270729v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr17_KI270729v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr1_KI270708v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr1_KI270708v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr1_KI270708v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr1_KI270709v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr1_KI270709v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr1_KI270709v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr1_KI270711v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr1_KI270712v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr22_KI270732v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr22_KI270732v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr22_KI270732v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr22_KI270734v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr22_KI270733v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr22_KI270736v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr22_KI270735v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr22_KI270736v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr22_KI270735v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr22_KI270736v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr22_KI270735v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr22_KI270738v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr22_KI270738v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr22_KI270737v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr22_KI270737v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr22_KI270738v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr22_KI270737v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr2_KI270715v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr2_KI270715v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr2_KI270715v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_genomicSuperDups.bed.bgz
bix.go:221: chromosome chr2_KI270715v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr5_GL000208v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr5_GL000208v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr5_GL000208v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr9_KI270717v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chr9_KI270717v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chr9_KI270717v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr9_KI270719v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chr9_KI270718v1_random not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chrEBV not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_full_annotation.bed.gz
bix.go:221: chromosome chrEBV not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_exons.bed.gz
bix.go:221: chromosome chrEBV not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/GRCh38_genomicSuperDups.bed.bgz
bix.go:221: chromosome chrEBV not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
bix.go:221: chromosome chrM not found in /home/wdecoster/databases/Homo_sapiens/GRCh38_recommended/dgv_GRCh38.bed.bgz
panic: intervals out of order within file: starts at: 4294967295 and 0 from source: 0

goroutine 41432 [running]:
github.com/brentp/irelate.(*merger).Next(0xc00014c000, 0x10000007938e0, 0x1900, 0x1900, 0xc003cba000)
    /home/brentp/go/src/github.com/brentp/irelate/irelate.go:278 +0xba4
github.com/brentp/irelate.(*irelate).Next(0xc0004143c0, 0x190, 0x190, 0xc003cba000, 0x190)
    /home/brentp/go/src/github.com/brentp/irelate/irelate.go:159 +0xc1
github.com/brentp/irelate.PIRelate.func3.1(0xc001341ce0, 0x7df100, 0xc0004160f0, 0xc000415e00, 0x5, 0x5)
    /home/brentp/go/src/github.com/brentp/irelate/parallel.go:250 +0xf5
created by github.com/brentp/irelate.PIRelate.func3
    /home/brentp/go/src/github.com/brentp/irelate/parallel.go:242 +0x10f
brentp commented 6 years ago

Source 0 is your VCF, so there must be something odd about it. Can you tabix it? Otherwise, yes, if you could email it, that'd be great.

wdecoster commented 6 years ago

Thanks for the quick reply. I can tabix it. I'll send you an email with the file.

I checked whether sorting the VCF with either vcf-sort or bcftools sort would make a difference: both still produce the error. Sorting does change the number of "chromosome not found" warnings printed to the terminal, though.

brentp commented 6 years ago

You could also zgrep for 4294967295 or 4294967296.

wdecoster commented 6 years ago

Negative on both. I don't think those are real coordinates either; it looks like -1. I do have SVs starting at position 0 of a chromosome (chrEBV and chrM).
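The reading of 4294967295 as -1 can be checked with a few lines of Go. This is a minimal sketch that assumes (the thread does not confirm it) that a tool converts the 1-based VCF POS into a 0-based start by subtracting 1 and keeps the result in an unsigned 32-bit integer; `zeroBasedStart` is a hypothetical name. Under that assumption, a record at POS 0 yields exactly the start printed in the panic.

```go
package main

import "fmt"

// zeroBasedStart converts a 1-based VCF POS into a 0-based start stored as a
// uint32. Hypothetical helper: the unsigned 32-bit storage is an assumption
// made for illustration, not a description of vcfanno's internals.
func zeroBasedStart(pos int64) uint32 {
	// For POS = 0 the subtraction yields -1, and converting -1 to uint32
	// wraps around to 2^32 - 1.
	return uint32(pos - 1)
}

func main() {
	fmt.Println(zeroBasedStart(1)) // 0
	fmt.Println(zeroBasedStart(0)) // 4294967295
}
```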

brentp commented 6 years ago

Here's an example of a line that's causing a problem:

chr2    132008747   43638   N   <TRA>   .   PASS    PRECISE;SVMETHOD=Snifflesv1.0.9;CHR2=chr17;END=21290407;STD_quant_start=0.408248;STD_quant_stop=1.080123;Kurtosis_quant_start=1.763276;Kurtosis_quant_stop=-0.901645;SVTYPE=TRA;SUPTYPE=SR;SVLEN=2131120800;STRANDS=--;RE=25;REF_strand=10,4;AF=0.641026    GT:DR:DV    0/1:14:25

It has an SVLEN of 2.1 billion. That gets added to the start and causes an overflow. As far as I understand, this is not valid VCF. Even if vcfanno preferred END over SVLEN (it currently prefers SVLEN over END), it would still give the wrong result (though not an error), because that END refers to a position on another chromosome.
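The arithmetic for the record above can be sketched as follows. This only shows where the 32-bit boundaries fall for POS + SVLEN; which integer type vcfanno actually uses is not stated in this thread, so `svEnd` and `fitsInt32` are hypothetical helpers, not vcfanno code.

```go
package main

import (
	"fmt"
	"math"
)

// svEnd adds SVLEN to the start position using 64-bit arithmetic,
// so the sum itself cannot overflow here.
func svEnd(pos, svlen int64) int64 {
	return pos + svlen
}

// fitsInt32 reports whether v can be stored in a signed 32-bit integer.
func fitsInt32(v int64) bool {
	return v >= math.MinInt32 && v <= math.MaxInt32
}

func main() {
	// POS and SVLEN taken from the Sniffles record above.
	end := svEnd(132008747, 2131120800)
	fmt.Println(end)                   // 2263129547
	fmt.Println(fitsInt32(end))        // false
	fmt.Println(int32(end))            // -2031837749: wraps if forced into an int32
	fmt.Println(end <= math.MaxUint32) // true: still within the unsigned 32-bit range
}
```

Whether a sum like this actually breaks depends on the integer width involved, which is why the sketch prints both the signed and unsigned checks.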

I'm closing this, as there's no reasonable way for vcfanno to know about this type of problem.

wdecoster commented 6 years ago

Right, makes sense. I'll raise this issue with the developer of Sniffles. Thanks!

fritzsedlazeck commented 5 years ago

Hey guys, what should be put in SVLEN for these? Or should it just not be reported? Thanks, Fritz

brentp commented 5 years ago

1

fritzsedlazeck commented 5 years ago

SVLEN=1? Are you sure? thx, Fritz

brentp commented 5 years ago

SVLEN=1, or, if there's a chunk of 120 bases of DNA being moved, SVLEN=120.
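The suggestion above can be sketched as a small helper that builds the INFO field for a translocation record. `translocationInfo` is a hypothetical name and this is not Sniffles code; it only encodes the rule "SVLEN is the number of bases moved, or 1 when that is unknown".

```go
package main

import "fmt"

// translocationInfo builds a minimal INFO string for a <TRA> record.
// Hypothetical helper: SVLEN is the number of bases moved, falling back
// to 1 when that number is unknown (passed as 0 or less).
func translocationInfo(chr2 string, end, movedBases int64) string {
	svlen := movedBases
	if svlen < 1 {
		svlen = 1
	}
	return fmt.Sprintf("SVTYPE=TRA;CHR2=%s;END=%d;SVLEN=%d", chr2, end, svlen)
}

func main() {
	fmt.Println(translocationInfo("chr17", 21290407, 0))   // SVTYPE=TRA;CHR2=chr17;END=21290407;SVLEN=1
	fmt.Println(translocationInfo("chr17", 21290407, 120)) // SVTYPE=TRA;CHR2=chr17;END=21290407;SVLEN=120
}
```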

fritzsedlazeck commented 5 years ago

OK, I will put that in. thx, Fritz

malonge commented 5 years ago

Hi there,

I am getting the following error:

panic: intervals out of order within file: starts at: 4294967295 and 2943 from source: 0

I have removed all variants larger than 100 kb, so I hope size is not causing the issue. There are quite a few structural variants, though (203,521). I can also tabix the file. Do you have any suggestions for how to debug this? This VCF also comes from SURVIVOR merge, so there is a common link. Here is my config file:

[[annotation]]
file="anno.genes.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_genes"]
type="String"

[[annotation]]
file="anno.exons.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_exons"]
type="String"

[[annotation]]
file="anno.5utr.2000.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_5utr_2k"]
type="String"

[[annotation]]
file="anno.5utr.5000.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_5utr_5k"]
type="String"

[[annotation]]
file="anno.3utr.2000.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_3utr_2k"]
type="String"

[[annotation]]
file="anno.3utr.5000.srt.bed.gz"
columns=[4]
ops = ["concat"]
names = ["ITAG4_3utr_5k"]
type="String"

[[annotation]]
file="anno.meristem.atac.srt.bed.gz"
columns=[7]
ops = ["uniq"]
names = ["meristem_ATAC_open_chromatin"]
type="String"

And here is the full output:

vcfanno version 0.3.1 [built with go1.11]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 7 sources from 7 files
panic: intervals out of order within file: starts at: 4294967295 and 2943 from source: 0

goroutine 1081 [running]:
github.com/brentp/irelate.(*merger).Next(0xc0001151f0, 0x6, 0x400, 0x2, 0xc01618c000)
    /home/brentp/go/src/github.com/brentp/irelate/irelate.go:278 +0xba4
github.com/brentp/irelate.(*irelate).Next(0xc0192a4c80, 0x190, 0x190, 0xc00366f300, 0x190)
    /home/brentp/go/src/github.com/brentp/irelate/irelate.go:159 +0xc1
github.com/brentp/irelate.PIRelate.func3.1(0xc0088b8540, 0x7df120, 0xc00000ecc0, 0xc023a65f80, 0x8, 0x8)
    /home/brentp/go/src/github.com/brentp/irelate/parallel.go:250 +0xf5
created by github.com/brentp/irelate.PIRelate.func3
    /home/brentp/go/src/github.com/brentp/irelate/parallel.go:242 +0x10f
brentp commented 5 years ago

vcfanno can only handle chromosomes of up to roughly 4.29 billion bases (2^32 - 1). Does your organism have chromosomes longer than this?

If not, perhaps some software in your pipeline is writing genome-wide coordinates instead of per-chromosome coordinates?

If you do have chromosomes longer than that, you probably won't be able to use vcfanno.

malonge commented 5 years ago

The chromosomes are much smaller than your limit, so that should not be a problem. May I ask what you mean by your second point?

Thanks

brentp commented 5 years ago

In the example above, there was a variant with SVLEN=2131120800, which gets added to the start and causes an overflow. Your query file has a variant that starts at 2943, and one near it is causing the overflow.

malonge commented 5 years ago

Thanks @brentp,

I am looking at the file now for anything funky. Is it just SVLEN that could be causing this, or should I be looking for other problem signs as well?

brentp commented 5 years ago

SVLEN, or END if SVLEN is not present.
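The precedence described above (SVLEN first, END as fallback) can be sketched so that each record's derived end can be inspected while hunting for the culprit; records whose end is implausibly large, or whose END belongs to another chromosome, are the ones to look at. This is a hypothetical sketch, not vcfanno's code: `infoValue` and `queryEnd` are illustrative names, and taking the absolute value of a negative SVLEN (deletions) and returning POS when neither key is present are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// infoValue returns the value for key in a VCF INFO string such as
// "SVTYPE=DEL;SVLEN=-120;END=2063".
func infoValue(info, key string) (string, bool) {
	for _, kv := range strings.Split(info, ";") {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) == 2 && parts[0] == key {
			return parts[1], true
		}
	}
	return "", false
}

// queryEnd derives a record's end position, preferring SVLEN and falling
// back to END (assumed behaviour, see lead-in).
func queryEnd(pos int64, info string) (int64, error) {
	if v, ok := infoValue(info, "SVLEN"); ok {
		n, err := strconv.ParseInt(v, 10, 64)
		if err != nil {
			return 0, fmt.Errorf("SVLEN %q: %w", v, err)
		}
		if n < 0 {
			n = -n // deletions usually carry a negative SVLEN
		}
		return pos + n, nil
	}
	if v, ok := infoValue(info, "END"); ok {
		return strconv.ParseInt(v, 10, 64)
	}
	return pos, nil // neither key present: treat the record as a point
}

func main() {
	// SVLEN wins over END, even though this END is on another chromosome.
	end, err := queryEnd(132008747, "SVTYPE=TRA;CHR2=chr17;END=21290407;SVLEN=2131120800")
	fmt.Println(end, err) // 2263129547 <nil>

	end, err = queryEnd(1000, "SVTYPE=DEL;END=1500")
	fmt.Println(end, err) // 1500 <nil>
}
```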

malonge commented 5 years ago

Looks like I had a 0-based index problem. One of my variants started at 0. I'll have to figure out why that happened.

Thanks!

eafyounian commented 4 years ago

I got the same message, which is how I landed on this issue page! The VCF file I was working with contained only SNPs and indels. For me, running bcftools sort fixed the problem.

Another observation is that even after sorting, the tabix command failed.