brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

error: index out of range when FORMAT field is provided but no sample columns are present #134

Status: Open. vindex10 opened this issue 3 years ago

vindex10 commented 3 years ago

Hello! I've run into an error in a somewhat specific use case. Here is the error:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA12889 NA12890 NA12877
panic: runtime error: index out of range

goroutine 13 [running]:
github.com/brentp/vcfgo.(*Reader).Parse(0xc00013d6c0, 0xc0002440e0, 0x9, 0x9, 0x4db24d)
        /home/brentp/go/src/github.com/brentp/vcfgo/reader.go:223 +0xae5
github.com/brentp/bix.(*Bix).toPosition(0xc00007daa0, 0xc0002440e0, 0x9, 0x9, 0x1, 0x0)
        /home/brentp/go/src/github.com/brentp/bix/bix.go:204 +0x77
github.com/brentp/bix.bixerator.Next(0x88cd40, 0xc00015e5d0, 0xc00007dc20, 0xc00007daa0, 0x88f900, 0xc00000d380, 0xf, 0xb346c0, 0x0, 0xc00005f620)
        /home/brentp/go/src/github.com/brentp/bix/bix.go:342 +0x11c
github.com/brentp/irelate.newMerger(0x81ecc8, 0x0, 0xc00000d340, 0x2, 0x2, 0x0)
        /home/brentp/go/src/github.com/brentp/irelate/irelate.go:235 +0x12e
github.com/brentp/irelate.IRelate(0x81ecc0, 0x0, 0x81ecc8, 0xc00000d340, 0x2, 0x2, 0x0, 0x0)
        /home/brentp/go/src/github.com/brentp/irelate/irelate.go:143 +0x5d
github.com/brentp/irelate.PIRelate.func3.1(0xc0002320c0, 0x81ec90, 0xc00015e510, 0xc00000d340, 0x2, 0x2)
        /home/brentp/go/src/github.com/brentp/irelate/parallel.go:245 +0x7b
created by github.com/brentp/irelate.PIRelate.func3
        /home/brentp/go/src/github.com/brentp/irelate/parallel.go:242 +0x10f

Please find the input files attached.

example.tar.gz

When annotating from a VCF, it looks like vcfanno expects the annotation source VCF to contain at least one sample column whenever a FORMAT column is present.
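To illustrate the failure mode, here is a minimal Go sketch of splitting a VCF data line so that a FORMAT column with zero sample columns yields an empty sample list instead of an out-of-range index. The `vcfRecord` type and `splitVCFLine` function are hypothetical; this is not the actual code in vcfgo's reader.go, only an assumption about how such a bounds check could look.

```go
package main

import (
	"fmt"
	"strings"
)

// vcfRecord holds the tab-separated columns of one VCF data line.
// Hypothetical type for illustration; not vcfgo's Variant.
type vcfRecord struct {
	Fixed   []string // CHROM..INFO: the eight mandatory columns
	Format  string   // FORMAT column, or "" if absent
	Samples []string // per-sample columns; may be empty even when FORMAT is present
}

// splitVCFLine splits a data line without assuming that a FORMAT
// column is followed by at least one sample column.
func splitVCFLine(line string) (vcfRecord, error) {
	fields := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(fields) < 8 {
		return vcfRecord{}, fmt.Errorf("expected at least 8 columns, got %d", len(fields))
	}
	rec := vcfRecord{Fixed: fields[:8]}
	if len(fields) > 8 {
		rec.Format = fields[8]
		// fields[9:] is an empty slice (not a panic) when len(fields) == 9.
		rec.Samples = fields[9:]
	}
	return rec, nil
}

func main() {
	// FORMAT present but no sample columns: the case from this issue.
	rec, err := splitVCFLine("1\t100\t.\tA\tG\t50\tPASS\tAF=0.1\tGT")
	if err != nil {
		panic(err)
	}
	fmt.Println(rec.Format, len(rec.Samples))
}
```

The key design choice is slicing `fields[9:]` only after checking that a ninth column exists, rather than indexing `fields[9]` directly.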

Thank you!

brentp commented 3 years ago

Hi, you can work around this by removing the FORMAT column from your exac.vcf.gz. If the FORMAT column is present, vcfanno expects sample columns to follow it.

vindex10 commented 3 years ago

Do you think this requirement could be relaxed easily? The VCF standard allows an "arbitrary number of sample IDs", which I read as including 0 :) https://samtools.github.io/hts-specs/VCFv4.3.pdf#subsection.1.5
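Under the reading proposed above (FORMAT followed by zero or more sample IDs), a header check could simply count the columns after FORMAT. This is a minimal Go sketch; `parseHeaderLine` is a hypothetical helper, and whether the specification actually permits FORMAT with no samples is the open question in this thread, not something the sketch settles.

```go
package main

import (
	"fmt"
	"strings"
)

// mandatory lists the eight fixed VCF header columns, in order.
var mandatory = []string{"#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"}

// parseHeaderLine checks a #CHROM header line and reports whether a FORMAT
// column is present and how many sample columns follow it. Under the
// "zero samples is allowed" reading, FORMAT with no samples is not an error.
func parseHeaderLine(line string) (hasFormat bool, nSamples int, err error) {
	cols := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(cols) < len(mandatory) {
		return false, 0, fmt.Errorf("header has %d columns, need at least %d", len(cols), len(mandatory))
	}
	for i, name := range mandatory {
		if cols[i] != name {
			return false, 0, fmt.Errorf("column %d is %q, want %q", i+1, cols[i], name)
		}
	}
	if len(cols) == len(mandatory) {
		return false, 0, nil // sites-only VCF: no FORMAT, no samples
	}
	if cols[8] != "FORMAT" {
		return false, 0, fmt.Errorf("column 9 is %q, want FORMAT", cols[8])
	}
	return true, len(cols) - 9, nil // may be 0
}

func main() {
	for _, h := range []string{
		"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
		"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT",
		"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12889\tNA12890\tNA12877",
	} {
		hasFormat, n, err := parseHeaderLine(h)
		fmt.Println(hasFormat, n, err)
	}
}
```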

It is, of course, a cosmetic change, but it would be nice to have if it doesn't affect efficiency and is straightforward to implement.

Thank you for the neat and fast VCFAnno :)

sigven commented 1 year ago

Hi @brentp,

A minor follow-up on this one. When I run vcfanno (v0.3.3) on a VCF with no FORMAT column or sample genotypes (only the eight mandatory columns), the annotated VCF it produces has the following header line:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT

I guess the FORMAT column should be skipped here?
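A minimal Go sketch of the behaviour suggested here, in which the writer emits the FORMAT column only when there is at least one sample. The `headerLine` function is hypothetical and is not how vcfanno or vcfgo currently write the header.

```go
package main

import (
	"fmt"
	"strings"
)

// headerLine builds the #CHROM header line for an output VCF. The FORMAT
// column is written only when there is at least one sample, so an input
// with only the eight mandatory columns keeps eight columns on output.
func headerLine(samples []string) string {
	cols := []string{"#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"}
	if len(samples) > 0 {
		cols = append(cols, "FORMAT")
		cols = append(cols, samples...)
	}
	return strings.Join(cols, "\t")
}

func main() {
	fmt.Println(headerLine(nil))
	fmt.Println(headerLine([]string{"NA12889", "NA12890"}))
}
```

One consequence of this choice: an input that has FORMAT but zero samples (the case from the original report) would also lose its FORMAT column on output, which seems consistent with treating such a column as carrying no information.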

best, Sigve

sigven commented 1 year ago

I just noticed https://github.com/brentp/vcfanno/issues/123, so this behaviour has already been reported. Sorry for the duplicate.

brentp commented 1 year ago

@sigven , I'll add this back into the queue and try to get a fix out.