brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

Pos(1-based) syntax error when annotating with dbNSFP data #93

Closed (settys02 closed this 6 years ago)

settys02 commented 6 years ago

Hi, love the tool, thanks for your continued support on it!

I'm trying to annotate a set of variants with data from dbNSFP, but when I do I keep getting the following error: panic: strconv.Atoi: parsing "pos(1-based)": invalid syntax

The original dbNSFP file I was using had pos(1-based) as the column name for the position, but I got rid of it and I still get the same error. I'll attach my conf file, and samples of the files with this post.

orig_dbNSFP is a 100-line subset of the original file, modified_dbNSFP is a 100-line subset of the file with my modifications (removed the column names at the top, standardized the delimiters), and conf_dbNSFP is the TOML file I used (renamed to .txt so I could upload it).

modified_dbNSFP.txt orig_dbNSFP.txt conf_dbNSFP.txt

brentp commented 6 years ago

if the error is panic: strconv.Atoi: parsing "pos(1-based)": invalid syntax then you are still using a file with "pos(1-based)" somewhere in the file.

remember you need to update your conf file with the path to the new file that you made without the header.
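One way to confirm this is to scan the annotation file for rows whose position field is not a plain integer, since that is the kind of value the parser is rejecting. The following is a minimal sketch, not part of vcfanno: the function names are hypothetical, and it assumes a tab-delimited file with the position in the second column (index 1), which may not match your layout.

```python
import gzip


def open_text(path):
    """Open a plain-text or gzip/bgzip-compressed file for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def find_non_integer_positions(path, pos_col=1, sep="\t", limit=10):
    """Return up to `limit` (line_number, value) pairs whose position
    field is not a non-negative integer.

    pos_col is a 0-based column index; 1 is an assumption about the layout.
    """
    hits = []
    with open_text(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.rstrip("\n").split(sep)
            # A row that is too short has no position field at all.
            value = fields[pos_col] if len(fields) > pos_col else ""
            if not (value.isascii() and value.isdigit()):
                hits.append((line_number, value))
                if len(hits) >= limit:
                    break
    return hits
```

Calling `find_non_integer_positions(...)` on the exact path given in the `file=` entry of the conf file should return an empty list if no header line remains anywhere in that file. Any pair it returns gives the line number and the offending value.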

weber8thomas commented 5 years ago

Hi @settys02, did you manage to use your modified dbNSFP file? I removed the header, reindexed the complete file, and built a new TOML config file for it, but when I use it to annotate a gnomAD VCF extract, I still get the same issue:

vcfanno version 0.3.1 [built with go1.11]

see: https://github.com/brentp/vcfanno
vcfanno.go:115: found 32 sources from 1 files
vcfanno.go:138: using 4 worker threads to decompress query file
panic: strconv.Atoi: parsing "pos(1-coor)": invalid syntax

goroutine 1815 [running]:
github.com/brentp/irelate.newMerger(0x7df158, 0x0, 0xc009588820, 0x2, 0x2, 0xc000463728)
    /home/brentp/go/src/github.com/brentp/irelate/irelate.go:237 +0x43e
github.com/brentp/irelate.IRelate(0x7df150, 0x0, 0x7df158, 0xc009588820, 0x2, 0x2, 0xc005a0a008, 0xc005b76090)
    /home/brentp/go/src/github.com/brentp/irelate/irelate.go:143 +0x66
github.com/brentp/irelate.PIRelate.func3.1(0xc0006b8000, 0x7df120, 0xc000628720, 0xc009588820, 0x2, 0x2)
    /home/brentp/go/src/github.com/brentp/irelate/parallel.go:245 +0x7b
created by github.com/brentp/irelate.PIRelate.func3
    /home/brentp/go/src/github.com/brentp/irelate/parallel.go:242 +0x10f

My command line is: ./vcfanno_linux64 -p 4 conf.toml gnomad_chr2_0-05.vcf.gz | bgzip > gnomad_chr2_0-05_vcfanno.vcf.gz

The config file looks like this:

[[annotation]]
file="/gstock/biolo_datasets/variation/benchmark/dbNSFP/v3.5/LIGHT/WITHOUT_HEADER/dbNSFP4.0b1a_variant.full_light_without_header.gz"
names=["SIFT_score", "SIFT_pred", "SIFT4G_score", "SIFT4G_pred", "Polyphen2_HDIV_score", "Polyphen2_HDIV_pred", "Polyphen2_HVAR_score", "Polyphen2_HVAR_pred", "LRT_score", "LRT_pred", "MutationTaster_score", "MutationTaster_pred", "MutationAssessor_score", "MutationAssessor_pred", "FATHMM_score", "FATHMM_pred", "PROVEAN_score", "PROVEAN_pred", "VEST4_score", "M-CAP_score", "M-CAP_pred", "REVEL_score", "MutPred_score", "MVP_score", "MPC_score", "PrimateAI_score", "PrimateAI_pred", "DEOGEN2_score", "DEOGEN2_pred", "CADD_raw", "CADD_phred", "DANN_score"]
ops=["first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first", "first"]
columns=[31, 33, 34, 36, 37, 39, 40, 42, 43, 45, 47, 49, 52, 54, 55, 57, 58, 60, 61, 70, 72, 73, 75, 80, 82, 84, 86, 87, 89, 96, 98, 99]

And there is no header in my dbNSFP source file, it starts directly at the second line.

I'm pretty desperate and really need to annotate my files.

At first I thought it crashed whenever a position in the VCF file was not present in my annotation source file, but that is not the case for all of the gnomAD VCF files I have :/

Thanks in advance for your response!

brentp commented 5 years ago

> And there is no header in my dbNSFP source file, it starts directly at the second line.

then how does vcfanno know the header is "pos(1-coor)" ?

which you can see from the error:

panic: strconv.Atoi: parsing "pos(1-coor)": invalid syntax

you must be pointing the conf file to a file with a header.

weber8thomas commented 5 years ago

Thank you for the quick response!

Sorry, I found my mistake. You were right: I merged several files that had been split by chromosome and removed the header from only the first of them.

Anyway, thanks for your answer. We are currently moving from VEP, with all its super slow plugins, to vcfanno, which is super fast at annotating big VCF files :)

Thanks for the support!
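For anyone hitting the same problem, the merge mistake described above can be avoided by dropping the header from every per-chromosome file while concatenating, not just from the first one. The following is a minimal sketch, not part of vcfanno: the function names are hypothetical, and it assumes that header lines start with "#", which you should check against your own files.

```python
import gzip


def open_text(path):
    """Open a plain-text or gzip/bgzip-compressed file for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def concat_without_headers(paths, out_path, header_prefix="#"):
    """Concatenate `paths` in the given order into `out_path`,
    skipping every line that starts with `header_prefix`.

    Returns (kept, skipped) line counts. The "#" prefix is an assumption
    about how header lines are marked.
    """
    kept = 0
    skipped = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open_text(path) as handle:
                for line in handle:
                    if line.startswith(header_prefix):
                        skipped += 1
                        continue
                    out.write(line)
                    kept += 1
    return kept, skipped
```

With one header line per input file, the `skipped` count should equal the number of input files. The output is uncompressed plain text, so it still has to be compressed with bgzip and reindexed before the conf file points at it.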