WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

Error: unable to find start site from input line <Binary file hg38_grasp2.txt matches> #140

Open Shicheng-Guo opened 3 years ago

Shicheng-Guo commented 3 years ago

Dear Prof. Wang,

I ran into this problem when using index_annovar.pl: "Error: unable to find start site from input line <Binary file hg38_grasp2.txt matches>". I checked the input file format with awk and everything looks fine. Any suggestions?

Thanks. Shicheng

(base) [sguo2@login01 grasp2]$ perl ~/janssen4/bin/annovar/index_annovar.pl hg38_grasp1.txt -outfile ~/janssen4/bin/annovar/humandb/hg38_grasp1.txt
NOTICE: the bin size is set as 1000 (use -bin to change this)
NOTICE: Two output files will be generated for use by ANNOVAR: /home/sguo2/janssen4/bin/annovar/humandb/hg38_grasp1.txt and /home/sguo2/janssen4/bin/annovar/humandb/hg38_grasp1.txt.idx (use -outfile to override)
NOTICE: Running the first step of indexing (generating /home/sguo2/janssen4/bin/annovar/humandb/hg38_grasp1.txt) ...
NOTICE: Running the second step of indexing (generating /home/sguo2/janssen4/bin/annovar/humandb/hg38_grasp1.txt.idx) ...
**Error: unable to find start site from input line <Binary file hg38_grasp1.txt matches>**
(base) [sguo2@login01 grasp2]$ perl ~/janssen4/bin/annovar/index_annovar.pl hg38_grasp2.txt -outfile ~/janssen4/bin/annovar/humandb/hg38_grasp2.txt
NOTICE: the bin size is set as 1000 (use -bin to change this)
NOTICE: Two output files will be generated for use by ANNOVAR: /home/sguo2/janssen4/bin/annovar/humandb/hg38_grasp2.txt and /home/sguo2/janssen4/bin/annovar/humandb/hg38_grasp2.txt.idx (use -outfile to override)
NOTICE: Running the first step of indexing (generating /home/sguo2/janssen4/bin/annovar/humandb/hg38_grasp2.txt) ...
NOTICE: Running the second step of indexing (generating /home/sguo2/janssen4/bin/annovar/humandb/hg38_grasp2.txt.idx) ...
**Error: unable to find start site from input line <Binary file hg38_grasp2.txt matches>**
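
The text "Binary file ... matches" is what GNU grep prints in place of matching lines when it decides a file is binary, so one thing worth checking is whether the input contains bytes outside plain ASCII. A minimal sketch, assuming a reasonably recent GNU grep; the function name `find_non_ascii` is hypothetical and not part of ANNOVAR:

```shell
# Hypothetical helper (not part of ANNOVAR): print "lineno:line" for every
# line containing a byte that is neither printable ASCII nor whitespace.
# LC_ALL=C makes the bracket expression byte-based; -a forces text mode so
# grep prints the lines themselves instead of "Binary file ... matches".
find_non_ascii() {
    LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$1"
}
```

For example, `find_non_ascii hg38_grasp2.txt | head` would list the first offending lines, if there are any.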

Here is the format of the input:

(base) [sguo2@login01 grasp2]$ head hg38_grasp1.txt
chr1    872132  872133  T=Systolic blood pressure SBP-Health and aging CVD and cancer age of onset,P=5.40E-21,PUB=22174011,A=European,L=Intron
chr1    872132  872133  T=Total cholesterol-Health and aging CVD and cancer age of onset,P=7.70E-09,PUB=22174011,A=European,L=Intron
chr1    872132  872133  T=Cardiovascular disease prevalence-Health and aging CVD and cancer age of onset,P=2.20E-13,PUB=22174011,A=European,L=Intron
chr1    1260310 1260311 T=Serum ratio of 1-methyluratesalicylate-Metabolites in serum,P=2.00E-08,PUB=21886157,A=European,L=Intron
chr1    1312114 1312115 T=Inflammatory bowel disease-Crohns disease and ulcerative colitis,P=7.66E-13,PUB=23128233,A=European,L=Synonymous
chr1    1759611 1759612 T=Serum ratio of 2-methylbutyroylcarnitinekynurenine-Metabolites in serum,P=1.20E-08,PUB=21886157,A=European,L=Intron
chr1    2137733 2137734 T=Height-Height,P=2.10E-08,PUB=20881960,A=European,L=Intron
chr1    2137733 2137734 T=Height-Lung function,P=2.10E-08,PUB=21946350,A=European,L=Intron
chr1    2137733 2137734 T=Height-Primary tooth eruption,P=2.10E-08,PUB=23704328,A=European,L=Intron
chr1    2461209 2461210 T=Non-obstructive azoospermia-Non-obstructive azoospermia,P=5.65E-12,PUB=22197933,A=Asian,L=-
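
A column check of the kind mentioned above can be sketched as follows. It assumes the file is tab-delimited with chromosome, start, end and annotation columns, as the `head` output suggests; `check_columns` is a hypothetical name. Note that a file can pass this check and still contain non-ASCII bytes inside the annotation column.

```shell
# Hypothetical helper: report lines that do not look like
# chrom<TAB>start<TAB>end<TAB>annotation with integer coordinates.
# Exits with status 1 if any such line is found, 0 otherwise.
check_columns() {
    awk -F '\t' '
        NF < 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: unexpected format\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Running `check_columns hg38_grasp1.txt` prints nothing and returns 0 when every line has at least four tab-separated fields and integer start and end positions.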

Shicheng-Guo commented 3 years ago

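Interesting solution: "Binary file ... matches" is the message grep prints when it treats a file as binary, which suggests the input contained non-ASCII characters. Removing or replacing those characters before indexing solved it: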

perl grasp2annovar.pl > hg38_grasp.txt
cp hg38_grasp.txt hg38_grasp2.txt
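# Drop a single non-printable or non-ASCII byte between "g" and "s" (and between "h" and "e").
# A bare "." would also rewrite ordinary text, e.g. "ages" -> "ags".
perl -p -i -e 's/g[^\t\x20-\x7E]s/gs/g' hg38_grasp2.txt
perl -p -i -e 's/h[^\t\x20-\x7E]e/he/g' hg38_grasp2.txt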
perl -p -i -e '{s/\<//g}' hg38_grasp2.txt
perl -p -i -e '{s/\>//g}' hg38_grasp2.txt
perl -p -i -e '{s/=10E4.l-Liver/=10E4ul-Liver/g}' hg38_grasp2.txt
perl -p -i -e '{s/Fuchs.s/Fuchss/g}' hg38_grasp2.txt
grep -a l-Liver hg38_grasp2.txt
grep -a Fuchs hg38_grasp2.txt
perl ~/janssen4/bin/annovar/index_annovar.pl hg38_grasp2.txt -bin 1000 -outfile ~/janssen4/bin/annovar/humandb/hg38_grasp.txt
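
The substitutions above target specific strings. A more general variant, sketched here under the assumption that simply dropping the offending bytes is acceptable, deletes every byte that is not a tab, a newline, or printable ASCII. The function name `strip_non_ascii` and the output file name in the usage line are hypothetical.

```shell
# Hypothetical helper: copy $1 to $2, deleting every byte that is not
# tab (\11), newline (\12), or printable ASCII (\40-\176).
# -c complements the set and -d deletes; LC_ALL=C keeps tr byte-oriented.
strip_non_ascii() {
    LC_ALL=C tr -cd '\11\12\40-\176' < "$1" > "$2"
}
```

Usage would look like `strip_non_ascii hg38_grasp.txt hg38_grasp.clean.txt`, followed by the same index_annovar.pl call on the cleaned file. Unlike the commands above, this removes characters rather than transliterating them (a micro sign is dropped, not turned into "u"), and it leaves "<" and ">" in place.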