WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

Special characters in *_refGene.txt causing invalid record error #106

Closed xin-huang closed 3 years ago

xin-huang commented 4 years ago

Hi Prof. Wang,

Recently, I used ANNOVAR (20191024) to annotate Drosophila genomes.

I generated my database according to https://annovar.openbioinformatics.org/en/latest/user-guide/gene/#create-your-own-gene-definition-databases-for-non-human-species

Here are the commands I used to create the database:

annotate_variation.pl -downdb -buildver dm3 gene drosdb
annotate_variation.pl --buildver dm3 --downdb seq drosdb/dm3_seq
retrieve_seq_from_fasta.pl drosdb/dm3_refGene.txt -seqdir drosdb/dm3_seq -format refGene -outfile drosdb/dm3_refGeneMrna.fa

Then I annotated my data with the above database and got an invalid record error:

Error: invalid record found in exonic_variant_function file (exonic format error): <line294     nonsynonymous SNV       l(2)gl:NM_001258873:exon8:c.A3352C:p.K1118Q,l(2)gl:NM_078715:exon8:c.A3352C:p.K1118Q,l(2)gl:NM_164348:exon8:c.A3328C:p.K1110Q,l(2)gl:NM_164350:exon8:c.A3205C:p.K1069Q,l(2)gl:NM_164351:exon8:c.A3205C:p.K1069Q,l(2)gl:NM_001258872:exon9:c.A3352C:p.K1118Q,l(2)gl:NM_001258874:exon9:c.A3352C:p.K1118Q,l(2)gl:NM_164349:exon9:c.A3352C:p.K1118Q,l(2)gl:NM_164352:exon9:c.A3205C:p.K1069Q,l(2)gl:NM_001258875:exon10:c.A3352C:p.K1118Q     chr2L    11414   11414   T       G> at coding_change.pl line 77, <EVF> line 63.
Error running system command: <coding_change.pl  dros.biallelic.snps.dm3.refGene.exonic_variant_function.orig annovar/drosdb//dm3_refGene.txt annovar/drosdb//dm3_refGeneMrna.fa -alltranscript -out dros.biallelic.snps.dm3.refGene.fa -newevf dros.biallelic.snps.dm3.refGene.exonic_variant_function>

After removing the special characters (, ), ', and : from dm3_refGene.txt, ANNOVAR runs successfully.

I think the error is caused by gene names that contain special characters, e.g. l(2)gl. Could you please make ANNOVAR accept these characters?
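For anyone who wants to reproduce this workaround, here is a minimal sketch in Python. It is not part of ANNOVAR. It assumes the UCSC refGene layout, in which the gene symbol (name2) is the 13th tab-separated field, and it strips the four characters only from that field. The function names and the output path are hypothetical.

```python
import re

# The four characters reported as removed from dm3_refGene.txt.
SPECIAL_CHARS = re.compile(r"[()':]")

# Assumption: UCSC refGene layout, where the gene symbol (name2)
# is the 13th tab-separated field, i.e. index 12.
GENE_NAME_FIELD = 12


def sanitize_refgene_line(line: str) -> str:
    """Return the line with special characters stripped from the gene name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > GENE_NAME_FIELD:
        fields[GENE_NAME_FIELD] = SPECIAL_CHARS.sub("", fields[GENE_NAME_FIELD])
    return "\t".join(fields)


def sanitize_refgene_file(src: str, dst: str) -> None:
    """Write a sanitized copy of a refGene file (dst is a hypothetical path)."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(sanitize_refgene_line(line) + "\n")
```

Note that stripping characters changes the gene names that appear in the annotation output (l(2)gl becomes l2gl) and could in principle make two distinct names identical, so the names need to be mapped back afterwards.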

Thank you.

xin-huang commented 3 years ago

It seems I cannot get any response, so I am closing this issue.

kaichop commented 3 years ago

This can be a quick fix that I missed earlier. If you email me directly, I can send a modified script to you to test whether it works.

kaichop commented 3 years ago

The new coding_change.pl can be used to process dm3 successfully. Attached here: coding_change.txt

xin-huang commented 3 years ago

Hi Prof. Wang,

I tested the new code and it works. I also modified the regular expressions on lines 77, 79, and 617 so that the script can process gene names containing : and ', e.g. His1:CG33843 and beta'COP.

coding_change.txt

lxy0107 commented 2 years ago

Thanks for sharing, I'm also working with Drosophila and this script is really useful to me. On top of your edits, I also needed to add \[\] in the regular expressions to deal with genes like su(w[a]).
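To illustrate why widening the character class helps, here is a minimal sketch in Python. It is not the actual regular expression from coding_change.pl, which is written in Perl and is not reproduced in this thread. It assumes annotation tokens of the form gene:transcript:exonN:c.X:p.Y, as in the error message above. The two character classes, the function name, and the NM_000000 transcript IDs are hypothetical.

```python
import re

# Hypothetical "strict" class: word characters, dot, and hyphen only.
STRICT_GENE = r"[\w.\-]+"
# Hypothetical widened class that also allows ( ) ' : [ ]
EXTENDED_GENE = r"[\w.\-()':\[\]]+"


def parse_token(token: str, gene_class: str):
    """Split one annotation token into its parts, or return None if it does not match."""
    pattern = re.compile(
        rf"^(?P<gene>{gene_class}):(?P<transcript>[^:]+)"
        rf":exon(?P<exon>\d+):c\.(?P<cdna>[^:]+):p\.(?P<protein>[^:]+)$"
    )
    match = pattern.match(token)
    return match.groupdict() if match else None


# The strict class rejects the token from the error message ...
print(parse_token("l(2)gl:NM_001258873:exon8:c.A3352C:p.K1118Q", STRICT_GENE))
# ... while the widened class accepts it and the other gene names from this thread.
for token in (
    "l(2)gl:NM_001258873:exon8:c.A3352C:p.K1118Q",
    "His1:CG33843:NM_000000:exon1:c.A1C:p.M1L",  # hypothetical transcript ID
    "beta'COP:NM_000000:exon1:c.A1C:p.M1L",      # hypothetical transcript ID
    "su(w[a]):NM_000000:exon1:c.A1C:p.M1L",      # hypothetical transcript ID
):
    print(parse_token(token, EXTENDED_GENE)["gene"])
```

Because the remaining fields are anchored to the end of the token and cannot contain a colon, a gene name with an embedded colon such as His1:CG33843 is still captured whole.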

qibaiqi commented 11 months ago

@lxy0107
Hi, I am very sorry to bother you. I ran into the same problem and also need to process Drosophila data. If possible, could you please share the script? I would appreciate it very much. Thank you!

kaichop commented 11 months ago

The coding_change.pl in ANNOVAR should already be able to handle these cases. But in any case, here is the script (change the extension from txt to pl): coding_change.txt

qibaiqi commented 11 months ago

Thank you!!🌹🌹