Amine-Namouchi / snpToolkit

GNU General Public License v3.0

Annotate genbank file error #4

Closed adamjonandrews closed 3 years ago

adamjonandrews commented 3 years ago

Dear Amine, congrats on the tool, but I'm getting an error when using annotate.

I am told: "Something went wrong when trying to extract data from your Genbank file: 'gene'"

To work around this, I edited the GenBank file (downloaded directly from NCBI, including the nucleotide sequence) to remove the first lines, after which snptoolkit accepts it. But then I get errors saying that no SNPs were detected, which I know is not the case from visualizing the data.

I'd be grateful if you can point me in the right direction. Cheers Adam

The edited GenBank file that runs is:

FEATURES Location/Qualifiers source 1..16527 /organism="Thunnus thynnus" /organelle="mitochondrion" /mol_type="genomic DNA" /db_xref="taxon:8237" D-loop 1..865 tRNA 866..933 /product="tRNA-Phe" rRNA 934..1880 /product="12S ribosomal RNA" tRNA 1881..1953 /product="tRNA-Val" rRNA 1954..3645 /product="16S ribosomal RNA" tRNA 3646..3719 /product="tRNA-Leu" CDS 3720..4694 /codon_start=1 /transl_table=2 /product="NADH dehydrogenase subunit 1" /protein_id="YP_003587395.1" /translation="MITALMTHILNPLAFIVPVLLAVAFLTLIERKVLGYMQLRKGPN IVGPYGLLQPIADGVKLFIKEPVRPSTSSPVLFLLAPMLALTLALTLWAPMPLPYPVT DLNLGILFILALSSLAVYSILGSGWASNSKYALIGALRAVAQTISYEVSLGLILLNAI IFTGGFTLQTFNIAQEAIWLIIPAWPLAAMWYISTLAETNRAPFDLTEGESELVSGFN VEYAGGPFALFFLAEYANILLMNTLSATLFLGASHIPTIPELTATNLMIKAALLSMVF LWVRASYPRFRYDQLMHLIWKNFLPLTLALVIWHLALPIAFAGLPPQL" tRNA 4699..4769 /product="tRNA-Ile" tRNA complement(4769..4839) /product="tRNA-Gln" tRNA 4839..4907 /product="tRNA-Met" CDS 4908..5953 /note="TAA stop codon is completed by the addition of 3' A residues to the mRNA" /codon_start=1 /transl_except=(pos:5952..5953,aa:TERM) /transl_table=2 /product="NADH dehydrogenase subunit 2" /protein_id="YP_003587396.1" /translation="MNPYILATLLFGLGLGTTITFASSHWLLAWMGLEMNTLAIIPLM AQNHHPRAVEATTKYFLTQATAAAMLLFASTTNAWLTGQWNIEQMTHPIPTTMIMLAL ALKIGLAPVHAWLPEVLQGLDLTTGLILSTWQKLAPFALILQIHPTNPTLLIVLGVAS TLVGGWGGLNQTQLRKILAYSSIAHLGWMILILQFSPSLTLLTLLTYFIMTFSAFLVF KLNKATNINTLATSWAKAPALTSLTPLVLLSLGGLPPLTGFMPKWLILQELSKQDLAP VATLAALSALLSLYFYLRLSYAMTLTMSPNNLSGTASWRLPSLQSTLPVATSLVATLA LLPLTPAITAILTL" tRNA 5954..6023 /product="tRNA-Trp" tRNA complement(6025..6093) /product="tRNA-Ala" tRNA complement(6096..6168) /product="tRNA-Asn" rep_origin 6170..6203 /note="replication origin of L strand" tRNA complement(6204..6271) /product="tRNA-Cys" tRNA complement(6272..6339) /product="tRNA-Tyr" CDS 6341..7891 /codon_start=1 /transl_table=2 /product="cytochrome c oxidase subunit I" /protein_id="YP_003587397.1" 
/translation="MAITRWFFSTNHKDIGTLYLVFGAWAGMVGTALSLLIRAELSQP GALLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLIPLMIGAPDMAFPRMNNMSF WLLPPSFLLLLASSGVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFSLHLAGVSSILG AINFITTIINMKPAAISQYQTPLFVWAVLITAVLLLLSLPVLAAGITMLLTDRNLNTT FFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVAYYSGKKEPFGYMGMVWA MMAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGGAVKW ETPLLWAIGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIVAA FVHWFPLFTGYTLHSTWTKIHFGVMFVGVNLTFFPQHFLGLAGMPRRYSDYPDAYTLW NTISSIGSLISLVAVIMFLFIIWEAFAAKREVMSVELTSTNIEWLHGCPPPYHTFEEP AFVLVQSD" tRNA complement(7892..7962) /product="tRNA-Ser" tRNA 7966..8038 /product="tRNA-Asp" CDS 8047..8737 /note="TAA stop codon is completed by the addition of 3' A residues to the mRNA" /codon_start=1 /transl_except=(pos:8737,aa:TERM) /transl_table=2 /product="cytochrome c oxidase subunit II" /protein_id="YP_003587398.1" /translation="MAHPSQLGFQDAASPVMEELLHFHDHALMIVFLISTLVLYIIVA MVSTKLTNKYILDSQEIEIIWTILPAIILILIALPSLRILYLMDEINDPHLTIKAVGH QWYWSYEYTDYEDLGFDSYMIPTQDLAPGQFRLLEADHRMVIPVESPIRILISADDVL HSWAVPSLGVKMDAVPGRLNQTAFIASRPGVFYGQCSEICGANHSFMPIVVEAVPLEH FENWSSLMLEDA" tRNA 8738..8811 /product="tRNA-Lys" CDS 8813..8980 /codon_start=1 /transl_table=2 /product="ATP synthase F0 subunit 8" /protein_id="YP_003587399.1" /translation="MPQLNPAPWLAILVFSWLVFLIIIPPKVMAHSFPNEPTPQSTEK PKGEPWNWPWH" CDS 8971..9654 /codon_start=1 /transl_table=2 /product="ATP synthase F0 subunit 6" /protein_id="YP_003587400.1" /translation="MTLSFFDQFMSPVFLGIPLMALALTLPWVLFPTPTSRWLNNRLL TLQNWFIGRFAHELFMPVNLPGHKWAVLLTSLMLFLISLNMLGLLPYTFTPTTQLSLN MGLAFPLWLATVIIGMRNQPTEALGHLLPEGTPTLLIPVLIVIETISLFIRPLALGVR LTANLTAGHLLIQLIATAATVLLPLMPTVAILTATLLFLLTLLEVAVAMIQAYVFVLL LSLYLQENV" CDS 9654..10438 /note="TAA stop codon is completed by the addition of 3' A residues to the mRNA" /codon_start=1 /transl_except=(pos:10437..10438,aa:TERM) /transl_table=2 /product="cytochrome c oxidase subunit III" /protein_id="YP_003587401.1" /translation="MAHQAHAYHMVDPSPWPLTGAVAALLMTSGLAIWFHFHSTTLMT 
VGTALLLLTMYQWWRDIIREGTYQGHHTPPVQKGLRYGMILFITSEVFFFLGFFWAFY HSSLAPTPELGGCWPPTGLTTLDPFEVPLLNTAVLLASGVTVTWAHHSIMEGNRKEAI QSLALTILLGFYFTFLQAMEYYEAPFTIADGVYGSTFFVATGFHGLHVIIGSTFLAVC LLRQIRYHFTSDHHFGFEAAAWYWHFVDVVWLFLYVSIYWWGS" tRNA 10439..10509 /product="tRNA-Gly" CDS 10510..10858 /note="TAA stop codon is completed by the addition of 3' A residues to the mRNA" /codon_start=1 /transl_except=(pos:10858,aa:TERM) /transl_table=2 /product="NADH dehydrogenase subunit 3" /protein_id="YP_003587402.1" /translation="MSLITTIITIAAALSAVLAVVSFWLPQMTPDHEKLSPYECGFDP LGSARLPFSLRFFLVAILFLLFDLEIALLLPLPWGDQLPSPLSTFLWASTVLVLLTLG LIYEWLQGGLEWAE" tRNA 10859..10927 /product="tRNA-Arg" CDS 10929..11225 /codon_start=1 /transl_table=2 /product="NADH dehydrogenase subunit 4L" /protein_id="YP_003587403.1" /translation="MTPVHFAFSTTFMLGLTGLAFHRTHLLSALLCLEAMMLSLFIAL SIWTLQLDSTNFSASPMLLLAFSACEASAGLALLVATSRTHGSDRLQSLNLLQC" CDS 11219..12599 /note="TAA stop codon is completed by the addition of 3' A residues to the mRNA" /codon_start=1 /transl_except=(pos:12599,aa:TERM) /transl_table=2 /product="NADH dehydrogenase subunit 4" /protein_id="YP_003587404.1" /translation="MLKILIPTLMLVPTTWLTPPKWLWPTTLAHSLIIALASLTWLES LSETGWTSLNLYMATDPLSTPLLVLTCWLLPLMILASQNHTAQEPINRQRMYITLLTS LQFFLILAFSATEIIMFYIMFEATLIPTLVIITRWGNQTERLNAGTYFLFYTLAGSLP LLVALLLLQNSTGTLSLLTLQYAPPLQLMSYADKLWWAGCLLAFLVKMPLYGVHLWLP KAHVEAPIAGSMVLAAVLLKLGGYGMMRMMIMLEPLTKELSYPFIVFALWGVIMTGSI CLRQTDLKSLIAYSSVSHMGLVAGGILIQTPWGFTGALILMIAHGLTSSALFCLANTN YERTHSRTMVLARGLQMVLPLMTTWWFIASLANLALPPLPNLMGEIMILTSLFNWSHW TLALTGAGTLITAGYSLYMFLMTQRGPLPAHIIALDPSHSREHLLIALHLLPLILLIL KPELIWGWTA" tRNA 12600..12669 /product="tRNA-His" tRNA 12670..12737 /product="tRNA-Ser" tRNA 12742..12814 /product="tRNA-Leu" CDS 12815..14653 /codon_start=1 /transl_table=2 /product="NADH dehydrogenase subunit 5" /protein_id="YP_003587405.1" /translation="MHPTSLMMTTSLIIIFSLLAYPVFTTLSPRPQAPDWALTQVKTA VKLAFFVSLLPLFLFMNEGAEAIITNWTWMNTLTFDINISLKFDHYSIIFTPIALYVT 
WSILEFASWYMHADPFMNRFFKYLLVFLIAMIILVTANNMFQLFIGWEGVGIMSFLLI GWWYGRADANTAALQAVVYNRVGDIGLILAMAWMATNLNSWEMQQMFVTAKNFDLTLP LLGLIVAATGKSAQFGLHPWLPSAMEGPTPVSALLHSSTMVVAGIFLLVRMSPLMENN QTALTLCLCLGALTTLFTATCALTQNDIKKIVAFSTSSQLGLMMVTIGLNQPQLAFLH ICTHAFFKAMLFLCSGSIIHSLNDEQDIRKMGGMHRLTPFTSSCLTLGSLALTGTPFL AGFFSKDAIIEALNTSHLNAWALTLTLIATSFTAIYSLRVVFFVSMGHPRFNSLSPIN ENNPAVINPIKRLAWGSIIAGLLITSNITPLKTPVMSMPPLLKLAALAVTILGLIIAL ELASLTSKQFKPTPMLTTHHFSNMLGFFPHIIHRFTPKLNLVLGQAIASQMVDQTWLE KSGPKALATSNLPLITTTSNAQQGMIKTYLALFLLTLTFATLLISY" CDS complement(14650..15171) /codon_start=1 /transl_table=2 /product="NADH dehydrogenase subunit 6" /protein_id="YP_003587406.1" /translation="MTYMMCLLLFGLVLGLVAVASNPSPYFAALGLVVVAGMGCGVLV GHGGSFLSLVLFLIYLGGMLVVFAYSAALAAEPYPETWGSPAVVLYMVIYGVGVILAC TALWGGWYEVSWVPADDMEEFAVFRGDVGGVALMYSLGGGMLVISAWVLLLTLFVVLE LTRGLSRGTVRAV" tRNA complement(15173..15241) /product="tRNA-Glu" CDS 15246..16386 /note="TAA stop codon is completed by the addition of 3' A residues to the mRNA" /codon_start=1 /transl_except=(pos:16386,aa:TERM) /transl_table=2 /product="cytochrome b" /protein_id="YP_003587407.1" /translation="MASLRKTHPLLKIANDALVDLPTPSNISAWWNFGSLLGLCLISQ ILTGLFLAMHYTPDVESAFASVAHICRDVNFGWLIRNLHANGASFFFICIYFHIGRGL YYGSYLYKETWNIGVVLLLLVMMTAFVGYVLPWGQMSFWGATVITNLLSAVPYVGTTL VEWIWGGFSVDNATLTRFFAFHFLFPFVIAAMTILHLLFLHETGSNNPIGLNSNADKI SFHPYFSYKDLLGFVILLVALASLALFSPNLLGDPDNFTPANPMVTPPHIKPEWYFLF AYAILRSIPNKLGGVLALLASILVLMVVPFLHTSKQRTLTFRPVSQFLFWTLIADVAI LTWIGGMPAEQPFIIIGQVASVLYFSLFLVFFPLAGWAENKILGWS" tRNA 16387..16458 /product="tRNA-Thr" tRNA complement(16458..16527) /product="tRNA-Pro" ORIGIN
1 ttccaccgtg cgcgcatatt tgattatgtc tgcgcatgta catatatgta atttcaccat 61 attcatatat agaccatata taataatgtt ttaggacata tatgtattaa aaccattact 121 agtactaaac cattcatatg tcaataaata atgaagattt acataaacca tacaaataaa 181 cctcaacatt cattttgaat tcaagcgatt gaacgagatt taagacctaa cataaaccta 241 aatcgtctaa gccataccaa gtctcctcat ctctgacatc tcgtaaactt aagcgcagta 301 agagcctacc atccagtcca tttcttaatg catacggtta ttgaaggtga gggacaatga 361 ttgtgggggt aacacttagt gaattattcc tggcatttgg ttcctacttc agggccctag 421 cctggtaaca ttccccattc tttcatcgac gcttgcataa gttattggtg gagtacatga 481 gattcattaa gccacatgcc gggcgttctc tctagggggt caggttattt ttttctctcc 541 ttcctttcac ttgacatctc acagtgcaaa tgcaacaatg atcaacaagg tagaacattt 601 tcttgcttgc agggtaaata gtctgcatgg cttaattcct attacctaaa taaccacata 661 agaggatatc acgagcataa tgataatatt acccgtaaaa tatctaagac accccctctc 721 ggcttttgcg cgttaaaccc ccctaccccc ctaaactcgt gatatcatta acactcctgt 781 aaaccccccg taaacaggaa aatctcgagt ggggtatttt atggcccaaa acgtatctat

etc.

The header I removed is:

LOCUS       NC_014052 16527 bp DNA circular VRT 10-JUN-2016
DEFINITION  Thunnus thynnus mitochondrion, complete genome.
ACCESSION   NC_014052
VERSION     NC_014052.1
DBLINK      BioProject: PRJNA48037
KEYWORDS    RefSeq.
SOURCE      mitochondrion Thunnus thynnus (Atlantic bluefin tuna)
  ORGANISM  Thunnus thynnus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Neoteleostei;
            Acanthomorphata; Pelagiaria; Scombriformes; Scombridae; Thunnus.
REFERENCE   1 (bases 1 to 16527)
  AUTHORS   Martinez Ibarra,C., Ishizaki,S. and Nagashima,Y.
  TITLE     Development of a PCR-based method for differentiation of tuna and
            related species in canned products
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 16527)
  CONSRTM   NCBI Genome Project
  TITLE     Direct Submission
  JOURNAL   Submitted (23-APR-2010) National Center for Biotechnology
            Information, NIH, Bethesda, MD 20894, USA
REFERENCE   3 (bases 1 to 16527)
  AUTHORS   Martinez Ibarra,C., Ishizaki,S. and Nagashima,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-DEC-2009) Food Science and Technology, Tokyo
            University of Marine Science and Technology, 5-7 Konan, Minato 4,
            Tokyo 108-8477, Japan
COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
            reference sequence is identical to GU256522.
            COMPLETENESS: full length.

Amine-Namouchi commented 3 years ago

Hi Adam,

The LOCUS line must be present so that snptoolkit can extract annotation information from the gb file. You don't need to remove anything from the original gb file. The error message "Something went wrong when trying to extract data from your Genbank file: 'gene'" means that the gene tag is missing from your GenBank file. Luckily, it is a mitochondrial annotation file, so this is manageable. As a quick fix (that needs some work from you), you have to add the gene tag to each CDS block as follows.

Before:

CDS 3720..4694 /codon_start=1 /transl_table=2 /product="NADH dehydrogenase subunit 1" /protein_id="YP_003587395.1" /translation="MITALMTHILNPLAFIVPVLLAVAFLTLIERKVLGYMQLRKGPN IVGPYGLLQPIADGVKLFIKEPVRPSTSSPVLFLLAPMLALTLALTLWAPMPLPYPVT DLNLGILFILALSSLAVYSILGSGWASNSKYALIGALRAVAQTISYEVSLGLILLNAI IFTGGFTLQTFNIAQEAIWLIIPAWPLAAMWYISTLAETNRAPFDLTEGESELVSGFN VEYAGGPFALFFLAEYANILLMNTLSATLFLGASHIPTIPELTATNLMIKAALLSMVF LWVRASYPRFRYDQLMHLIWKNFLPLTLALVIWHLALPIAFAGLPPQL"

After:

CDS 3720..4694 /gene="YP_003587395.1" /codon_start=1 /transl_table=2 /product="NADH dehydrogenase subunit 1" /protein_id="YP_003587395.1" /translation="MITALMTHILNPLAFIVPVLLAVAFLTLIERKVLGYMQLRKGPN IVGPYGLLQPIADGVKLFIKEPVRPSTSSPVLFLLAPMLALTLALTLWAPMPLPYPVT DLNLGILFILALSSLAVYSILGSGWASNSKYALIGALRAVAQTISYEVSLGLILLNAI IFTGGFTLQTFNIAQEAIWLIIPAWPLAAMWYISTLAETNRAPFDLTEGESELVSGFN VEYAGGPFALFFLAEYANILLMNTLSATLFLGASHIPTIPELTATNLMIKAALLSMVF LWVRASYPRFRYDQLMHLIWKNFLPLTLALVIWHLALPIAFAGLPPQL"

Here I have added /gene="YP_003587395.1".

You have to do this for all CDS blocks, adding /gene= with the corresponding ID.

Let me know if this resolves your issue.

Amine
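For a file with many CDS blocks, the manual edit described above could be scripted. The following is only a minimal sketch and not part of snpToolkit: the function name `add_gene_qualifiers` is hypothetical, and it assumes the file keeps the standard GenBank column layout as downloaded from NCBI (feature keys indented by 5 spaces, qualifiers by 21 spaces), rather than the flattened text shown in this thread. As in the fix above, it reuses each CDS's /protein_id value as the /gene value.

```python
import re

QUALIFIER_INDENT = " " * 21  # assumption: qualifiers start at column 22
PROTEIN_ID = re.compile(r'^\s+/protein_id="([^"]+)"')


def add_gene_qualifiers(genbank_text: str) -> str:
    """Add /gene="<protein_id>" to every CDS feature that has no /gene."""
    lines = genbank_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        out.append(line)
        i += 1
        if not line.startswith("     CDS "):
            continue
        # Collect the indented lines that belong to this CDS feature
        # (wrapped location lines and qualifiers).
        block = []
        while i < len(lines) and lines[i].startswith(QUALIFIER_INDENT):
            block.append(lines[i])
            i += 1
        has_gene = any(q.lstrip().startswith("/gene=") for q in block)
        ids = [m.group(1) for q in block if (m := PROTEIN_ID.match(q))]
        if not has_gene and ids:
            # Insert before the first qualifier so that a wrapped
            # location stays attached to the CDS line.
            first_q = next(
                (k for k, q in enumerate(block) if q.lstrip().startswith("/")),
                len(block),
            )
            block.insert(first_q, f'{QUALIFIER_INDENT}/gene="{ids[0]}"')
        out.extend(block)
    return "\n".join(out) + "\n"
```

To use it, read the original, unedited .gb file (LOCUS header included) into a string, pass it to `add_gene_qualifiers`, and write the result to a new file to give to snptoolkit. CDS blocks that already carry a /gene qualifier, or that have no /protein_id, are left unchanged.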

adamjonandrews commented 3 years ago

Dear Amine, thanks for getting back so quickly.

That's excellent and has resolved the error.

Thanks again, Adam

Amine-Namouchi commented 3 years ago

Great! Glad to hear that.