etetoolkit / ete

Python package for building, comparing, annotating, manipulating and visualising trees. It provides a comprehensive API and a collection of command line tools, including utilities to work with the NCBI taxonomy tree.
http://etetoolkit.org
GNU General Public License v3.0

Error for partially gap codons in translate function of EvolTree #455

Open sharifas opened 4 years ago

sharifas commented 4 years ago

I am using EvolTree to do dN/dS calculations. I have an iphylip file of coding sequences and I am getting the following error: "in translate for nt2 in newcod[1]: IndexError: list index out of range". I checked the 'ete/ete3/evol/utils.py' file and I see that the gencode dictionary handles full gaps "---" and converts them to "-" for the protein sequence, but I don't think it handles partial gaps such as "A--".
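The gap handling described above can be sketched as follows. This is a minimal illustration, not ETE's actual `translate` code: the function name `translate_tolerant`, the choice of "X" for partially gapped or incomplete codons, and the restriction to the standard genetic code are all assumptions made for the example.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_tolerant(sequence, unknown="X", gap="-"):
    """Translate a nucleotide sequence codon by codon without raising.

    Hypothetical helper, not part of ETE:
    - a full gap codon "---" becomes a gap in the protein sequence;
    - a partially gapped codon (e.g. "A--") or a trailing incomplete
      codon becomes `unknown`;
    - any codon missing from the table (e.g. one containing "N")
      also becomes `unknown`.
    """
    sequence = sequence.upper()
    protein = []
    for start in range(0, len(sequence), 3):
        codon = sequence[start:start + 3]
        if codon == "---":
            protein.append(gap)
        elif len(codon) < 3 or "-" in codon:
            protein.append(unknown)
        else:
            protein.append(CODON_TABLE.get(codon, unknown))
    return "".join(protein)


print(translate_tolerant("ATGA--GGG---"))  # MXG-
```

Mapping a partial gap to "X" rather than "-" is only one possible convention; the point of the sketch is that every three-letter slice gets an explicit branch, so neither a KeyError nor an IndexError can escape.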

dengzq1234 commented 4 years ago

Hi, could you provide an example file for testing?

fransua commented 4 years ago

Hi, you are probably right. This part is meant to be used with codon-based alignments, something like this: http://etetoolkit.org/cookbook/ete_build_mixed_types.ipynb

However, we should fix ETE to account for these cases.

Thanks for reporting!

KittyMurphy commented 4 years ago

Hi,

I also have this problem when running ete3 evol using fasta files (primate genomes).

Command:

ete3 evol -t $ANALYSIS_FOLDER/Genes/$gene/species_tree.nw --alg $ANALYSIS_FOLDER/Genes/$gene/$gene-clean.fa -o $ANALYSIS_FOLDER/Genes/$gene/$gene-evol-branch --models ${models[*]} --tests ${tests[*]} --cpu 6 --mark ${marks[*]} >> $ANALYSIS_FOLDER/Genes/$gene/$gene.ete-branch.log

Error log file:

** Running ete evol for...IGLJ3 **
Traceback (most recent call last):
  File "/rds/general/user/cm1118/home/anaconda3/envs/ete3_env/lib/python3.6/site-packages/ete3-3.1.1-py3.6.egg/ete3/evol/utils.py", line 159, in translate
    proteinseq += gencode[sequence[n:n+3]]
KeyError: 'G'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/rds/general/user/cm1118/home/anaconda3/envs/ete3_env/bin/ete3", line 11, in <module>
    load_entry_point('ete3==3.1.1', 'console_scripts', 'ete3')()
  File "/rds/general/user/cm1118/home/anaconda3/envs/ete3_env/lib/python3.6/site-packages/ete3-3.1.1-py3.6.egg/ete3/tools/ete.py", line 95, in main
    _main(sys.argv)
  File "/rds/general/user/cm1118/home/anaconda3/envs/ete3_env/lib/python3.6/site-packages/ete3-3.1.1-py3.6.egg/ete3/tools/ete.py", line 268, in _main
    args.func(args)
  File "/rds/general/user/cm1118/home/anaconda3/envs/ete3_env/lib/python3.6/site-packages/ete3-3.1.1-py3.6.egg/ete3/tools/ete_evol.py", line 881, in run
    tree.link_to_alignment(args.alg, alg_format='paml')
  File "/rds/general/user/cm1118/home/anaconda3/envs/ete3_env/lib/python3.6/site-packages/ete3-3.1.1-py3.6.egg/ete3/evol/evoltree.py", line 311, in link_to_alignment
    leaf.sequence = translate(leaf.nt_sequence)
  File "/rds/general/user/cm1118/home/anaconda3/envs/ete3_env/lib/python3.6/site-packages/ete3-3.1.1-py3.6.egg/ete3/evol/utils.py", line 169, in translate
    for nt2 in newcod[1]:
IndexError: list index out of range
Done with...IGLJ3

Example sequences from the IGLJ3 FASTA file:

dasNov3 CTGAGTAGACCCAGCCTGGG-CAGGGGCTTATACTTCCTCCATCACAGCTGCAGTGGGGG-AGG-GGCAGGGGCATCACAGGGAGGGTTTTTGTACGAGCCTGAATCACTGTGTTGGGTGTTCGGTGGAGGGACCCAGCTGACCGTCCTAG
eulFla1 ---------------------------------CTTCCTCCAGCACAGCTGCAGCTGGGGCTGGAGCTG--GGGGTCTCGGGGAGGGTTTTTGTACGAGCCTGTGTCACTGTGTTGGGTGTTCGGCGGCGGGACCAAGCTGACCGTCCTAG
eulMac1 ---------------------------------CTTCCTCCAGCACAGCTGCAGCTGGGGCTGGAGCTG--GGGGTCTCGGGGAGGGTTTTTGTACGAGCCTGTGTCACTGTGTTGGGTGTTCGGCGGCGGGACCAAGCTGACCGTCCTAG
gorGor5 ATGAGCAGATGCCACCAGGGCCACTGGCCCCAGCTTCCTCCTTCACAGCTGCAGTGGGGGCTGGGGCTAGGGGCATCCCAGGGAGGGTTTTTGTATGAGCCTGTGTCACAGTGTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAG
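A single-character key such as 'G' in the KeyError suggests that the final slice `sequence[n:n+3]` held fewer than three nucleotides, which would happen when a sequence length is not a multiple of three. A small pre-check along the lines below could flag such sequences, together with partially gapped codons, before running ete3 evol. This is only a sketch: the function name `find_codon_problems` and the toy sequences are hypothetical and not part of ETE.

```python
def find_codon_problems(alignment):
    """Report sequences that a strict codon-based translation would choke on.

    `alignment` maps sequence names to nucleotide strings. Returns a list of
    (name, kind, value) tuples, where kind is:
    - "length": the sequence length is not a multiple of three
      (value is the length);
    - "partial_gap": a codon mixes gaps and nucleotides
      (value is the 0-based start position of that codon).
    """
    problems = []
    for name, seq in alignment.items():
        remainder = len(seq) % 3
        if remainder:
            problems.append((name, "length", len(seq)))
        # Only scan complete codons; a trailing fragment is reported above.
        for start in range(0, len(seq) - remainder, 3):
            codon = seq[start:start + 3]
            if "-" in codon and codon != "---":
                problems.append((name, "partial_gap", start))
    return problems


# Toy alignment (made up for illustration, not taken from the IGLJ3 file).
toy = {
    "ok": "ATG---GGG",
    "partial": "ATGA--GGG",
    "truncated": "ATGG",
}
for name, kind, value in find_codon_problems(toy):
    print(name, kind, value)
```

Genes whose alignments produce an empty list should not hit this particular error path, so a check like this could help decide which genes to set aside for now.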

For now I have decided to leave out the genes outputting this error from my analyses, hoping to re-run once the problem is fixed.

Thank you, Kitty