etetoolkit / ete

Python package for building, comparing, annotating, manipulating and visualising trees. It provides a comprehensive API and a collection of command line tools, including utilities to work with the NCBI taxonomy tree.
http://etetoolkit.org
GNU General Public License v3.0

Annotating leaves with NCBI taxonomy #706

Closed by lm-jkominek 10 months ago

lm-jkominek commented 10 months ago

Hi, I have a tree of my lineages of interest, which I got from ncbi.get_topology(taxid_list), but I've been struggling to get it re-annotated with the actual taxonomic names (using ETE4, for the record). I can get the TaxID-to-name mappings from ncbi.get_taxid_translator(taxid_list) or via ncbi.annotate_tree() just fine; that is not the problem. The issue is that I want the leaves renamed using that information, so that they are no longer NCBI TaxIDs and I can write a Newick tree that is meaningful. I know I can work around this by either traversing the tree and renaming the nodes manually, or saving the tree with TaxIDs plus the mapping and doing the renaming outside of ETE Toolkit, but since the information is already there, I was hoping there is an integrated way to do it. I saw that I could potentially export through Nexus and convert later, but that seems more convoluted than necessary... Any tips appreciated!

lm-jkominek commented 10 months ago

Never mind... as per the usual drill: you struggle for hours, finally type your problem out and ask for help, and then the solution comes up half an hour later after some more digging into the ETE4 code...

For the record, my solution was the following:

from ete4 import NCBITaxa, PhyloTree

# taxid_list is the list of NCBI TaxIDs of interest, defined elsewhere
ncbi = NCBITaxa()
t1 = ncbi.get_topology(taxid_list)

# Collect the TaxIDs of all nodes. This was needed in order to add the
# TaxIDs of internal nodes that might not have been present in the
# original taxid_list
all_taxids = [n.name for n in t1.traverse()]

t2n = ncbi.get_taxid_translator(all_taxids)
t2 = PhyloTree(t1.write(), sp_naming_function=lambda name: name)
t2.annotate_ncbi_taxa(tax2name=t2n)

# Replace each TaxID with the scientific name set by the annotation
for n in t2.traverse():
    if n.name is not None:
        n.name = n.props['sci_name']

t2.write(outfile="ncbi_tree.nwk", parser=8)
# Alternatively, use parser=9 to write the taxonomic info of the internal nodes
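
For comparison, the other workaround mentioned in the question (saving the tree with TaxIDs and applying the mapping outside of ETE) could look roughly like the sketch below. It is not part of ETE: the helper names `newick_label` and `rename_taxids` are made up for illustration, and it assumes that every purely numeric node label in the Newick string is a TaxID (so the tree carries no numeric support values) and that the mapping has the `{taxid: name}` shape returned by `get_taxid_translator()`.

```python
import re

# Labels made only of these characters can be written without quotes.
_SAFE_LABEL = re.compile(r"[A-Za-z0-9_.\-]+")

# An integer label that starts the string or directly follows "(", "," or ")",
# and is directly followed by ":", ",", ")" or ";". Branch lengths always
# follow ":" and are therefore never matched.
_TAXID_LABEL = re.compile(r"(?<![^(,)])(\d+)(?=[:,);])")


def newick_label(name):
    """Return `name` as a Newick label (hypothetical helper).

    Whitespace becomes underscores; names that still contain special
    characters are wrapped in single quotes instead.
    """
    underscored = "_".join(name.split())
    if _SAFE_LABEL.fullmatch(underscored):
        return underscored
    return "'" + name.replace("'", "''") + "'"


def rename_taxids(newick, taxid_to_name):
    """Replace numeric node labels using a {taxid: name} dict (hypothetical helper).

    Assumption: every integer label is a TaxID. Labels without an entry
    in the mapping are left unchanged.
    """
    def substitute(match):
        name = taxid_to_name.get(int(match.group(1)))
        return match.group(1) if name is None else newick_label(name)

    return _TAXID_LABEL.sub(substitute, newick)


if __name__ == "__main__":
    tree = "((9606:1,9598:1)207598:1,10090:2)314146;"
    names = {
        9606: "Homo sapiens",
        9598: "Pan troglodytes",
        10090: "Mus musculus",
        207598: "Homininae",
        314146: "Euarchontoglires",
    }
    print(rename_taxids(tree, names))
    # ((Homo_sapiens:1,Pan_troglodytes:1)Homininae:1,Mus_musculus:2)Euarchontoglires;
```

Doing the substitution on the text keeps the topology and branch lengths untouched, but it is only safe under the assumption above; for trees with numeric support values, renaming the nodes inside ETE as in the solution above is the more robust route.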