etetoolkit / ete

Python package for building, comparing, annotating, manipulating and visualising trees. It provides a comprehensive API and a collection of command line tools, including utilities to work with the NCBI taxonomy tree.
http://etetoolkit.org
GNU General Public License v3.0

Using older taxdump.tar.gz versions #340

Closed asulit08 closed 6 years ago

asulit08 commented 6 years ago

Good day! This has been a really helpful tool in Python. I wanted to ask whether it is possible to use an older version of the taxdump (stored locally and passed to update_taxonomy_database). I used a classification tool that builds its database from a taxdump, but some of the information in it was incomplete, so I'm supplementing it by parsing through the database. I do, however, want to use the same taxdump I used to build the classification tool's database. I tried this with update_taxonomy_database and got no errors, but I'm not sure it ran properly. I've also seen this older pull request: https://github.com/etetoolkit/ete/pull/225, in which a create-db option was added, but as I'm new to the tool I was wondering how to use it. Thank you.

unode commented 6 years ago

@asulit08 You are going in the right direction.

To construct the database, call https://github.com/etetoolkit/ete/blob/master/ete3/ncbi_taxonomy/ncbiquery.py#L122, passing the location of the file you want to use, e.g. taxdump="/path/to/taxdump.tar.gz". You can then follow https://github.com/etetoolkit/ete/issues/295#issuecomment-317592954 to make use of the database you just created.
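A minimal sketch of that first step, assuming the `NCBITaxa` constructor accepts the `dbfile` and `taxdump_file` keywords used later in this thread. The helper names `resolve_paths` and `build_pinned_db` are hypothetical and not part of ete3. Writing to a separate SQLite file keeps the older taxdump apart from the default database.

```python
from pathlib import Path


def resolve_paths(taxdump, dbfile):
    """Expand both paths and check that the taxdump archive exists."""
    taxdump = Path(taxdump).expanduser().resolve()
    dbfile = Path(dbfile).expanduser().resolve()
    if not taxdump.is_file():
        raise FileNotFoundError(f"taxdump archive not found: {taxdump}")
    return taxdump, dbfile


def build_pinned_db(taxdump, dbfile):
    """Build dbfile from a local taxdump.tar.gz and return the NCBITaxa handle.

    Hypothetical helper: assumes NCBITaxa(dbfile=..., taxdump_file=...)
    parses the given archive into the given SQLite file.
    """
    # Imported lazily so the path check above works without ete3 installed.
    from ete3 import NCBITaxa

    taxdump, dbfile = resolve_paths(taxdump, dbfile)
    dbfile.parent.mkdir(parents=True, exist_ok=True)
    return NCBITaxa(dbfile=str(dbfile), taxdump_file=str(taxdump))


# Example call (paths are placeholders):
# ncbi = build_pinned_db("Tax_20180310/taxdump.tar.gz", "Tax_20180310/taxa.sqlite")
```

Keeping the SQLite file next to the taxdump it was built from makes it easy to tell later which snapshot a given database corresponds to.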

asulit08 commented 6 years ago

How does ncbiquery.py work when creating the database?

What I originally did was run ncbi.update_taxonomy_database(taxdump_file='Tax_20180310/taxdump.tar.gz'), where taxdump_file points to the location of the taxdump.tar.gz I wanted to use. I am unsure whether this is correct, though, and whether it automatically changes the SQLite database in the .etetoolkit folder.
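One way to check whether such a call actually rewrote the database is to compare the file's size and modification time before and after. This is only a sketch: the default location `~/.etetoolkit/taxa.sqlite` is an assumption based on the folder mentioned above, and `db_fingerprint` is a hypothetical helper, not an ete3 function.

```python
import os

# Assumed default database location; adjust if your installation differs.
DEFAULT_DB = os.path.join(os.path.expanduser("~"), ".etetoolkit", "taxa.sqlite")


def db_fingerprint(path=DEFAULT_DB):
    """Return (size_in_bytes, mtime_ns) for the file, or None if it is missing."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_size, st.st_mtime_ns)


# Example use around the update call from this comment:
# before = db_fingerprint()
# ncbi.update_taxonomy_database(taxdump_file="Tax_20180310/taxdump.tar.gz")
# after = db_fingerprint()
# print("database rewritten:", before != after)
```

A changed fingerprint only shows that the file was written, not that its contents match the intended taxdump, so spot-checking a few known taxids is still worthwhile.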

asulit08 commented 6 years ago

Oh, I am also using an ete3 installed through Conda

asulit08 commented 6 years ago

Oh, I think I got it. I wasn't sure I was using the correct version of ete, as I started with ete2 and then installed ete3 (but the dependencies were unchanged), so I updated everything instead. I ran ncbi=NCBITaxa(dbfile="path to sqlite", taxdump_file="path to taxdump") and I think it's working properly. So when I need to use the database, I just need to pass the path of the sqlite file I want to use, right? That would be ncbi=NCBITaxa(dbfile="path to sqlite") before anything else?

unode commented 6 years ago

Yes.
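For later sessions, reopening the existing file could look like the sketch below. `open_pinned_db` is a hypothetical wrapper, not part of ete3. It checks that the file exists first, so a mistyped path fails with a clear error before ete3 is asked to open it.

```python
import os


def open_pinned_db(dbfile):
    """Open an already-built taxonomy database without touching any taxdump.

    Hypothetical helper: assumes NCBITaxa(dbfile=...) opens the given
    SQLite file as-is, as described in this thread.
    """
    if not os.path.isfile(dbfile):
        raise FileNotFoundError(f"taxonomy database not found: {dbfile}")
    # Imported lazily so the existence check works without ete3 installed.
    from ete3 import NCBITaxa

    return NCBITaxa(dbfile=dbfile)


# Example use (path is a placeholder):
# ncbi = open_pinned_db("Tax_20180310/taxa.sqlite")
# lineage = ncbi.get_lineage(9606)
# names = ncbi.get_taxid_translator(lineage)
```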

I'm closing this as the issue is now solved. Cheers.