etetoolkit / ete

Python package for building, comparing, annotating, manipulating and visualising trees. It provides a comprehensive API and a collection of command line tools, including utilities to work with the NCBI taxonomy tree.
http://etetoolkit.org
GNU General Public License v3.0

Add option to ignore updating NCBI taxonomy database #459

Open jrober84 opened 4 years ago

jrober84 commented 4 years ago

This applies to an HPC environment where multiple concurrent jobs call ETE3 independently. If the NCBI database has been updated, or a momentary connection interruption prevents a job from connecting to the SQLite NCBI taxonomy database, every process tries to update the taxonomy database at the same time, and they all start failing. The problem is in the file ncbiquery.py.

    self.db = None
    self._connect()

    if not is_taxadb_up_to_date(self.dbfile):
        print('NCBI database format is outdated. Upgrading', file=sys.stderr)
        self.update_taxonomy_database(taxdump_file)

It would be great to have an option to skip updating the NCBI taxonomy database, and/or to have the process create a lock file while it updates the database, so that multiple processes cannot attempt the update simultaneously.
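A minimal sketch of the lock-file idea, not ETE's actual code. The names `exclusive_lock` and `update_once`, and the `is_up_to_date` and `do_update` callables, are hypothetical. It relies on `os.open` with `O_CREAT | O_EXCL`, which is atomic on local filesystems but may not be reliable on some network filesystems common on HPC clusters.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path, timeout=600.0, poll_interval=1.0):
    """Hold an exclusive lock file for the duration of the `with` block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL raises FileExistsError if the file already exists,
            # so only one process can create the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire lock {lock_path}")
            time.sleep(poll_interval)
    try:
        # Record the owner's PID to help diagnose stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def update_once(dbfile, is_up_to_date, do_update):
    """Run do_update() only if the database is still outdated once the lock is held.

    Both callables are hypothetical stand-ins for the version check and the
    update routine. Returns True if this process performed the update.
    """
    if is_up_to_date(dbfile):
        return False
    with exclusive_lock(dbfile + ".lock"):
        # Re-check: another process may have finished the update while we waited.
        if is_up_to_date(dbfile):
            return False
        do_update()
        return True
```

With this pattern, the first job to create the lock file performs the update, and the others wait and then find the database already current. A job that crashes while holding the lock leaves a stale lock file behind, which would need separate handling.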

kbessonov1984 commented 3 years ago

Is it possible to make the database version check optional? Could an extra parameter be added to the NCBITaxa class (e.g. ignore_db_ver_check)?
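A rough sketch of how such a flag could gate the check. `TaxaStore` is a hypothetical stand-in, not the real NCBITaxa class, and the `is_up_to_date` and `update` callables are assumptions made for illustration. Only the parameter name `ignore_db_ver_check` comes from the suggestion above.

```python
import sys


class TaxaStore:
    """Hypothetical stand-in for a taxonomy wrapper; not ETE's NCBITaxa."""

    def __init__(self, dbfile, is_up_to_date, update, ignore_db_ver_check=False):
        self.dbfile = dbfile
        self.updated = False
        if ignore_db_ver_check:
            # Caller opted out: use the database as-is, never trigger an update.
            return
        if not is_up_to_date(dbfile):
            print("NCBI database format is outdated. Upgrading", file=sys.stderr)
            update(dbfile)
            self.updated = True
```

Leaving the default at `False` would preserve the current behaviour, so only jobs that pass the flag explicitly would skip the check.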