Prunoideae / MitoFlex

A mitogenome toolkit inspired by MitoZ, but more effective, precise and flexible.
GNU General Public License v3.0

Error occurred while ETE3 updating NCBI taxonomy database #2

Closed: SamLMG closed this issue 4 years ago.

SamLMG commented 4 years ago

Hello, I'm having the following issue with ete3 when running the ./ncbi.py script. Do you have any idea what the issue might be? Thanks in advance, Sam

    (mitoflex) [leeming@l33 test1]$ ../ncbi.py
    Filesystem status:
    Total: 109.00 GB
    Free: 98.00 GB

    If the free disk space is too low (<1G), database updating can be failed!
    Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...
    Done. Parsing...
    Loading node names...
    2269854 names loaded.
    226194 synonyms loaded.
    Loading nodes...
    2269854 nodes loaded.
    Linking nodes...
    Tree is loaded.
    Updating database: /home/lv71312/leeming/.etetoolkit/taxa.sqlite ...
    2269000 generating entries...
    Uploading to /home/lv71312/leeming/.etetoolkit/taxa.sqlite

    Inserting synonyms: 60000
    Errors occured when fetching data from NCBI database, falling back to the last fetched database.
    Loading node names...
    2269574 names loaded.
    225837 synonyms loaded.
    Loading nodes...
    2269574 nodes loaded.
    Linking nodes...
    Tree is loaded.
    Updating database: /home/lv71312/leeming/.etetoolkit/taxa.sqlite ...
    2269000 generating entries...
    Uploading to /home/lv71312/leeming/.etetoolkit/taxa.sqlite
    Traceback (most recent call last):
      File "../ncbi.py", line 65, in <module>
        ncbi.update_taxonomy_database()
      File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 129, in update_taxonomy_database
        update_db(self.dbfile)
      File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
        upload_data(dbfile)
      File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 802, in upload_data
        db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
    sqlite3.IntegrityError: UNIQUE constraint failed: synonym.spname, synonym.taxid

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "../ncbi.py", line 80, in <module>
        ncbi = NCBITaxa(taxdump_file=os.path.abspath(dump_file))
      File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 106, in __init__
        self.update_taxonomy_database(taxdump_file)
      File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 131, in update_taxonomy_database
        update_db(self.dbfile, taxdump_file)
      File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
        upload_data(dbfile)
      File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 791, in upload_data
        db.execute(cmd)
    sqlite3.OperationalError: database is locked

Prunoideae commented 4 years ago

Hello SamLMG, this error may be caused by something malfunctioning in the taxonomy database, or by a situation that ete3's code does not cover. An error report and a solution for this have been posted on the ETE Toolkit's Google Group, and it is also already an official issue on ete3's repository [etetoolkit/ete#469].

If you need a fix, I would suggest the following:

  1. Go to ncbiquery.py, which is located at /home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py, as the traceback shows.
  2. Edit line 802 of that file, which should read db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname)).
  3. Change it to db.execute("INSERT OR REPLACE INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname)).
  4. Delete the SQLite database at ~/.etetoolkit/taxa.sqlite and rerun ncbi.py.
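The failure in the traceback and the effect of step 3 can be reproduced with Python's built-in sqlite3 module on a throwaway in-memory database. This is a minimal sketch under assumptions: the table definition is a simplified stand-in modelled on the error message (a composite key over spname and taxid, with a case-insensitive spname), not a verbatim copy of ete3's schema, and the rows are hypothetical.

```python
import sqlite3
from contextlib import closing

# Assumed, simplified stand-in for ete3's synonym table: the composite primary
# key mirrors "UNIQUE constraint failed: synonym.spname, synonym.taxid".
SCHEMA = (
    "CREATE TABLE synonym (taxid INT, spname VARCHAR(50) COLLATE NOCASE, "
    "PRIMARY KEY (spname, taxid));"
)

PLAIN_INSERT = "INSERT INTO synonym (taxid, spname) VALUES (?, ?);"
REPLACE_INSERT = "INSERT OR REPLACE INTO synonym (taxid, spname) VALUES (?, ?);"

# Hypothetical rows: the same taxid/synonym pair listed twice,
# as a duplicated entry in the taxonomy dump would produce.
ROWS = [(100001, "Example species"), (100001, "Example species")]


def load_synonyms(rows, statement):
    """Insert rows with the given statement; return how many rows end up stored."""
    with closing(sqlite3.connect(":memory:")) as db:
        db.execute(SCHEMA)
        for taxid, spname in rows:
            db.execute(statement, (taxid, spname))
        return db.execute("SELECT COUNT(*) FROM synonym").fetchone()[0]


if __name__ == "__main__":
    try:
        load_synonyms(ROWS, PLAIN_INSERT)
    except sqlite3.IntegrityError as err:
        print("plain INSERT failed:", err)
    print("INSERT OR REPLACE stored", load_synonyms(ROWS, REPLACE_INSERT), "row(s)")
```

`OR REPLACE` deletes the conflicting row before inserting the new one, so a duplicated pair ends up stored once instead of aborting the whole upload. For exact duplicates `INSERT OR IGNORE` has the same net effect; the two differ only when the rows clash under the case-insensitive collation, where REPLACE keeps the later spelling and IGNORE keeps the earlier one.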

This may be caused by a change in NCBI's taxonomy data that unexpectedly produced duplicate entries and broke the code. The plan is to update ete3 once that upstream issue is closed and fixed; I will keep this issue open until then.

Prunoideae commented 4 years ago

Alternatively, you can reuse the taxdump.tar.gz that has already been downloaded.

To do so:

  1. Delete the SQLite database (path mentioned above).
  2. Launch the Python interpreter from MitoFlex's root directory.
  3. Enter the following:
    from ete3 import NCBITaxa
    from os import path
    ncbi = NCBITaxa(taxdump_file=path.abspath('taxdump.tar.gz'))

This method reuses the previously downloaded taxdump.tar.gz, so it should not be affected by NCBI's current change. Newly added taxonomy records will not be present, but this is enough for most of the program's functions.
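The deletion and reload steps above can be wrapped in a small script. This is a sketch under assumptions: it takes ~/.etetoolkit/taxa.sqlite as the database location (the default path mentioned in this thread), the helper names `remove_stale_db` and `rebuild_from_local_dump` are hypothetical, and ete3 is imported only inside the rebuild function so the deletion helper works without it.

```python
import os
from pathlib import Path

# Default location of ete3's cached taxonomy database, as given in this thread.
DEFAULT_DB = Path.home() / ".etetoolkit" / "taxa.sqlite"


def remove_stale_db(db_path=DEFAULT_DB):
    """Delete a (possibly half-written) taxonomy database; return True if a file was removed."""
    db_path = Path(db_path)
    if db_path.is_file():
        db_path.unlink()
        return True
    return False


def rebuild_from_local_dump(dump_file="taxdump.tar.gz", db_path=DEFAULT_DB):
    """Hypothetical helper: rebuild the database from an already downloaded taxdump."""
    from ete3 import NCBITaxa  # imported lazily: only the rebuild step needs ete3

    remove_stale_db(db_path)
    # Passing taxdump_file makes ete3 parse the local archive instead of downloading a new one.
    return NCBITaxa(dbfile=str(db_path), taxdump_file=os.path.abspath(dump_file))
```

Calling `rebuild_from_local_dump()` from MitoFlex's root directory corresponds to steps 1 to 3 above. Keeping the deletion separate makes it easy to clear a database left locked or half-written by an interrupted update before trying again.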

SamLMG commented 4 years ago

Hi, thanks for the fix. The database is now updated successfully.

Prunoideae commented 4 years ago

Fixed by hacking into the ETE3 module from ncbi.py, replacing the faulty database query method with a correct one.

The main part of MitoFlex does not include a patch like this to guard against instability originating inside ete3, so running ncbi.py is a necessary step of the installation.

JanDrouaud commented 3 years ago

Hi all, I think this happens because of the "COLLATE NOCASE" clause used when creating the synonym table. Because case is ignored when comparing names, taxid/synonym pairs that differ only in case are treated as identical, which causes an SQLite insertion error. So you can either delete this clause, or wrap the line

    db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))

in a try/except block like this:

    try:
        db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
    except sqlite3.IntegrityError:
        # Log the skipped pair (i is the counter of the surrounding loop) instead of aborting.
        print(i, taxid, spname)

That way you don't really modify the expected output of ete3, and you keep track of the taxid/synonym pairs that were skipped.

Jan
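Both of Jan's options can be checked on an in-memory database with Python's sqlite3 module. This is a sketch under assumptions: the schema is a simplified stand-in for ete3's synonym table (composite key over spname and taxid, with or without COLLATE NOCASE on spname), the `load` helper and the sample rows are hypothetical, and skipped pairs are collected in a list rather than printed.

```python
import sqlite3
from contextlib import closing

INSERT = "INSERT INTO synonym (taxid, spname) VALUES (?, ?);"

# Hypothetical rows: one taxid with two synonyms that differ only in letter case.
ROWS = [(100001, "Example species"), (100001, "example species")]


def load(rows, nocase):
    """Insert rows one by one, skipping key conflicts; return (stored_count, skipped_rows)."""
    collation = " COLLATE NOCASE" if nocase else ""
    schema = (
        f"CREATE TABLE synonym (taxid INT, spname VARCHAR(50){collation}, "
        "PRIMARY KEY (spname, taxid));"
    )
    skipped = []
    with closing(sqlite3.connect(":memory:")) as db:
        db.execute(schema)
        for i, (taxid, spname) in enumerate(rows):
            try:
                db.execute(INSERT, (taxid, spname))
            except sqlite3.IntegrityError:
                # Record the clashing pair instead of aborting the whole upload.
                skipped.append((i, taxid, spname))
        stored = db.execute("SELECT COUNT(*) FROM synonym").fetchone()[0]
    return stored, skipped
```

With `nocase=True` the key index compares spname case-insensitively, so the second row is reported as skipped and one row is stored; with `nocase=False` both spellings are kept. Dropping COLLATE NOCASE also makes comparisons on spname case-sensitive, which may change how synonym lookups match names.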