fhcrc / taxtastic

Create and maintain phylogenetic "reference packages" of biological sequences.
GNU General Public License v3.0
21 stars 10 forks source link

Error building db, fixed using older versions from NCBI #146

Closed nickp60 closed 1 year ago

nickp60 commented 2 years ago

Hello, I am getting the following error after running this command to build a taxonomy database.

docker run --rm -it -v $(pwd):$(pwd) -w $(pwd) nghoffman/taxtastic:0.8.3 taxit -vvv new_database taxonomy.db
docker run --rm -it -v $(pwd):$(pwd) -w $(pwd) nghoffman/taxtastic:0.8.3 taxit -vvv new_database taxonomy.db
INFO ncbi 507 downloading ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip to /Users/nwaters/GitHub/conditioning/mitre/taxdmp.zip
INFO ncbi 230 Clobbering database tables
2021-10-08 20:33:33,019 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
INFO log 109 SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2021-10-08 20:33:33,019 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,019 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
INFO log 109 SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2021-10-08 20:33:33,020 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,021 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("ranks")
INFO log 109 PRAGMA table_info("ranks")
2021-10-08 20:33:33,021 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,025 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("source")
INFO log 109 PRAGMA table_info("source")
2021-10-08 20:33:33,025 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,027 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("nodes")
INFO log 109 PRAGMA table_info("nodes")
2021-10-08 20:33:33,027 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,028 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("names")
INFO log 109 PRAGMA table_info("names")
2021-10-08 20:33:33,028 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,028 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("merged")
INFO log 109 PRAGMA table_info("merged")
2021-10-08 20:33:33,028 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,029 INFO sqlalchemy.engine.base.Engine 
DROP TABLE source
INFO log 109 
DROP TABLE source
2021-10-08 20:33:33,029 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,054 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 109 COMMIT
2021-10-08 20:33:33,055 INFO sqlalchemy.engine.base.Engine 
DROP TABLE ranks
INFO log 109 
DROP TABLE ranks
2021-10-08 20:33:33,055 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,089 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 109 COMMIT
INFO ncbi 233 Creating database tables
2021-10-08 20:33:33,092 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("ranks")
INFO log 109 PRAGMA table_info("ranks")
2021-10-08 20:33:33,092 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,096 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("source")
INFO log 109 PRAGMA table_info("source")
2021-10-08 20:33:33,098 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,100 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("nodes")
INFO log 109 PRAGMA table_info("nodes")
2021-10-08 20:33:33,100 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,100 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("names")
INFO log 109 PRAGMA table_info("names")
2021-10-08 20:33:33,100 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,101 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("merged")
INFO log 109 PRAGMA table_info("merged")
2021-10-08 20:33:33,101 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,102 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE ranks (
        rank VARCHAR NOT NULL, 
        height INTEGER NOT NULL, 
        PRIMARY KEY (rank), 
        UNIQUE (height)
)

INFO log 109 
CREATE TABLE ranks (
        rank VARCHAR NOT NULL, 
        height INTEGER NOT NULL, 
        PRIMARY KEY (rank), 
        UNIQUE (height)
)

2021-10-08 20:33:33,102 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,124 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 109 COMMIT
2021-10-08 20:33:33,127 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE source (
        id INTEGER NOT NULL, 
        name VARCHAR, 
        description VARCHAR, 
        PRIMARY KEY (id), 
        UNIQUE (name)
)

INFO log 109 
CREATE TABLE source (
        id INTEGER NOT NULL, 
        name VARCHAR, 
        description VARCHAR, 
        PRIMARY KEY (id), 
        UNIQUE (name)
)

2021-10-08 20:33:33,127 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,167 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 109 COMMIT
2021-10-08 20:33:33,170 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE nodes (
        tax_id VARCHAR NOT NULL, 
        parent_id VARCHAR, 
        rank VARCHAR, 
        embl_code VARCHAR, 
        division_id VARCHAR, 
        source_id INTEGER, 
        is_valid BOOLEAN, 
        PRIMARY KEY (tax_id), 
        FOREIGN KEY(rank) REFERENCES ranks (rank), 
        FOREIGN KEY(source_id) REFERENCES source (id), 
        CHECK (is_valid IN (0, 1))
)

INFO log 109 
CREATE TABLE nodes (
        tax_id VARCHAR NOT NULL, 
        parent_id VARCHAR, 
        rank VARCHAR, 
        embl_code VARCHAR, 
        division_id VARCHAR, 
        source_id INTEGER, 
        is_valid BOOLEAN, 
        PRIMARY KEY (tax_id), 
        FOREIGN KEY(rank) REFERENCES ranks (rank), 
        FOREIGN KEY(source_id) REFERENCES source (id), 
        CHECK (is_valid IN (0, 1))
)

2021-10-08 20:33:33,170 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,193 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 109 COMMIT
2021-10-08 20:33:33,193 INFO sqlalchemy.engine.base.Engine CREATE INDEX ix_nodes_parent_id ON nodes (parent_id)
INFO log 109 CREATE INDEX ix_nodes_parent_id ON nodes (parent_id)
2021-10-08 20:33:33,193 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,221 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 109 COMMIT
2021-10-08 20:33:33,222 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE names (
        tax_id VARCHAR NOT NULL, 
        tax_name VARCHAR NOT NULL, 
        unique_name VARCHAR, 
        name_class VARCHAR NOT NULL, 
        source_id INTEGER, 
        is_primary BOOLEAN, 
        is_classified BOOLEAN, 
        PRIMARY KEY (tax_id, tax_name, name_class), 
        FOREIGN KEY(tax_id) REFERENCES nodes (tax_id) ON DELETE CASCADE, 
        FOREIGN KEY(source_id) REFERENCES source (id), 
        CHECK (is_primary IN (0, 1)), 
        CHECK (is_classified IN (0, 1))
)

INFO log 109 
CREATE TABLE names (
        tax_id VARCHAR NOT NULL, 
        tax_name VARCHAR NOT NULL, 
        unique_name VARCHAR, 
        name_class VARCHAR NOT NULL, 
        source_id INTEGER, 
        is_primary BOOLEAN, 
        is_classified BOOLEAN, 
        PRIMARY KEY (tax_id, tax_name, name_class), 
        FOREIGN KEY(tax_id) REFERENCES nodes (tax_id) ON DELETE CASCADE, 
        FOREIGN KEY(source_id) REFERENCES source (id), 
        CHECK (is_primary IN (0, 1)), 
        CHECK (is_classified IN (0, 1))
)

2021-10-08 20:33:33,222 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,248 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 109 COMMIT
2021-10-08 20:33:33,249 INFO sqlalchemy.engine.base.Engine CREATE INDEX ix_names_tax_id_is_primary ON names (tax_id, is_primary)
INFO log 109 CREATE INDEX ix_names_tax_id_is_primary ON names (tax_id, is_primary)
2021-10-08 20:33:33,249 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,264 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 109 COMMIT
2021-10-08 20:33:33,266 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE merged (
        old_tax_id VARCHAR NOT NULL, 
        new_tax_id VARCHAR, 
        PRIMARY KEY (old_tax_id), 
        FOREIGN KEY(new_tax_id) REFERENCES nodes (tax_id) ON DELETE CASCADE
)

INFO log 109 
CREATE TABLE merged (
        old_tax_id VARCHAR NOT NULL, 
        new_tax_id VARCHAR, 
        PRIMARY KEY (old_tax_id), 
        FOREIGN KEY(new_tax_id) REFERENCES nodes (tax_id) ON DELETE CASCADE
)

2021-10-08 20:33:33,266 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,306 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 109 COMMIT
2021-10-08 20:33:33,307 INFO sqlalchemy.engine.base.Engine CREATE INDEX ix_merged_old_tax_id ON merged (old_tax_id)
INFO log 109 CREATE INDEX ix_merged_old_tax_id ON merged (old_tax_id)
2021-10-08 20:33:33,307 INFO sqlalchemy.engine.base.Engine ()
INFO log 109 ()
2021-10-08 20:33:33,344 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 109 COMMIT
INFO ncbi 365 loading ranks
INFO ncbi 373 loading nodes
INFO ncbi 379 loading names
Traceback (most recent call last):
  File "/usr/local/bin/taxit", line 22, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/usr/local/lib/python2.7/site-packages/taxtastic/scripts/taxit.py", line 49, in main
    return action(arguments)
  File "/usr/local/lib/python2.7/site-packages/taxtastic/subcommands/new_database.py", line 86, in action
    ncbi_loader.load_archive(zfile)
  File "/usr/local/lib/python2.7/site-packages/taxtastic/ncbi.py", line 382, in load_archive
    self.load_table('names', rows=names_rows)
  File "/usr/local/lib/python2.7/site-packages/taxtastic/ncbi.py", line 343, in load_table
    cur.executemany(cmd, itertools.islice(rows, limit))
sqlite3.IntegrityError: UNIQUE constraint failed: names.tax_id, names.tax_name, names.name_class

It looks like older versions work: I ended up trying the 2018-12-01 version arbitrarily, and that was fine:

sudo docker run --rm -it -v $(pwd):$(pwd) -w $(pwd) nghoffman/taxtastic:0.8.3 taxit -vvv  new_database -u ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_archive/new_taxdump_2018-12-01.zip
nhoffman commented 1 year ago

Trying to clean up some issues - I suspect that you have either solved this or moved on, but hopefully this issue is fixed in v0.10.0 - if not feel free to reopen.