Closed dhoogest closed 4 years ago
@crosenth this solves the problem for me, but may create other issues?
https://github.com/fhcrc/taxtastic/commit/fb3ca61caa0930d7e4b8c0b0e51c35cbb55ab247
It might. What is the best way to reproduce this problem?
For instance with the above block in tmp.txt
(attached), run
taxit -v add_nodes {url} tmp.yml
The node will be added with None in the is_valid column (given the function defaults). Adding the is_valid=is_valid,
line to the appended statements properly reflects the is_valid value from the yml block for the node.
Dan, without any updates to the code what happens if you set is_valid: true
in the tmp.txt yaml file?
For me:
is_valid: true
sqlite3 ncbi_taxonomy.db 'select * from nodes where tax_id = "UW1127"'
|--------+-----------+---------+-----------+-------------+-----------+----------|
| tax_id | parent_id | rank | embl_code | division_id | source_id | is_valid |
|--------+-----------+---------+-----------+-------------+-----------+----------|
| UW1127 | 655183 | species | | | 2 | 1 |
|--------+-----------+---------+-----------+-------------+-----------+----------|
is_valid: false
sqlite3 ncbi_taxonomy.db 'select * from nodes where tax_id = "UW1127"'
|--------+-----------+---------+-----------+-------------+-----------+----------|
| tax_id | parent_id | rank | embl_code | division_id | source_id | is_valid |
|--------+-----------+---------+-----------+-------------+-----------+----------|
| UW1127 | 655183 | species | | | 2 | 0 |
|--------+-----------+---------+-----------+-------------+-----------+----------|
For me (no code updates), with either a sqlite or postgres db target and either is_valid=true
or is_valid=false
, the column isn't populated (remains Null).
Excerpt from the taxit log output:
% taxit -vvv add_nodes sqlite:///ncbi_taxonomy.db tmp.txt --source-name uw
...
INFO log 110 BEGIN (implicit)
2020-08-12 09:55:55,744 INFO sqlalchemy.engine.base.Engine INSERT INTO nodes (tax_id, parent_id, rank, source_id) VALUES (?, ?, ?, ?)
INFO log 110 INSERT INTO nodes (tax_id, parent_id, rank, source_id) VALUES (?, ?, ?, ?)
2020-08-12 09:55:55,744 INFO sqlalchemy.engine.base.Engine ('UW1127', '655183', 'species', 2)
INFO log 110 ('UW1127', '655183', 'species', 2)
2020-08-12 09:55:55,749 INFO sqlalchemy.engine.base.Engine UPDATE names SET is_primary=? WHERE names.tax_id = ?
INFO log 110 UPDATE names SET is_primary=? WHERE names.tax_id = ?
2020-08-12 09:55:55,749 INFO sqlalchemy.engine.base.Engine (0, 'UW1127')
INFO log 110 (0, 'UW1127')
2020-08-12 09:55:55,782 INFO sqlalchemy.engine.base.Engine INSERT INTO names (tax_id, tax_name, name_class, source_id, is_primary, is_classified) VALUES (?, ?, ?, ?, ?, ?)
INFO log 110 INSERT INTO names (tax_id, tax_name, name_class, source_id, is_primary, is_classified) VALUES (?, ?, ?, ?, ?, ?)
2020-08-12 09:55:55,782 INFO sqlalchemy.engine.base.Engine ('UW1127', 'Lactobacillus rhamnosus or Lactobacillus casei', 'synonym', 2, 1, 1)
INFO log 110 ('UW1127', 'Lactobacillus rhamnosus or Lactobacillus casei', 'synonym', 2, 1, 1)
2020-08-12 09:55:55,797 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 110 COMMIT
So the INSERT query on nodes isn't passing is_valid
. This is with:
% taxit --version
taxit v0.9.0
With the change proposed in fb3ca61:
INFO log 110 (2,)
2020-08-12 10:07:06,225 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
INFO log 110 BEGIN (implicit)
2020-08-12 10:07:06,225 INFO sqlalchemy.engine.base.Engine INSERT INTO nodes (tax_id, parent_id, rank, source_id, is_valid) VALUES (?, ?, ?, ?, ?)
INFO log 110 INSERT INTO nodes (tax_id, parent_id, rank, source_id, is_valid) VALUES (?, ?, ?, ?, ?)
2020-08-12 10:07:06,225 INFO sqlalchemy.engine.base.Engine ('UW1127', '655183', 'species', 2, 0)
INFO log 110 ('UW1127', '655183', 'species', 2, 0)
2020-08-12 10:07:06,228 INFO sqlalchemy.engine.base.Engine UPDATE names SET is_primary=? WHERE names.tax_id = ?
INFO log 110 UPDATE names SET is_primary=? WHERE names.tax_id = ?
2020-08-12 10:07:06,229 INFO sqlalchemy.engine.base.Engine (0, 'UW1127')
INFO log 110 (0, 'UW1127')
2020-08-12 10:07:06,230 INFO sqlalchemy.engine.base.Engine INSERT INTO names (tax_id, tax_name, name_class, source_id, is_primary, is_classified) VALUES (?, ?, ?, ?, ?, ?)
INFO log 110 INSERT INTO names (tax_id, tax_name, name_class, source_id, is_primary, is_classified) VALUES (?, ?, ?, ?, ?, ?)
2020-08-12 10:07:06,230 INFO sqlalchemy.engine.base.Engine ('UW1127', 'Lactobacillus rhamnosus or Lactobacillus casei', 'synonym', 2, 1, 1)
INFO log 110 ('UW1127', 'Lactobacillus rhamnosus or Lactobacillus casei', 'synonym', 2, 1, 1)
2020-08-12 10:07:06,233 INFO sqlalchemy.engine.base.Engine COMMIT
INFO log 110 COMMIT
So, is_valid
included in the INSERT. The true/false value from the yaml file is reflected properly in the database (tested in both sqlite and postgres)
Dan you are right, I was going thru the update logic in the code which already has your proposed update. Please create a PR so we can merge
For instance the following yaml fails to upsert node
UW1127
: