Hi @kmartin285 - thanks for raising the issue. Definitely avoid using older versions at this point. Can you please provide the output of taxit --version for the newer version that you are using?
Looks like there are two issues here (in the second example above; I'm afraid I can't help troubleshoot the first error):
1. taxonomy.db is missing the table "ranks" (perhaps you built it using the older version of taxtastic?).
2. The tax IDs passed to taxit taxtable -t should be space-delimited, not comma-delimited.
Give this a try (prints the taxtable to stdout):
virtualenv py2-env
source py2-env/bin/activate
pip install taxtastic==0.8.3
taxit -v new_database taxonomy.db
taxit taxtable taxonomy.db -t 47770 33945
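If the tax IDs start out as a comma-separated string, as in the original attempt, they need to be split into separate arguments before they reach -t. A minimal Python 3.7+ sketch, assuming taxit is on PATH; the helper names (taxtable_command, run_taxtable) are hypothetical and only build or run the same command shown above:

```python
import subprocess

def taxtable_command(db_path, tax_ids_csv):
    """Build argv for `taxit taxtable DB -t ID [ID ...]` from e.g. '47770,33945'.

    Hypothetical helper: each tax_id becomes its own argument, because -t
    expects space-separated values rather than one comma-joined string.
    """
    tax_ids = [t.strip() for t in tax_ids_csv.split(",") if t.strip()]
    return ["taxit", "taxtable", db_path, "-t"] + tax_ids

def run_taxtable(db_path, tax_ids_csv):
    """Run the command and return the taxtable that taxit prints to stdout."""
    cmd = taxtable_command(db_path, tax_ids_csv)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

print(taxtable_command("taxonomy.db", "47770,33945"))
# ['taxit', 'taxtable', 'taxonomy.db', '-t', '47770', '33945']
```

Passing a list to subprocess.run (rather than a single shell string) keeps each tax_id as a distinct argument without any shell quoting.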
Thanks for this feedback - the next release will include a more informative error message and help text for the -t option.
I tried what you suggested, but I get an error after running taxit -v new_database taxonomy.db. It goes like this:
(py2-env)kmartin@n78622:~/Desktop$ taxit -v new_database taxonomy.db
INFO ncbi 507 downloading ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip to /usr/users/ga002/kmartin/Desktop/taxdmp.zip
INFO ncbi 230 Clobbering database tables
INFO ncbi 233 Creating database tables
INFO ncbi 365 loading ranks
INFO ncbi 373 loading nodes
INFO ncbi 379 loading names
INFO ncbi 385 loading merged
INFO ncbi 393 retrieving species names
INFO ncbi 408 found 1309139 species names
INFO ncbi 410 checking species names
INFO ncbi 416 419725 names are classified (32.1%)
INFO ncbi 421 CREATE TEMPORARY TABLE "jZFBhqhkmlFF" (tax_id text)
INFO ncbi 424 inserting tax_ids into temporary table
INFO ncbi 427 INSERT INTO "jZFBhqhkmlFF" VALUES (?)
INFO ncbi 430 creating an index on the temporary table
INFO ncbi 433 CREATE INDEX ix_jZFBhqhkmlFF_tax_id on "jZFBhqhkmlFF"(tax_id)
INFO ncbi 436 updating names.is_classified
INFO ncbi 445 UPDATE names SET is_classified = ? WHERE is_primary AND tax_id IN (SELECT tax_id FROM "jZFBhqhkmlFF")
INFO ncbi 471 marking invalid nodes
INFO ncbi 472 WITH RECURSIVE descendants AS ( SELECT tax_id, parent_id, rank FROM nodes WHERE rank = 'species' AND tax_id not in (SELECT tax_id FROM names WHERE is_classified) UNION SELECT n.tax_id, n.parent_id, n.rank FROM nodes n INNER JOIN descendants d ON d.tax_id = n.parent_id ) UPDATE nodes SET is_valid = ? WHERE tax_id in (SELECT tax_id from descendants)
Traceback (most recent call last):
File "/usr/local/bin/taxit", line 22, in
Looks like your system has an older version of sqlite3 (< 3.8.3) that does not support common table expressions - please see the readme (https://github.com/fhcrc/taxtastic) for an explanation and for instructions on how to install pysqlite2 in your virtualenv to provide a newer version of the sqlite3 libraries. Let me know how it goes. Also, it would be useful to know your OS version and the output of:
python -c 'import sqlite3; print sqlite3.sqlite_version'
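The 3.8.3 threshold can also be checked from Python directly. A minimal Python 3 sketch (the function names are illustrative, not part of taxtastic) that compares sqlite3.sqlite_version, the version of the SQLite library the interpreter is linked against, with 3.8.3, and also probes that library with a tiny recursive CTE of the same kind as the WITH RECURSIVE statement in the log:

```python
import sqlite3

# Common table expressions (WITH / WITH RECURSIVE) first appeared in SQLite 3.8.3.
MIN_CTE_VERSION = (3, 8, 3)

def version_tuple(version):
    """Turn a dotted version string such as '3.7.6.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def supports_cte(version=sqlite3.sqlite_version):
    """Compare a SQLite library version against the 3.8.3 threshold."""
    return version_tuple(version) >= MIN_CTE_VERSION

def cte_works():
    """Probe the linked SQLite library with a small recursive CTE."""
    conn = sqlite3.connect(":memory:")
    try:
        rows = conn.execute(
            "WITH RECURSIVE n(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM n WHERE x < 3) "
            "SELECT x FROM n"
        ).fetchall()
        return [r[0] for r in rows] == [1, 2, 3]
    except sqlite3.OperationalError:
        # Libraries without CTE support reject the WITH keyword as a syntax error.
        return False
    finally:
        conn.close()

print("SQLite", sqlite3.sqlite_version, "| CTE by version:", supports_cte(), "| CTE probe:", cte_works())
```

Tuples compare element by element, so (3, 7, 6, 3) sorts below (3, 8, 3) while (3, 8, 10, 2) sorts above it, which plain string comparison would get wrong.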
Thanks, I will update. Running python -c 'import sqlite3; print sqlite3.sqlite_version' shows that my version is 3.7.6.3.
I got taxtastic to work on the test data. Thanks for all your help. One more thing, please: when I run the command below on my own data, I get this error. I have 2471 species names in a file called strain.txt.
(taxtastic-env)km285@bioinftop12-03:~/Desktop/taxtastic$ taxit taxtable taxonomy.db -f strain.txt -o strain_out.csv
Traceback (most recent call last):
File "/home/km285/Desktop/taxtastic/taxtastic-env/bin/taxit", line 9, in
Does taxit still accept taxon names? The documentation only mentions tax_ids as input for -f.
taxit taxtable only takes taxids in the most recent version. You can look up taxids from names using ./taxit.py taxids, although the output format leaves something to be desired:
% ./taxit.py taxids taxonomy.db -n 'Staphylococcus aureus,Streptococcus mitis,foo'
foo not found
1280 # Staphylococcus aureus
28037 # Streptococcus mitis
A quick and dirty conversion to a format that can be used as input to taxit taxtable --tax-id-file:
% ./taxit.py taxids taxonomy.db -n 'Staphylococcus aureus,Streptococcus mitis,foo' | grep '#' | cut -f1 -d'#'
1280
28037
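The same grep/cut conversion in Python, as a minimal sketch that assumes the "tax_id # name" line format shown above (the function name is hypothetical). Unlike cut -f1 -d'#', which keeps the space before the '#', it strips surrounding whitespace from each tax_id:

```python
SAMPLE_OUTPUT = """foo not found
1280 # Staphylococcus aureus
28037 # Streptococcus mitis
"""

def parse_taxids_output(text):
    """Keep the tax_id from each 'tax_id # name' line; skip lines without '#'."""
    tax_ids = []
    for line in text.splitlines():
        if "#" not in line:
            continue  # e.g. "foo not found"
        tax_id = line.split("#", 1)[0].strip()
        if tax_id:
            tax_ids.append(tax_id)
    return tax_ids

print("\n".join(parse_taxids_output(SAMPLE_OUTPUT)))  # prints 1280 and 28037, one per line
```

Writing the joined result (plus a final newline) to a file gives one tax_id per line, the same shape as the grep/cut output.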
Note that if you provide a genus name (or another rank higher than species), taxit taxids will list all children at the species level.
Sorry this isn't more convenient - I mostly manage lists of organisms by taxid and haven't needed this feature for some time.
In the master branch (install with pip install -U 'git+https://github.com/fhcrc/taxtastic.git@v0.8.4rc3#egg=taxtastic'), I added a subcommand that is probably more useful than taxit taxids (which I will probably remove in subsequent versions):
% ./taxit.py namelookup taxonomy.db --include-unmatched -n 'Staphylococcus aureus,Streptococcus mitis,Propionibacterium acnes,foo'
input,tax_name,tax_id,rank
Staphylococcus aureus,Staphylococcus aureus,1280,species
Streptococcus mitis,Streptococcus mitis,28037,species
Propionibacterium acnes,Cutibacterium acnes,1747,species
foo,,,
found 3 of 4 names
The output file can be provided to taxit taxtable using -i/--seq-info (although you will want to omit --include-unmatched).
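If a file was already written with --include-unmatched, the unmatched rows (those with an empty tax_id) can be filtered out before it is passed to -i/--seq-info. A minimal Python 3 sketch using the csv module, assuming the column layout shown above (input,tax_name,tax_id,rank) and treating the "found 3 of 4 names" summary as console output rather than part of the file; the function name is hypothetical:

```python
import csv
import io

SAMPLE = """input,tax_name,tax_id,rank
Staphylococcus aureus,Staphylococcus aureus,1280,species
Streptococcus mitis,Streptococcus mitis,28037,species
Propionibacterium acnes,Cutibacterium acnes,1747,species
foo,,,
"""

def drop_unmatched(csv_text):
    """Return CSV text containing only rows whose tax_id column is non-empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    matched = [row for row in reader if row["tax_id"]]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(matched)
    return out.getvalue()

print(drop_unmatched(SAMPLE), end="")
```

Filtering on the tax_id column (rather than on the input name) keeps renamed matches such as Propionibacterium acnes, which resolves to Cutibacterium acnes.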
Thanks very much. I will give it a go.
Hi, another question please. In the older version, the rows of the taxit output file were ordered: species, then genus, and so on up to root, and each parent ID led up to the next tax ID. In the new version, is there a way to make the output ordered like that?
Sorry, I am talking about the first two columns, tax_id and parent_id.
The new order is by lineage - I'm afraid I'm not planning to provide options for alternative orderings, but there are a number of tools for manipulating tabular data that should be able to reorder the table for you: for example, check out https://csvkit.readthedocs.io/en/1.0.2/ and https://github.com/BurntSushi/xsv
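For a single organism, the species-to-root ordering of the older output can be recovered by following parent_id links. A minimal Python 3 sketch, assuming the taxtable CSV has tax_id and parent_id columns and that the root either has an empty parent_id or points to itself; the tiny table below uses made-up placeholder IDs and names, not real NCBI data, and the function name is hypothetical:

```python
import csv
import io

# Placeholder table for illustration only (not real NCBI tax_ids).
TOY_TAXTABLE = """tax_id,parent_id,rank,tax_name
1,1,root,root
10,1,genus,Genus_placeholder
100,10,species,Species_placeholder
"""

def lineage_rows(tax_id, csv_text):
    """Return rows from tax_id up to the root, following parent_id links."""
    rows = {row["tax_id"]: row for row in csv.DictReader(io.StringIO(csv_text))}
    ordered, seen = [], set()
    current = tax_id
    # Stop at an unknown/empty parent or when the root points back to itself.
    while current in rows and current not in seen:
        seen.add(current)
        row = rows[current]
        ordered.append(row)
        current = row["parent_id"]
    return ordered

for row in lineage_rows("100", TOY_TAXTABLE):
    print(row["tax_id"], row["parent_id"], row["rank"])
```

The seen set guards against the self-referencing root (and any accidental cycle), so the loop always terminates.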
Sounds like your original question is resolved - please feel free to reopen or open a new issue if necessary.
Hello,
I am trying to use taxtastic, but I keep getting this error. I am using an up-to-date taxonomy database.
This is the error:
Traceback (most recent call last):
  File "/usr/local/bin/taxit", line 4, in <module>
    __import__('pkg_resources').run_script('taxtastic==0.5.3', 'taxit')
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources.py", line 646, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources.py", line 1567, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/taxtastic-0.5.3-py2.7.egg/EGG-INFO/scripts/taxit", line 22, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/usr/local/lib/python2.7/dist-packages/taxtastic-0.5.3-py2.7.egg/taxtastic/scripts/taxit.py", line 48, in main
    return action(arguments)
  File "/usr/local/lib/python2.7/dist-packages/taxtastic-0.5.3-py2.7.egg/taxtastic/subcommands/taxtable.py", line 129, in action
    tax.write_table(taxids_to_export, csvfile = args.out_file)
  File "/usr/local/lib/python2.7/dist-packages/taxtastic-0.5.3-py2.7.egg/taxtastic/taxonomy.py", line 304, in write_table
    for lin in sorted(lineages, key=lambda x: (ranks.index(x['rank']), x['tax_name'])):
  File "/usr/local/lib/python2.7/dist-packages/taxtastic-0.5.3-py2.7.egg/taxtastic/taxonomy.py", line 304, in <lambda>
    for lin in sorted(lineages, key=lambda x: (ranks.index(x['rank']), x['tax_name'])):
ValueError: u'cohort' is not in list
I have found that the error is triggered by Acromyrmex echinatior. Other names in my list trigger it too, but this is the one I managed to identify. Supplying the tax_id for Acromyrmex echinatior instead of the name also triggers it.
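The last frames of this traceback show the mechanism: the sort key calls ranks.index(x['rank']), and list.index raises ValueError for any rank that is not in the list, such as 'cohort'. A minimal Python 3 sketch reproducing that failure, plus a tolerant key that sorts unknown ranks last; the RANKS list is a hypothetical, abbreviated stand-in (not taxtastic 0.5.3's actual list), the cohort entry's name is a placeholder, and upgrading taxtastic remains the recommended fix in this thread:

```python
# Hypothetical, abbreviated rank list; taxtastic 0.5.3's real list differs.
RANKS = ["root", "superkingdom", "phylum", "class", "order", "family", "genus", "species"]

lineages = [
    {"rank": "genus", "tax_name": "Acromyrmex"},
    {"rank": "cohort", "tax_name": "cohort_placeholder"},
    {"rank": "species", "tax_name": "Acromyrmex echinatior"},
]

def strict_key(x):
    # Same shape as the key in the traceback: fails on ranks missing from RANKS.
    return (RANKS.index(x["rank"]), x["tax_name"])

def tolerant_key(x):
    # Unknown ranks get a position after every known rank instead of raising.
    pos = RANKS.index(x["rank"]) if x["rank"] in RANKS else len(RANKS)
    return (pos, x["tax_name"])

try:
    sorted(lineages, key=strict_key)
except ValueError as err:
    print("strict key failed:", err)

for lin in sorted(lineages, key=tolerant_key):
    print(lin["rank"], lin["tax_name"])
```

This also explains why supplying the tax_id instead of the name fails the same way: the error depends on the ranks in the lineage, not on how the organism was specified.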
I also tried your new version, but that results in a different error:
(taxtastic-env)testb@testb-VirtualBox:~/taxit$ taxit taxtable -t 47770,33945 -o ~/minimal_taxonomy.csv taxonomy.db
Traceback (most recent call last):
  File "/home/testb/taxtastic-env/bin/taxit", line 22, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/home/testb/taxtastic-env/local/lib/python2.7/site-packages/taxtastic/scripts/taxit.py", line 49, in main
    return action(arguments)
  File "/home/testb/taxtastic-env/local/lib/python2.7/site-packages/taxtastic/subcommands/taxtable.py", line 154, in action
    tax = Taxonomy(engine, schema=args.schema)
  File "/home/testb/taxtastic-env/local/lib/python2.7/site-packages/taxtastic/taxonomy.py", line 83, in __init__
    ranks_table = self.meta.tables[self.prepend_schema('ranks')]
KeyError: 'ranks'
Any idea what the cause might be? Any help with this would be greatly appreciated.
Thanks
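The maintainer's diagnosis at the top of this thread was that this KeyError means taxonomy.db lacks a "ranks" table, probably because it was built with an older taxtastic. One way to confirm that before rebuilding with taxit new_database is to list the tables in the file. A minimal Python 3 sketch, assuming the database is an SQLite file (function names are illustrative); it opens the file read-only so a mistyped path raises an error instead of silently creating a new empty database:

```python
import sqlite3
from pathlib import Path

def list_tables(db_path):
    """Return the table names in an existing SQLite file, opened read-only."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # read-only: never create a file
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]

def has_ranks_table(db_path):
    """True if the database contains the 'ranks' table the newer Taxonomy expects."""
    return "ranks" in list_tables(db_path)
```

For example, has_ranks_table("taxonomy.db") returning False would be consistent with KeyError: 'ranks', and regenerating the file with the current taxtastic (taxit new_database taxonomy.db) should be expected to add the table.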