fhcrc / taxtastic

Create and maintain phylogenetic "reference packages" of biological sequences.
GNU General Public License v3.0

taxtastic cohort error #116

Closed ghost closed 6 years ago

ghost commented 6 years ago

Hello,

I am trying to use taxtastic. But I keep getting this error. I am using an up to date taxonomy database

This is the error:

Traceback (most recent call last):
  File "/usr/local/bin/taxit", line 4, in <module>
    __import__('pkg_resources').run_script('taxtastic==0.5.3', 'taxit')
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources.py", line 646, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources.py", line 1567, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/taxtastic-0.5.3-py2.7.egg/EGG-INFO/scripts/taxit", line 22, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/usr/local/lib/python2.7/dist-packages/taxtastic-0.5.3-py2.7.egg/taxtastic/scripts/taxit.py", line 48, in main
    return action(arguments)
  File "/usr/local/lib/python2.7/dist-packages/taxtastic-0.5.3-py2.7.egg/taxtastic/subcommands/taxtable.py", line 129, in action
    tax.write_table(taxids_to_export, csvfile = args.out_file)
  File "/usr/local/lib/python2.7/dist-packages/taxtastic-0.5.3-py2.7.egg/taxtastic/taxonomy.py", line 304, in write_table
    for lin in sorted(lineages, key=lambda x: (ranks.index(x['rank']), x['tax_name'])):
  File "/usr/local/lib/python2.7/dist-packages/taxtastic-0.5.3-py2.7.egg/taxtastic/taxonomy.py", line 304, in <lambda>
    for lin in sorted(lineages, key=lambda x: (ranks.index(x['rank']), x['tax_name'])):
ValueError: u'cohort' is not in list

I have found that the error is triggered by Acromyrmex echinatior. Other names in my list also trigger it, but this is the one I managed to identify. I also tried passing the taxa ID for Acromyrmex echinatior instead of the name, but that sets it off as well.
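
For readers following the traceback: the ValueError is raised by ranks.index(x['rank']), because list.index fails when the value is not in the list. The sketch below reproduces that behaviour in isolation. The shortened rank list and the lineage rows are hypothetical placeholders (the real values come from the taxonomy database), and the tolerant sort key is only one possible workaround, not the fix used by taxtastic. It is written for Python 3.

```python
# Hypothetical, shortened rank list; the list used by taxtastic 0.5.3 is
# built elsewhere and, judging by the traceback, does not contain 'cohort'.
ranks = ['root', 'superkingdom', 'phylum', 'class', 'order',
         'family', 'genus', 'species']

# Illustrative rows only; the second tax_name is a placeholder.
lineages = [
    {'rank': 'species', 'tax_name': 'Acromyrmex echinatior'},
    {'rank': 'cohort', 'tax_name': 'example cohort'},
]


def sort_key(row):
    # Same key as in the traceback: position of the rank, then the name.
    return (ranks.index(row['rank']), row['tax_name'])


try:
    sorted(lineages, key=sort_key)
    error_message = None
except ValueError as err:
    # list.index raises ValueError for a rank missing from the list.
    error_message = str(err)

print(error_message)


def tolerant_key(row):
    # Possible workaround (an assumption, not taxtastic's approach):
    # ranks that are not in the list sort after all known ranks.
    rank = row['rank']
    position = ranks.index(rank) if rank in ranks else len(ranks)
    return (position, row['tax_name'])


ordered = sorted(lineages, key=tolerant_key)
print([row['rank'] for row in ordered])
```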

I also tried your new version, but that results in a different error:

(taxtastic-env)testb@testb-VirtualBox:~/taxit$ taxit taxtable -t 47770,33945 -o ~/minimal_taxonomy.csv taxonomy.db
Traceback (most recent call last):
  File "/home/testb/taxtastic-env/bin/taxit", line 22, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/home/testb/taxtastic-env/local/lib/python2.7/site-packages/taxtastic/scripts/taxit.py", line 49, in main
    return action(arguments)
  File "/home/testb/taxtastic-env/local/lib/python2.7/site-packages/taxtastic/subcommands/taxtable.py", line 154, in action
    tax = Taxonomy(engine, schema=args.schema)
  File "/home/testb/taxtastic-env/local/lib/python2.7/site-packages/taxtastic/taxonomy.py", line 83, in __init__
    ranks_table = self.meta.tables[self.prepend_schema('ranks')]
KeyError: 'ranks'

Any idea what the cause might be? Any help with this would be greatly appreciated.

Thanks

nhoffman commented 6 years ago

Hi @kmartin285 - thanks for raising the issue. Definitely avoid using older versions at this point. Can you please provide the output of taxit --version for the newer version that you are using?

nhoffman commented 6 years ago

Looks like there are two issues here (in the second example above - afraid I can't help troubleshoot the first error):

Give this a try (prints the taxtable to stdout):

virtualenv py2-env
source py2-env/bin/activate
pip install taxtastic==0.8.3 
taxit -v new_database taxonomy.db
taxit taxtable taxonomy.db -t 47770 33945

Thanks for this feedback - the next release will include a more informative error message and help text for the -t option.

ghost commented 6 years ago

I tried what you suggested, but I get an error after running taxit -v new_database taxonomy.db. It goes like this:

(py2-env)kmartin@n78622:~/Desktop$ taxit -v new_database taxonomy.db
INFO ncbi 507 downloading ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip to /usr/users/ga002/kmartin/Desktop/taxdmp.zip
INFO ncbi 230 Clobbering database tables
INFO ncbi 233 Creating database tables
INFO ncbi 365 loading ranks
INFO ncbi 373 loading nodes
INFO ncbi 379 loading names
INFO ncbi 385 loading merged
INFO ncbi 393 retrieving species names
INFO ncbi 408 found 1309139 species names
INFO ncbi 410 checking species names
INFO ncbi 416 419725 names are classified (32.1%)
INFO ncbi 421 CREATE TEMPORARY TABLE "jZFBhqhkmlFF" (tax_id text)
INFO ncbi 424 inserting tax_ids into temporary table
INFO ncbi 427 INSERT INTO "jZFBhqhkmlFF" VALUES (?)
INFO ncbi 430 creating an index on the temporary table
INFO ncbi 433 CREATE INDEX ix_jZFBhqhkmlFF_tax_id on "jZFBhqhkmlFF"(tax_id)
INFO ncbi 436 updating names.is_classified
INFO ncbi 445 UPDATE names SET is_classified = ? WHERE is_primary AND tax_id IN (SELECT tax_id FROM "jZFBhqhkmlFF")

INFO ncbi 471 marking invalid nodes
INFO ncbi 472 WITH RECURSIVE descendants AS ( SELECT tax_id, parent_id, rank FROM nodes WHERE rank = 'species' AND tax_id not in (SELECT tax_id FROM names WHERE is_classified) UNION SELECT n.tax_id, n.parent_id, n.rank FROM nodes n INNER JOIN descendants d ON d.tax_id = n.parent_id ) UPDATE nodes SET is_valid = ? WHERE tax_id in (SELECT tax_id from descendants)

Traceback (most recent call last):
  File "/usr/local/bin/taxit", line 22, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/usr/local/lib/python2.7/dist-packages/taxtastic/scripts/taxit.py", line 49, in main
    return action(arguments)
  File "/usr/local/lib/python2.7/dist-packages/taxtastic/subcommands/new_database.py", line 88, in action
    ncbi_loader.set_nodes_is_valid()
  File "/usr/local/lib/python2.7/dist-packages/taxtastic/ncbi.py", line 474, in set_nodes_is_valid
    cur.execute(cmd, (False, ))
sqlite3.OperationalError: near "WITH": syntax error

nhoffman commented 6 years ago

Looks like your system has an older version of sqlite3 (< 3.8.3) that does not support common table expressions - please see the readme (https://github.com/fhcrc/taxtastic) for an explanation, and for instructions on how to install pysqlite2 in your virtualenv to provide a newer version of the sqlite3 libraries. Let me know how it goes. Also, it would be useful to know your OS version and the output of

python -c 'import sqlite3; print sqlite3.sqlite_version'

ghost commented 6 years ago

Thanks, I will update. The output of python -c 'import sqlite3; print sqlite3.sqlite_version' is 3.7.6.3.
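
Version 3.7.6.3 is older than 3.8.3, which matches the near "WITH" syntax error above. A minimal sketch of a slightly more thorough check follows: it compares the version tuple against 3.8.3 and also runs a small recursive common table expression against an in-memory database. It uses only the standard sqlite3 module and is written for Python 3; the counter query is an illustrative example, not the query taxtastic runs.

```python
import sqlite3

# SQLite gained common table expressions (WITH ...) in version 3.8.3.
MIN_CTE_VERSION = (3, 8, 3)

# Version of the SQLite library that this Python is linked against.
supports_cte = sqlite3.sqlite_version_info >= MIN_CTE_VERSION
print(sqlite3.sqlite_version, 'CTE support expected:', supports_cte)

# Cross-check by actually running a small recursive CTE in memory.
con = sqlite3.connect(':memory:')
try:
    rows = con.execute(
        'WITH RECURSIVE counter(n) AS ('
        ' SELECT 1 UNION ALL SELECT n + 1 FROM counter WHERE n < 3'
        ') SELECT n FROM counter'
    ).fetchall()
    cte_works = True
except sqlite3.OperationalError:
    # Older libraries fail here with: near "WITH": syntax error
    rows = []
    cte_works = False
finally:
    con.close()

print('CTE query ran:', cte_works)
```

If both checks report False, the interpreter is linked against an SQLite library that predates common table expressions, and the readme's instructions for installing a newer one apply.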

ghost commented 6 years ago

I got taxtastic to work on the test data. Thanks for all your help. One more thing, please: when I run the command below on my own data, I get this error. I have 2471 species names in a file called strain.txt.

(taxtastic-env)km285@bioinftop12-03:~/Desktop/taxtastic$ taxit taxtable taxonomy.db -f strain.txt -o strain_out.csv

Traceback (most recent call last):
  File "/home/km285/Desktop/taxtastic/taxtastic-env/bin/taxit", line 9, in <module>
    load_entry_point('taxtastic==0.8.4rc2-1.g27a52c8', 'console_scripts', 'taxit')()
  File "/home/km285/Desktop/taxtastic/taxtastic-env/local/lib/python2.7/site-packages/taxtastic/scripts/taxit.py", line 51, in main
    return action(arguments)
  File "/home/km285/Desktop/taxtastic/taxtastic-env/local/lib/python2.7/site-packages/taxtastic/subcommands/taxtable.py", line 156, in action
    rows = tax._get_lineage_table(tax_ids)
  File "/home/km285/Desktop/taxtastic/taxtastic-env/local/lib/python2.7/site-packages/taxtastic/taxonomy.py", line 328, in _get_lineage_table
    raise ValueError(msg)
ValueError: 3447 tax_ids were provided but only 174 were returned

ghost commented 6 years ago

Does taxit still accept taxa names? The documentation mentions only taxa IDs as input for -f.

nhoffman commented 6 years ago

taxit taxtable only takes taxids in the most recent version. You can look up taxids from names using ./taxit.py taxids, although the output format leaves something to be desired:

% ./taxit.py taxids taxonomy.db -n 'Staphylococcus aureus,Streptococcus mitis,foo'
foo not found
1280 # Staphylococcus aureus
28037 # Streptococcus mitis

A quick and dirty conversion to a format that can be used as input to taxit taxtable --tax-id-file:

% ./taxit.py taxids taxonomy.db -n 'Staphylococcus aureus,Streptococcus mitis,foo' | grep '#' | cut -f1 -d'#'
1280
28037
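
The same filtering can be sketched in Python for anyone who wants to post-process the output in a script. This assumes the output has exactly the shape shown above (a taxid, a '#', then the name, with "not found" lines containing no '#'); the function name parse_taxids_output is hypothetical, not part of taxtastic.

```python
def parse_taxids_output(text):
    """Return the taxids from lines shaped like '1280 # Staphylococcus aureus'."""
    tax_ids = []
    for line in text.splitlines():
        if '#' not in line:
            continue  # skips lines such as 'foo not found'
        # Keep only the part before the first '#', as cut -f1 -d'#' does.
        tax_id = line.split('#', 1)[0].strip()
        if tax_id:
            tax_ids.append(tax_id)
    return tax_ids


# Sample text copied from the taxit taxids output shown above.
sample = """foo not found
1280 # Staphylococcus aureus
28037 # Streptococcus mitis
"""

tax_ids = parse_taxids_output(sample)
print('\n'.join(tax_ids))
```

Writing the printed lines to a file gives one taxid per line, the same as the grep and cut pipeline.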

Note that if you provide a genus name (or a name at any other rank above species), taxit taxids will list all children at the species level.

Sorry this isn't more convenient - I mostly manage lists of organisms using taxid and haven't needed this feature for some time.

nhoffman commented 6 years ago

In the master branch (install with pip install -U 'git+https://github.com/fhcrc/taxtastic.git@v0.8.4rc3#egg=taxtastic'), I added a subcommand that is probably more useful than taxit taxids (which I will probably remove in subsequent versions):

% ./taxit.py namelookup taxonomy.db --include-unmatched -n 'Staphylococcus aureus,Streptococcus mitis,Propionibacterium acnes,foo'
input,tax_name,tax_id,rank
Staphylococcus aureus,Staphylococcus aureus,1280,species
Streptococcus mitis,Streptococcus mitis,28037,species
Propionibacterium acnes,Cutibacterium acnes,1747,species
foo,,,
found 3 of 4 names

The output file can be provided to taxit taxtable using -i/--seq-info (although you will want to omit --include-unmatched).
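
Before passing the file on, it can be useful to see which names were not matched or were mapped to a different current name. A minimal sketch with the standard csv module follows, assuming the columns shown above (input, tax_name, tax_id, rank) and assuming the "found 3 of 4 names" summary line is not part of the CSV file itself; the helper name split_matches is hypothetical.

```python
import csv
import io

# Sample rows copied from the namelookup output shown above.
sample = """input,tax_name,tax_id,rank
Staphylococcus aureus,Staphylococcus aureus,1280,species
Streptococcus mitis,Streptococcus mitis,28037,species
Propionibacterium acnes,Cutibacterium acnes,1747,species
foo,,,
"""


def split_matches(handle):
    """Split namelookup rows into those with and without a tax_id."""
    matched, unmatched = [], []
    for row in csv.DictReader(handle):
        if row['tax_id']:
            matched.append(row)
        else:
            unmatched.append(row)
    return matched, unmatched


matched, unmatched = split_matches(io.StringIO(sample))

# Names whose current tax_name differs from what was typed in.
renamed = [(row['input'], row['tax_name'])
           for row in matched if row['input'] != row['tax_name']]

print('unmatched:', [row['input'] for row in unmatched])
print('renamed:', renamed)
```

For a real file, replace io.StringIO(sample) with open('names.csv', newline=''), where names.csv is a placeholder file name.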

ghost commented 6 years ago

Thanks very much. I will give it a go.

ghost commented 6 years ago

Hi, another question, please. In the older version, the taxit output file was ordered: species first, then genus, and so on up to root, with the parent IDs leading up to the taxa IDs. Is there a way to make the output ordered like this in the new version?

ghost commented 6 years ago

Sorry, I am talking about the first two columns, taxa ID and parent ID.

nhoffman commented 6 years ago

The new order is by lineage - I'm afraid I'm not planning to provide options for alternative orderings, but there are a number of tools for manipulating tabular data that should be able to reorder the table for you: for example, check out https://csvkit.readthedocs.io/en/1.0.2/ and https://github.com/BurntSushi/xsv
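
As an alternative to those tools, a reordering along the lines requested can be sketched in a few lines of Python. This is only a sketch under assumptions: the column names tax_id and parent_id are assumed to match the first two columns of the taxtable output, the root is assumed to be its own parent or to have a parent missing from the table, and the sample rows are illustrative placeholders rather than real taxonomy data. Rows are sorted by their distance from the root, deepest first (species, then genus, up to root).

```python
import csv
import io


def order_by_depth(rows, id_col='tax_id', parent_col='parent_id'):
    """Return rows sorted deepest-first, following parent links to the root."""
    parents = {row[id_col]: row[parent_col] for row in rows}

    def depth(tax_id):
        steps = 0
        seen = set()
        while True:
            parent = parents.get(tax_id)
            # Stop at the root (its own parent), at a parent that is not
            # in the table, or if a cycle is detected.
            if parent is None or parent == tax_id or tax_id in seen:
                return steps
            seen.add(tax_id)
            tax_id = parent
            steps += 1

    # sorted() is stable, so rows at the same depth keep their input order.
    return sorted(rows, key=lambda row: -depth(row[id_col]))


# Placeholder table: the ids and names are made up for illustration.
sample = """tax_id,parent_id,rank,tax_name
1,1,root,root
10,1,genus,example genus
100,10,species,example species
"""

rows = list(csv.DictReader(io.StringIO(sample)))
ordered = order_by_depth(rows)
print([row['tax_id'] for row in ordered])
```

Passing the sorted rows to csv.DictWriter writes them back out as a CSV file with the same columns.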

Sounds like your original question is resolved - please feel free to reopen or open a new issue if necessary.