fhcrc / taxtastic

Create and maintain phylogenetic "reference packages" of biological sequences.
GNU General Public License v3.0

update_taxids requires seq_info when just list of taxids would suffice #69

Closed tyleraland closed 8 years ago

tyleraland commented 9 years ago

I want to parse some BLAST output, which gives me one or more staxids per hit, and map each sequence to its species-level taxid.
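As a rough illustration of the parsing step, here is a minimal sketch that collects the unique taxids from BLAST `staxids` values. It assumes the values have already been pulled out of the tabular output and that multiple taxids in one field are separated by `;`. The function name `collect_taxids` is hypothetical, and skipping non-numeric entries is an assumption about how missing values should be handled.

```python
def collect_taxids(staxids_values):
    """Return the unique taxids found in an iterable of staxids fields.

    Each field may hold one taxid or several separated by ';'.
    Non-numeric entries (assumed to mark missing values) are skipped.
    """
    taxids = set()
    for value in staxids_values:
        for taxid in value.split(";"):
            taxid = taxid.strip()
            if taxid.isdigit():
                taxids.add(taxid)
    # Sort numerically so the output order is stable.
    return sorted(taxids, key=int)


unique_taxids = collect_taxids(["562;622", "562", "N/A", ""])
```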

Here is my ideal workflow:

Unfortunately, BLAST will occasionally give me a taxid that is not in their taxonomy database (and thus not in the taxdb), so I need to filter it out.

As a consequence, I need to use update_taxids to clean the taxids list first. However, update_taxids requires that my taxids list be a seq_info CSV with a seqname column and a tax_id column. The seqname information doesn't really make sense in my case (a single seqname may map to multiple taxids, and taxtable doesn't need that information). So three extra steps are required:

As far as I can tell, the seq_name portion of the seq_info file is not used, but it is required. If update_taxids allowed a list of taxids it would simplify the interface.
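One way to picture the workaround described above is to wrap a bare taxid list in a placeholder seq_info-style CSV. This is only a sketch under stated assumptions: the column names `seqname` and `tax_id` come from the description above, while the function name `taxids_to_seq_info` and the `placeholder_N` sequence names are hypothetical and are not part of taxtastic.

```python
import csv
import io


def taxids_to_seq_info(taxids, out):
    """Write a seq_info-style CSV with a dummy seqname for each taxid.

    `out` is any writable text file object. The seqname values are
    placeholders, present only because the column is required.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["seqname", "tax_id"])
    for i, taxid in enumerate(taxids, start=1):
        writer.writerow([f"placeholder_{i}", taxid])


buf = io.StringIO()
taxids_to_seq_info(["562", "622"], buf)
seq_info_text = buf.getvalue()
```

The resulting file could then be passed to update_taxids as its input CSV, and the placeholder column discarded afterwards.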

nhoffman commented 8 years ago

@crosenth - is this still an issue, or has the behavior been updated since Tyler opened this?

crosenth commented 8 years ago

Yes:

positional arguments:
  infile    Input CSV file to process, minimally containing the fields tax_id. Rows with missing tax_ids are left unchanged.
  ...