fhcrc / taxtastic

Create and maintain phylogenetic "reference packages" of biological sequences.
GNU General Public License v3.0

is_classified should operate on species name only instead of each tax_name #59

Closed nhoffman closed 11 years ago

nhoffman commented 12 years ago

Records are currently categorized as "classified" or not based on whether ncbi.UNCLASSIFIED_REGEX matches the tax_name, and this is done while the "names" table is being prepared for insertion into the database. It seems to have been a mistake to perform this operation here rather than later, once the lineage can be defined for each tax_id. We should really be matching on the entire species name, as opposed to the first two words of the name at each rank. One approach would be to simply create an empty is_classified column here and fill it in later.
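A rough sketch of the "fill it in later" approach, to make the proposal concrete. Everything here is an assumption for illustration: the regex is a stand-in for ncbi.UNCLASSIFIED_REGEX (the real pattern may differ), the two-table schema is a simplified guess at "nodes" and "names", the tax_ids are made up, and the helpers `species_name` and `fill_is_classified` are hypothetical names, not existing taxtastic functions. Leaving is_classified NULL for ranks above species is also just one possible choice.

```python
import re
import sqlite3

# Stand-in for ncbi.UNCLASSIFIED_REGEX; the real pattern may differ (assumption).
UNCLASSIFIED_REGEX = re.compile(
    r"\b(unclassified|uncultured|unidentified)\b", re.IGNORECASE)


def species_name(tax_id, nodes, names):
    """Walk up the lineage from tax_id and return the name of the first
    node with rank 'species', or None if the lineage has no species."""
    seen = set()  # guards against the root node being its own parent
    while tax_id in nodes and tax_id not in seen:
        seen.add(tax_id)
        parent_id, rank = nodes[tax_id]
        if rank == "species":
            return names[tax_id]
        tax_id = parent_id
    return None


def fill_is_classified(con):
    """Fill the initially empty is_classified column using the species
    name in each tax_id's lineage instead of the tax_id's own name."""
    nodes = {t: (p, r) for t, p, r in
             con.execute("SELECT tax_id, parent_id, rank FROM nodes")}
    names = dict(con.execute("SELECT tax_id, tax_name FROM names"))
    for tax_id in names:
        sp = species_name(tax_id, nodes, names)
        if sp is None:
            value = None  # above species rank: left NULL (assumption)
        else:
            value = 0 if UNCLASSIFIED_REGEX.search(sp) else 1
        con.execute("UPDATE names SET is_classified = ? WHERE tax_id = ?",
                    (value, tax_id))


# Toy data with made-up tax_ids; is_classified starts out empty.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE nodes (tax_id TEXT PRIMARY KEY, parent_id TEXT, rank TEXT);
CREATE TABLE names (tax_id TEXT PRIMARY KEY, tax_name TEXT,
                    is_classified INTEGER);
""")
con.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    ("1", "1", "root"),
    ("10", "1", "genus"),
    ("11", "10", "species"),
    ("12", "10", "species"),
    ("13", "12", "no_rank"),
])
con.executemany("INSERT INTO names (tax_id, tax_name) VALUES (?, ?)", [
    ("1", "root"),
    ("10", "Bacillus"),
    ("11", "Bacillus subtilis"),
    ("12", "uncultured Bacillus sp."),
    ("13", "Bacillus sp. X1"),
])

fill_is_classified(con)
result = dict(con.execute("SELECT tax_id, is_classified FROM names"))
print(result)
```

In this toy data, tax_id "13" ("Bacillus sp. X1") would pass if the regex were applied to its own tax_name, but it ends up with is_classified = 0 because its species, "uncultured Bacillus sp.", matches.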