fhcrc / taxtastic

Create and maintain phylogenetic "reference packages" of biological sequences.
GNU General Public License v3.0

taxtable produces malformed column names #127

Closed bowmanjeffs closed 1 year ago

bowmanjeffs commented 5 years ago

When I recently updated to the latest version of taxtastic, I discovered that taxit taxtable taxonomy.db -f tax_ids.txt -o taxa.csv produces a taxa.csv file with malformed rank names.

From an old file: "tax_id","parent_id","rank","tax_name","root","below_root","superkingdom","below_superkingdom","below_below_superkingdom","below_below_below_superkingdom","below_below_below_below_superkingdom","phylum","below_phylum","below_below_phylum","subphylum","class","below_class","below_below_class","subclass","order","below_order","below_below_order","suborder","family","below_family","below_below_family","below_below_below_family","subfamily","tribe","genus","below_genus","subgenus","species_group","species_subgroup","species","below_species","below_below_species","below_below_below_species","subspecies","below_subspecies","below_below_subspecies","below_below_below_subspecies"

From a new file: "tax_id","parent_id","rank","tax_name","root","root_","superkingdom","superkingdom_","superkingdom__","phylum","phylum_","phylum__","class","class_","order","order_","order__","family","family_","genus","species","species_"

One can distinguish between X, below_X, below_below_X from these names, but it doesn't look like this was an intentional renaming.

crosenth commented 5 years ago

Hi Jeff,

The new "_" suffix is intentional and replaces the older "below_" prefix. The new naming was implemented to avoid extremely long "below_below_below_..." column names that can occur when building very large taxtables.
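
For anyone with downstream scripts that still expect the old names, the correspondence described above can be sketched in a few lines of Python. This is only an illustration, not part of taxtastic: the helper names suffix_to_prefix and prefix_to_suffix are hypothetical, and the sketch assumes that each trailing "_" stands for exactly one "below_" prefix and that no real rank name ends in an underscore. It only translates names; it does not guarantee that an old and a new taxtable contain the same set of columns.

```python
# Hypothetical helpers (not part of taxtastic) for translating between the
# old "below_" prefix style and the new "_" suffix style of column names.

def suffix_to_prefix(column: str) -> str:
    """Convert a new-style name such as 'phylum__' to 'below_below_phylum'."""
    base = column.rstrip("_")          # rank name without trailing underscores
    depth = len(column) - len(base)    # number of trailing underscores
    return "below_" * depth + base


def prefix_to_suffix(column: str) -> str:
    """Convert an old-style name such as 'below_root' to 'root_'."""
    prefix = "below_"
    depth = 0
    while column.startswith(prefix):   # peel off one 'below_' at a time
        column = column[len(prefix):]
        depth += 1
    return column + "_" * depth


if __name__ == "__main__":
    # Header copied from the new-style file quoted in this issue.
    new_header = [
        "tax_id", "parent_id", "rank", "tax_name", "root", "root_",
        "superkingdom", "superkingdom_", "superkingdom__", "phylum",
        "phylum_", "phylum__", "class", "class_", "order", "order_",
        "order__", "family", "family_", "genus", "species", "species_",
    ]
    for name in new_header:
        print(name, "->", suffix_to_prefix(name))
```

Names without a trailing underscore (for example tax_id or species_group) pass through unchanged, since only underscores at the very end of the name are counted.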

nhoffman commented 1 year ago

Closing - please reopen if you have further questions.