Ecogenomics / GTDBTk

GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
https://ecogenomics.github.io/GTDBTk/
GNU General Public License v3.0

classify_wf errors out when all inputs are unclassified #576

Closed: Zach-Sisson-1 closed this issue 6 months ago

Zach-Sisson-1 commented 8 months ago

The classify_wf pipeline errors out if none of the input query genomes can be classified, because it does not produce a bac120 or ar53 summary file. However, if the inputs contain at least one classifiable genome, every unclassifiable genome is written to the bac120 summary file as a row like this:

user_genome	classification	fastani_reference	fastani_reference_radius	fastani_taxonomy	fastani_ani	fastani_af	closest_placement_reference	closest_placement_radius	closest_placement_taxonomy	closest_placement_ani	closest_placement_af	pplacer_taxonomy	classification_method	note	other_related_references(genome_id,species_name,radius,ANI,AF)	msa_percent	translation_table	red_value	warnings
00949864-6241-46e4-8c40-0efc0b7c7c55	Unclassified	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	No bacterial or archaeal marker

In this example, I ran one classifiable archaeal genome together with one unclassifiable genome, and the classify output contained both the ar53 and bac120 summary files. When I run only the unclassifiable genome, I get a FileNotFoundError.
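This does not fix the pipeline itself, but scripts that consume the classify output can treat a missing summary file as "no genomes placed in that domain" rather than as an error. A minimal sketch, assuming the summary files are named `gtdbtk.bac120.summary.tsv` and `gtdbtk.ar53.summary.tsv` and sit directly in the output directory (check this against your GTDB-Tk version); `read_summaries` is a hypothetical helper, not part of GTDB-Tk:

```python
import csv
from pathlib import Path

# Assumed summary file names; verify them for your GTDB-Tk version.
SUMMARY_NAMES = ("gtdbtk.bac120.summary.tsv", "gtdbtk.ar53.summary.tsv")


def read_summaries(out_dir):
    """Collect rows from whichever summary files exist, skipping missing ones."""
    rows = []
    for name in SUMMARY_NAMES:
        path = Path(out_dir) / name
        if not path.is_file():
            # A missing file is treated as "no genomes in this domain"
            # instead of raising FileNotFoundError.
            continue
        with path.open(newline="") as handle:
            # Each row becomes a dict keyed by the TSV header columns.
            rows.extend(csv.DictReader(handle, delimiter="\t"))
    return rows
```

Genomes that appear in neither file still need separate handling by the caller, since nothing is recorded for them in this case.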

It would be nice if the bac120/ar53 summary files were still produced, in the form shown above, even when all inputs are unclassifiable, instead of the pipeline throwing an error.
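The requested behaviour amounts to writing the usual header plus one `Unclassified` row per genome, with `N/A` in every field except `warnings`. A minimal sketch, assuming the 20-column order from the bac120 example above; `SUMMARY_COLUMNS` and `write_unclassified_summary` are hypothetical names, not GTDB-Tk's API, and a real fix would belong in GTDB-Tk's own summary-writing code:

```python
import csv
import io

# Column order copied from the bac120 summary example in this issue.
SUMMARY_COLUMNS = [
    "user_genome", "classification", "fastani_reference",
    "fastani_reference_radius", "fastani_taxonomy", "fastani_ani",
    "fastani_af", "closest_placement_reference", "closest_placement_radius",
    "closest_placement_taxonomy", "closest_placement_ani",
    "closest_placement_af", "pplacer_taxonomy", "classification_method",
    "note", "other_related_references(genome_id,species_name,radius,ANI,AF)",
    "msa_percent", "translation_table", "red_value", "warnings",
]


def write_unclassified_summary(handle, genome_ids,
                               warning="No bacterial or archaeal marker"):
    """Hypothetical helper: write a header and one 'Unclassified' row per genome."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(SUMMARY_COLUMNS)
    for genome_id in genome_ids:
        # user_genome and classification, then N/A for every column
        # except the final warnings column.
        row = [genome_id, "Unclassified"]
        row += ["N/A"] * (len(SUMMARY_COLUMNS) - 3)
        row.append(warning)
        writer.writerow(row)


# Usage: write to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_unclassified_summary(buffer, ["00949864-6241-46e4-8c40-0efc0b7c7c55"])
print(buffer.getvalue())
```

Writing the file even when every row is `Unclassified` keeps the output layout identical across runs, so downstream tools never need to special-case a missing file.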