Ecogenomics / GTDBTk

GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
https://ecogenomics.github.io/GTDBTk/
GNU General Public License v3.0

Specify in documentation that the `--genes` parameter requires proteins as input. #571

Closed: LeeBergstrand closed this issue 6 months ago

LeeBergstrand commented 9 months ago

Problem Description

Currently, the CLI and the documentation state the following for the --genes parameter:

indicates input files contain called genes (skip gene calling).Warning: This flag will skip the ANI comparison steps (ani_screen and classification).

I wasn't sure whether to provide the CDS gene predictions as nucleotide sequences or as the protein translations of those CDS.

When I fed the pipeline my cds.fna files, it failed as expected. However, I had to dig into GitHub pull requests and source code to determine that the --genes parameter requires proteins as input.

Proposed Solution

Specify in the documentation that the pipeline expects protein sequences as input when the --genes flag is given. Suggested wording:

indicates input files contain predicted proteins (skip gene calling). Warning: This flag will skip the ANI comparison steps (ani_screen and classification).
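
A quick pre-flight check can catch this kind of mix-up before files are passed with --genes. The following is a minimal sketch and not part of GTDB-Tk: the function names and the 0.9 threshold are assumptions chosen for illustration, and the check is only a heuristic based on the residue alphabet (a sequence made up almost entirely of A/C/G/T/U/N is treated as nucleotide).

```python
# Hypothetical helper, not part of GTDB-Tk: guess whether a FASTA file
# holds protein or nucleotide records from the residue alphabet alone.

NUCLEOTIDE_CHARS = set("ACGTUN")


def read_fasta_sequences(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def looks_like_nucleotide(sequence, threshold=0.9):
    """Heuristic: True if at least `threshold` of the letters are A/C/G/T/U/N."""
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        return False
    hits = sum(1 for c in residues if c in NUCLEOTIDE_CHARS)
    return hits / len(residues) >= threshold


def fasta_looks_like_protein(lines, threshold=0.9):
    """True if there is at least one record and none looks like nucleotide."""
    records = list(read_fasta_sequences(lines))
    return bool(records) and not any(
        looks_like_nucleotide(seq, threshold) for _, seq in records
    )
```

For a file on disk, the check could be run as `fasta_looks_like_protein(open(path))`, where `path` points at one of the input files. Very short or low-complexity proteins can fool an alphabet-based heuristic, so a negative result is a prompt to inspect the file rather than proof that it is wrong.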