checkm2 predict: diamond annotations as input

chklovski / CheckM2

Assessing the quality of metagenome-derived genome bins using machine learning

GNU General Public License v3.0


Diamond only accepts single inputs, so we concat protein files and chunk them as input using tempfile
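
As a rough illustration of the behaviour that comment describes, here is a minimal sketch of concatenating per-genome protein files into temporary chunk files. It is not CheckM2's actual code: the function name `concat_into_chunks` and the `files_per_chunk` parameter are hypothetical and chosen only for this example.

```python
import tempfile
from pathlib import Path


def concat_into_chunks(protein_files, files_per_chunk, out_dir):
    """Concatenate protein FASTA files into temporary chunk files.

    Hypothetical helper for illustration; each chunk holds the contents
    of up to `files_per_chunk` input files and could serve as the single
    input of one DIAMOND run.
    """
    if files_per_chunk < 1:
        raise ValueError("files_per_chunk must be at least 1")
    chunk_paths = []
    for start in range(0, len(protein_files), files_per_chunk):
        group = protein_files[start:start + files_per_chunk]
        # delete=False keeps the chunk on disk after the handle is closed
        with tempfile.NamedTemporaryFile(
            "w", suffix=".faa", dir=out_dir, delete=False
        ) as chunk:
            for path in group:
                text = Path(path).read_text()
                # Make sure records from adjacent files never share a line
                if text and not text.endswith("\n"):
                    text += "\n"
                chunk.write(text)
            chunk_paths.append(chunk.name)
    return chunk_paths
```

The caller is responsible for deleting the chunk files once the annotation step has finished.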

For large numbers of genomes (e.g., 10k or 100k MAGs), it would be best to annotate genomes in batches, with each batch annotated in a separate job. Then, the merged annotations can be provided as input to checkm2 predict. This should scale better than a single DIAMOND job for all genes in all genomes.
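
The batching and merging could be sketched roughly as follows. This assumes each job writes headerless, tab-separated DIAMOND output, so that per-batch files can simply be concatenated; `make_batches` and `merge_annotations` are hypothetical names for this example, not part of CheckM2.

```python
def make_batches(genome_paths, batch_size):
    """Split a list of genome paths into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        genome_paths[i:i + batch_size]
        for i in range(0, len(genome_paths), batch_size)
    ]


def merge_annotations(batch_outputs, merged_path):
    """Concatenate per-batch annotation files into one file.

    Assumes headerless tabular files (an assumption of this sketch),
    so no header handling is needed. Returns the number of lines written.
    """
    written = 0
    with open(merged_path, "w") as merged:
        for path in batch_outputs:
            with open(path) as handle:
                for line in handle:
                    if not line.strip():
                        continue  # skip blank lines
                    # Guard against a missing trailing newline
                    merged.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written
```

Each batch from `make_batches` would be annotated in its own job, and `merge_annotations` would run once after all jobs have finished.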

All that would likely be necessary to implement this is to allow for gene annotation files as input (similar to --genes in checkm2 predict) and skip the gene calling & annotation steps.


checkm2 predict: diamond annotations as input #40