Closed by SilasK 2 years ago
I would not recommend doing that without at least a bit of benchmarking. If the clustering is at 100% amino acid identity, I think this amounts to reimplementing the mmseqs taxonomy module; otherwise, I think you have to be careful not to lose precision.
Hey, I've seen that you've already started to allow running mmseqs outside of SemiBin.
Do you think there could be an efficient way to annotate a large set of samples with taxonomy by first creating a gene catalog (e.g. with linclust), annotating the gene catalog once, and then aggregating the taxonomy for each contig?
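
To make the aggregation step concrete, here is a rough sketch of what "aggregating the taxonomy for each contig" could look like once the catalog has been annotated. It is not how SemiBin or mmseqs does it: it uses a plain majority vote rather than an LCA, and the function name, the three input mappings, and the `min_fraction` threshold are all hypothetical. It assumes you have already parsed the cluster membership (gene to representative) and the per-representative taxonomy into dictionaries.

```python
from collections import Counter, defaultdict


def aggregate_contig_taxonomy(gene_to_contig, gene_to_rep, rep_to_taxon,
                              min_fraction=0.5):
    """Assign each contig the most common taxon among its genes.

    All arguments are hypothetical inputs for illustration:
      gene_to_contig: gene id -> contig id
      gene_to_rep:    gene id -> cluster representative in the gene catalog
      rep_to_taxon:   representative id -> taxon label (annotated once)
    A contig is labelled only if the winning taxon covers at least
    `min_fraction` of ALL genes on the contig, annotated or not.
    """
    gene_counts = Counter()          # total genes per contig
    votes = defaultdict(Counter)     # contig -> taxon -> number of genes

    for gene, contig in gene_to_contig.items():
        gene_counts[contig] += 1
        # Look up the taxonomy of the cluster representative, if any.
        taxon = rep_to_taxon.get(gene_to_rep.get(gene))
        if taxon is not None:
            votes[contig][taxon] += 1

    result = {}
    for contig, n_genes in gene_counts.items():
        label = "unclassified"
        if votes[contig]:
            taxon, n = votes[contig].most_common(1)[0]
            if n / n_genes >= min_fraction:
                label = taxon
        result[contig] = label
    return result


if __name__ == "__main__":
    gene_to_contig = {"g1": "c1", "g2": "c1", "g3": "c1", "g4": "c2", "g5": "c2"}
    gene_to_rep = {"g1": "r1", "g2": "r1", "g3": "r2", "g4": "r3", "g5": "r4"}
    rep_to_taxon = {"r1": "Bacteroides", "r2": "Prevotella", "r3": "Escherichia"}
    print(aggregate_contig_taxonomy(gene_to_contig, gene_to_rep, rep_to_taxon,
                                    min_fraction=0.6))
```

With clusters below 100% identity, the representative's label is only an approximation for the other cluster members, so a vote like this would need benchmarking against per-sample annotation before trusting it.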