Closed by biofuture 6 months ago
Hey @biofuture
Thanks for reaching out! To answer your question, MetaPhlAn uses a set of genes specific to species-level bins, with discriminatory properties, to determine a taxonomic profile of the environment/dataset of interest. This MetaPhlAn database can be used across a variety of environments. HiTaxon, conversely, generates non-redundant representations of species pangenomes through a two-step clustering of sequence data in order to facilitate per-read taxonomic classification rather than taxonomic profiling. This, however, does not preclude sequences found in Species A from also being in the pangenome of Species B, as the presence of these conserved sequences is necessary to allow per-read classification. In many ways the two tools are complementary (and we recommend using them in tandem): MetaPhlAn can be used to determine which genera are present in a sample, and that list of genera can then be used as input to HiTaxon to generate an environment-specific classifier for per-read classification.
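As a rough illustration of the first half of that tandem workflow, here is a minimal Python sketch that pulls the genus names out of a MetaPhlAn-style profile table. It assumes the usual tab-separated layout with the clade name in the first column and the relative abundance in the third; the function name `genera_from_profile`, the `min_abundance` threshold, and the example table are hypothetical and only for illustration. How the resulting list is passed to HiTaxon is not shown here, so please check the HiTaxon README for the expected input.

```python
def genera_from_profile(profile_text, min_abundance=0.0):
    """Return the sorted genus names found in a MetaPhlAn-style profile.

    Assumes tab-separated rows: clade name in column 1 and relative
    abundance in column 3 (an assumption about the profile layout).
    """
    genera = set()
    for line in profile_text.splitlines():
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # The deepest rank of the lineage is the last '|'-separated token.
        deepest = fields[0].split("|")[-1]
        # Keep only rows that end at genus level (prefix 'g__').
        if not deepest.startswith("g__"):
            continue
        if float(fields[2]) >= min_abundance:
            genera.add(deepest[len("g__"):])
    return sorted(genera)


# Hypothetical, hand-written example table (not real results).
EXAMPLE_PROFILE = "\n".join([
    "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species",
    "k__Bacteria\t2\t100.0\t",
    "k__Bacteria|p__Proteobacteria|g__Escherichia\t2|1224|561\t60.0\t",
    "k__Bacteria|p__Proteobacteria|g__Escherichia|s__Escherichia_coli\t2|1224|561|562\t60.0\t",
    "k__Bacteria|p__Firmicutes|g__Lactobacillus\t2|1239|1578\t40.0\t",
])

genera = genera_from_profile(EXAMPLE_PROFILE)
print(genera)
```

Raising `min_abundance` would drop low-abundance genera before building the environment-specific classifier; whether that is desirable depends on the sample.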
Thanks for developing this new tool. Impressive.
After a quick reading, it seems that your tool uses species-specific groups of sequences to classify different species, and that similar strategies are used for other ranks. This is the strategy MetaPhlAn uses: clade-specific genes as indicators of the presence of a taxon.
I am wondering how HiTaxon differs from MetaPhlAn.
Thank you.