ParkinsonLab / HiTaxon

A hierarchical ensemble framework for taxonomic classification of short reads

How does this tool differ from the MetaPhlan series? #1

Closed: biofuture closed this issue 6 months ago

biofuture commented 6 months ago

Thanks for developing this new tool. Impressive.

After a quick read, it seems that your tool uses species-specific groups of sequences to classify reads to different species, and a similar strategy is used for the other ranks. This is the strategy MetaPhlan uses: clade-specific genes as indicators of a taxon's presence.

I am wondering how HiTaxon differs from MetaPhlan.

Thank you.

Bhav7 commented 6 months ago

Hey @biofuture

Thanks for reaching out! To answer your question, MetaPhlan uses a set of species-level, bin-specific genes with discriminatory properties to determine a taxonomic profile of the environment or dataset of interest. This MetaPhlan database can be used across a variety of environments. HiTaxon, conversely, generates non-redundant representations of species pangenomes through a two-step clustering of sequence data in order to facilitate per-read taxonomic classification rather than taxonomic profiling. This, however, does not preclude sequences in Species A from being in the pangenome of Species B, as the presence of these conserved sequences is necessary to allow per-read classification. In many ways the two tools are complementary (and are recommended to be used in tandem): MetaPhlan could be used to determine which genera are present in a sample, and that list can then be used as input to HiTaxon to generate an environment-specific classifier for per-read classification.
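As a rough illustration of the tandem workflow described above, here is a minimal Python sketch that pulls genus names out of a MetaPhlAn-style profile so they could be passed on as a genus list. It assumes the usual tab-separated MetaPhlAn output layout (comment lines start with `#`, the first column is a `|`-joined lineage such as `...|g__Bacteroides`, and relative abundance is the third column). The function name `genera_from_profile`, the abundance cutoff, the example profile values, and printing one genus per line are all hypothetical; this thread does not specify the exact genus input HiTaxon expects.

```python
def genera_from_profile(lines, min_abundance=0.0):
    """Return sorted genus names whose relative abundance exceeds min_abundance.

    Hypothetical helper: assumes MetaPhlAn-style rows of
    clade_name <TAB> NCBI_tax_id <TAB> relative_abundance <TAB> ...
    """
    genera = set()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip header/comment lines and blanks
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a data row in the assumed layout
        deepest_rank = fields[0].split("|")[-1]
        if not deepest_rank.startswith("g__"):
            continue  # keep genus-level rows only (skip higher ranks and species)
        try:
            abundance = float(fields[2])
        except ValueError:
            continue  # abundance column is not numeric
        if abundance > min_abundance:
            genera.add(deepest_rank[len("g__"):])
    return sorted(genera)


# Made-up example profile for illustration only (not real data).
EXAMPLE_PROFILE = [
    "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species",
    "k__Bacteria\tNA\t100.0\t",
    "k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides\tNA\t60.0\t",
    "k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_fragilis\tNA\t60.0\t",
    "k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Faecalibacterium\tNA\t39.5\t",
    "k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Prevotellaceae|g__Prevotella\tNA\t0.5\t",
]

if __name__ == "__main__":
    # One genus per line; the real HiTaxon input format may differ.
    print("\n".join(genera_from_profile(EXAMPLE_PROFILE, min_abundance=1.0)))
```

With `min_abundance=1.0` this prints `Bacteroides` and `Faecalibacterium`. Species-level rows (`s__`) are skipped so each genus appears once, and leaving `min_abundance` at its default of 0.0 would also keep low-abundance genera such as Prevotella from the example.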