DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

Classification --host-taxids --exclude-taxids options #57

Open fconstancias opened 7 years ago

fconstancias commented 7 years ago

Hello, I am using centrifuge to classify metagenomic reads from different environments using the nt database. For some experiments (e.g., host microbiome studies), it is very useful to exclude the host genome from classification. It would be really useful to be able to exclude all taxa belonging to a higher taxonomic rank. For example, in a human microbiome project I of course identify human reads, but also mouse and primate reads. I have tried to exclude all Mammalia using --exclude-taxids 40674, but it did not remove the taxa belonging to that node. I guess it only removes taxa with that exact ID, i.e., at the species level.

Is there any way to do that? Maybe that would be something to add.

Thanks a lot.

Flo

mourisl commented 6 years ago

The --exclude-taxids option also excludes all ranks below the specified taxonomy ID. The same holds for the --host-taxids option. For example, if you run with --exclude-taxids 2 (Bacteria), you will get many more unclassified reads.
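
To make "excludes all ranks below" concrete, here is a minimal Python sketch of collecting a taxid together with everything beneath it in a taxonomy tree. It is only an illustration of the idea, not centrifuge's actual implementation. The `descendants` function and the `parent_of` mapping are hypothetical names, and the toy tree is heavily simplified (real lineages have many intermediate ranks between these nodes).

```python
from collections import defaultdict


def descendants(parent_of, root):
    """Return a set holding `root` and every taxid below it.

    `parent_of` maps child taxid -> parent taxid (hypothetical input,
    e.g. built from a taxonomy dump).
    """
    # Invert the child -> parent map into parent -> children.
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if child != parent:  # the taxonomy root lists itself as parent
            children[parent].append(child)

    # Iterative depth-first walk starting at the requested taxid.
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Toy tree (simplified, for illustration only):
# 1 = root, 40674 = Mammalia, 9606 = human, 10090 = mouse,
# 2 = Bacteria, 562 = E. coli
parent_of = {1: 1, 40674: 1, 9606: 40674, 10090: 40674, 2: 1, 562: 2}
excluded = descendants(parent_of, 40674)
```

In this toy tree, excluding 40674 covers the human and mouse taxids as well, while the bacterial taxids are left alone.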

Sorry for the very late reply. It is possible that taxid 40674 is not in the taxonomy tree of the index. You can run "centrifuge-inspect --name-table index" to check whether Mammalia (40674) exists.
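
If the name table is large, the check can be scripted. The following is a rough Python sketch that assumes the output of the command above has been saved to a file and that each line is tab-separated with the taxid in the first column and the name in the second; please verify that layout against your own output. `taxid_in_name_table` is a hypothetical helper, and the sample lines are made up for illustration.

```python
def taxid_in_name_table(lines, taxid):
    """Return the name recorded for `taxid`, or None if it is absent.

    `lines` is any iterable of text lines, assumed to look like
    "<taxid><TAB><name>" (an assumption about the name-table layout).
    """
    wanted = str(taxid)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].strip() == wanted:
            # Return the name if present, otherwise an empty string.
            return fields[1].strip() if len(fields) > 1 else ""
    return None


# Made-up sample lines standing in for a saved name table.
sample = ["9606\tHomo sapiens\n", "40674\tMammalia\n"]
found = taxid_in_name_table(sample, 40674)
missing = taxid_in_name_table(sample, 2)
```

With a real index, one could redirect the command's output to a file (for example names.tsv, a hypothetical filename) and pass `open("names.tsv")` as `lines`. A result of `None` would suggest that the taxid is not in the index's name table.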