DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

Centrifuge with 16S database #171

Closed by LaraUrban 5 years ago

LaraUrban commented 5 years ago

Hi,

I was wondering if you could help us with a problem we encounter when classifying reads with Centrifuge. We wanted to classify our 16S rRNA reads, sequenced with Oxford Nanopore, using both Centrifuge and Kraken2. For each tool we built two databases: an NCBI database based on all bacterial data, and a 16S database based on the SILVA database. Whereas Kraken2 classifies most reads (>90%) with both the bacterial NCBI and the 16S database, Centrifuge classifies a comparable share of reads only with the bacterial NCBI database. With the 16S database, very few reads (~10%) are classified. Do you have any explanation for this effect?

Many thanks for your help!

Best regards, Lara

mourisl commented 5 years ago

Since the 16S sequences are also part of the bacterial reference genomes, it could be that putting them together makes the database too repetitive. Can you try a larger -k value?

LaraUrban commented 5 years ago

Many thanks @mourisl we will try a larger k

LaraUrban commented 5 years ago

Hi @mourisl, Unfortunately, increasing k did not change the percentage of classified reads. Do you have another idea? Many thanks!

LaraUrban commented 5 years ago

Also, just to clarify: We either use the 16S database or the bacterial database, never a combined version. Many thanks for any advice!

LaraUrban commented 5 years ago

PS: Even at k=1000, the share of classified reads only improves from ~5% (at k=5) to ~7%. @mourisl
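For anyone wanting to reproduce percentages like the ones quoted above, here is a minimal sketch of how the classified share could be computed from a Centrifuge classification file. It assumes a tab-separated file with a header line containing `readID` and `taxID` columns, and that unclassified reads carry `taxID` 0; both are assumptions worth checking against your own output. Because a larger -k can produce several lines per read, reads are counted by unique ID rather than by line. The function name and the example data are made up for illustration.

```python
import csv
import io


def classified_fraction(tsv_text):
    """Return (classified_reads, total_reads) for Centrifuge-style output.

    Assumes a header line with 'readID' and 'taxID' columns and that
    unclassified reads are reported with taxID 0.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    status = {}  # read ID -> True once any line assigns a non-zero taxID
    for row in reader:
        hit = row["taxID"] != "0"
        status[row["readID"]] = status.get(row["readID"], False) or hit
    return sum(status.values()), len(status)


# Tiny made-up example: r1 has two assignments, r2 is unclassified.
example = (
    "readID\tseqID\ttaxID\tscore\n"
    "r1\trefA\t1423\t900\n"
    "r1\trefB\t1423\t900\n"
    "r2\tunclassified\t0\t0\n"
)
classified, total = classified_fraction(example)
print(f"{classified}/{total} reads classified ({100 * classified / total:.1f}%)")
# prints: 1/2 reads classified (50.0%)
```

In practice the text would come from the classification file itself, for example `open("classification.tsv").read()`, where the file name is a placeholder.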

LaraUrban commented 5 years ago

@mourisl sorry, any idea?

LaraUrban commented 5 years ago

Hi @mourisl,

As stated above, increasing k did not change the percentage of classified reads. If you do not have any other idea, I am not sure what to do besides stating in the manuscript that Centrifuge does not work for the 16S database. We will have to include the Centrifuge 16S since we also use Kraken 16S and Bracken 16S as well as their bacterial database counterparts. Please let me know what you think.

Many thanks, Lara

mourisl commented 5 years ago

Can I have your 16S database for Centrifuge? And can I also have a few Centrifuge unclassified but Kraken2 classified reads to debug? Thanks.
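A rough sketch of how the requested reads (classified by Kraken2, unclassified by Centrifuge) could be picked out. It assumes the standard Kraken2 per-read output, where column 1 is `C` or `U` and column 2 is the read ID, and Centrifuge output with a header line containing `readID` and `taxID`, with `taxID` 0 meaning unclassified. The function names and example data are illustrative only.

```python
def kraken2_classified_ids(kraken_text):
    """Read IDs that Kraken2 marked 'C' in column 1 (read ID is column 2)."""
    ids = set()
    for line in kraken_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[0] == "C":
            ids.add(fields[1])
    return ids


def centrifuge_unclassified_ids(centrifuge_text):
    """Read IDs for which no Centrifuge line has a non-zero taxID (assumed layout)."""
    lines = centrifuge_text.splitlines()
    if not lines:
        return set()
    header = lines[0].split("\t")
    read_col, tax_col = header.index("readID"), header.index("taxID")
    seen, classified = set(), set()
    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) <= max(read_col, tax_col):
            continue  # skip malformed or empty lines
        seen.add(fields[read_col])
        if fields[tax_col] != "0":
            classified.add(fields[read_col])
    return seen - classified


def reads_to_debug(kraken_text, centrifuge_text):
    """Sorted read IDs classified by Kraken2 but unclassified by Centrifuge."""
    return sorted(kraken2_classified_ids(kraken_text)
                  & centrifuge_unclassified_ids(centrifuge_text))


# Made-up example data.
kraken_example = "C\tr1\t1423\t1500\t-\nC\tr2\t562\t1480\t-\nU\tr3\t0\t1490\t-\n"
centrifuge_example = (
    "readID\tseqID\ttaxID\n"
    "r1\trefA\t1423\n"
    "r2\tunclassified\t0\n"
    "r3\tunclassified\t0\n"
)
print(reads_to_debug(kraken_example, centrifuge_example))  # prints: ['r2']
```

The resulting IDs can then be used to pull the matching records out of the original FASTQ file.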

mourisl commented 5 years ago

I just noticed that for some 16S databases, the reference sequences may use uracil (U) instead of T. Is this the case for yours? Thanks.
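The U-versus-T question can be checked quickly with a small script. The sketch below counts FASTA records whose sequence lines contain U and, if needed, rewrites U as T while leaving header lines untouched. It is an illustration rather than part of Centrifuge, and the function names are made up.

```python
def count_u_records(fasta_text):
    """Count FASTA records whose sequence contains U or u."""
    count = 0
    in_record = False
    has_u = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if in_record and has_u:
                count += 1
            in_record, has_u = True, False
        elif in_record and "U" in line.upper():
            has_u = True
    if in_record and has_u:  # account for the final record
        count += 1
    return count


def rna_to_dna(fasta_text):
    """Replace U/u with T/t on sequence lines only; headers are kept as-is."""
    table = str.maketrans("Uu", "Tt")
    out = []
    for line in fasta_text.splitlines():
        out.append(line if line.startswith(">") else line.translate(table))
    return "\n".join(out) + "\n"


# Made-up example: the first record is RNA, the second is already DNA.
fasta_example = ">seq1 Bacteria\nACGU\nuuga\n>seq2 Bacteria\nACGT\n"
print(count_u_records(fasta_example))              # prints: 1
print(count_u_records(rna_to_dna(fasta_example)))  # prints: 0
```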

LaraUrban commented 5 years ago

Many thanks @mourisl! The reference sequence contains T, not U, but many thanks for the suggestion.

We are using the SILVA 16S database: https://www.arb-silva.de/

I will send you a read classified by Kraken2, but unclassified by Centrifuge (using the same SILVA database) via email. Thanks for looking into this.

LaraUrban commented 5 years ago

PS: I just sent you an email. Many thanks again!

LaraUrban commented 5 years ago

Hi @mourisl,

Many thanks for your great help! For everyone: It turned out that we had been using the conversion table produced by Kraken2, and not by Centrifuge.

We will now rebuild the Centrifuge 16S database and include the Centrifuge classifications in our manuscript.
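The thread does not say exactly how the Kraken2 conversion table differed, so the following is only a generic consistency check that might catch this kind of mismatch before building. It assumes the conversion table is tab-separated (sequence ID, then taxonomy ID), that sequence IDs correspond to the first whitespace-delimited token of each FASTA header, and that the taxonomy tree is in NCBI `nodes.dmp` style with the taxonomy ID as the first `|`-separated field. All function names are hypothetical.

```python
def fasta_ids(fasta_text):
    """First whitespace-delimited token of every FASTA header line."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def conversion_table(table_text):
    """Parse 'sequence ID <tab> taxonomy ID' lines into a dict."""
    mapping = {}
    for line in table_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            mapping[fields[0].strip()] = fields[1].strip()
    return mapping


def node_ids(nodes_text):
    """Taxonomy IDs from the first '|'-separated field of nodes.dmp-style lines."""
    ids = set()
    for line in nodes_text.splitlines():
        first = line.split("|")[0].strip()
        if first:
            ids.add(first)
    return ids


def check_tables(fasta_text, table_text, nodes_text):
    """Return (sequence IDs missing from the table, table taxIDs missing from the tree)."""
    table = conversion_table(table_text)
    nodes = node_ids(nodes_text)
    missing_seqs = sorted(fasta_ids(fasta_text) - set(table))
    unknown_taxa = sorted({t for t in table.values() if t not in nodes})
    return missing_seqs, unknown_taxa


# Made-up example: s2 has no table entry, and taxID 99 is not in the tree.
fasta_example = ">s1 some description\nACGT\n>s2 other description\nACGT\n"
table_example = "s1\t10\ns3\t99\n"
nodes_example = "1\t|\t1\t|\tno rank\t|\n10\t|\t1\t|\tgenus\t|\n"
print(check_tables(fasta_example, table_example, nodes_example))
# prints: (['s2'], ['99'])
```

If either list is non-empty, the three inputs were probably not produced for the same build.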

mansi-aai commented 8 months ago

Hello @LaraUrban !

I am trying to build a SILVA 16S rRNA database for use with Centrifuge. I got the FASTA file from the SILVA website, but I am not able to create the conversion-table, taxonomy-tree and name-table files that are required to build the database. Could you please let me know how you created them? Thank you!

Many Thanks, Mansi
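This question was not answered in the thread, so what follows is not the original poster's method. It is one possible sketch, under loudly stated assumptions: that each FASTA header looks like `>ID Domain;Phylum;...` (a sequence ID, a space, then a semicolon-separated lineage), and that synthetic taxonomy IDs written in NCBI `nodes.dmp`/`names.dmp` style are acceptable for the build. Every node gets the rank `no rank`, so rank-level summaries would not be meaningful. The function name and the example IDs are made up.

```python
def build_tables(fasta_text):
    """Derive conversion-table, nodes and names text from lineage strings in headers.

    Assumes headers of the form '>ID Domain;Phylum;...'. Taxonomy IDs are
    synthetic (root = 1), not NCBI or SILVA identifiers.
    """
    taxid_of = {(): 1}    # lineage path (tuple of names) -> synthetic taxonomy ID
    parent_of = {1: 1}    # the root is its own parent, as in NCBI nodes.dmp
    name_of = {1: "root"}
    conversion = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        seq_id, _, lineage = line[1:].partition(" ")
        if not seq_id:
            continue
        taxa = tuple(t.strip() for t in lineage.split(";") if t.strip())
        for depth in range(1, len(taxa) + 1):
            path = taxa[:depth]
            if path not in taxid_of:
                new_id = len(taxid_of) + 1
                taxid_of[path] = new_id
                parent_of[new_id] = taxid_of[path[:-1]]
                name_of[new_id] = path[-1]
        # A sequence without a lineage falls back to the root (ID 1).
        conversion.append(f"{seq_id}\t{taxid_of[taxa]}")
    nodes = [f"{tid}\t|\t{parent}\t|\tno rank\t|"
             for tid, parent in sorted(parent_of.items())]
    names = [f"{tid}\t|\t{name}\t|\t\t|\tscientific name\t|"
             for tid, name in sorted(name_of.items())]
    return ("\n".join(conversion) + "\n",
            "\n".join(nodes) + "\n",
            "\n".join(names) + "\n")


# Made-up example with two sequences sharing part of their lineage.
fasta_example = (
    ">seqA.1.1500 Bacteria;Firmicutes;Bacilli\nACGT\n"
    ">seqB.1.1480 Bacteria;Firmicutes;Clostridia\nACGT\n"
)
conv_text, nodes_text, names_text = build_tables(fasta_example)
print(conv_text, end="")
```

The three strings would then be written to files and passed as the conversion-table, taxonomy-tree and name-table inputs. Whether the resulting index behaves as expected should be verified on a few reads with known origin, and the reference sequences should be checked for U versus T first, as discussed earlier in the thread.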