Missing Index - Githubissues

DaehwanKimLab / centrifuge

Classifier for metagenomic sequences

GNU General Public License v3.0

Missing Index #162

Open LaraUrban opened 5 years ago

LaraUrban commented 5 years ago

Hi,

We applied Centrifuge to our full-length 16S rRNA reads sequenced on the MinION. While the results look good in general, we ran into a problem when comparing the Kraken2 and Centrifuge classifications obtained with the standard bacterial database of each tool: Centrifuge seems to lack an index entry for the genus "Pseudarcicella", whereas Kraken2 classifies many of our reads as belonging to it.

Could you let me know why this index is missing? Or do you have any advice on how to handle these reads?

Many thanks, Lara

mourisl commented 5 years ago

Can you show me the taxonomy ID and the seqid_to_taxid entry for this Pseudarcicella?

LaraUrban commented 5 years ago

Thanks for your quick response!

This is the record for the sequence in the kraken2 'names.dmp' file: 2183547 | Pseudarcicella sp. HME7025

The taxonomy ID 2183547 corresponds to the following entry in the 'seqid2taxid.map' file: kraken:taxid|2183547|NZ_CP029346.1

That NCBI accession refers to https://www.ncbi.nlm.nih.gov/nuccore/1391135387

Many thanks for your help!

LaraUrban commented 5 years ago

Please let me know if you need any other information.

mourisl commented 5 years ago

I think the format is a bit different. The ".map" file in Centrifuge should look like "NZ_CP029346.1\t2183547" (tab-separated). Although I remember that Centrifuge can handle the kraken:taxid format in some places, that might not apply to the .map file.
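The reformatting described above can be sketched in Python. This is only a minimal sketch, assuming each Kraken2 sequence ID has the form `kraken:taxid|<taxid>|<accession>` as in the entry quoted earlier in this thread. The function name `kraken_seqid_to_centrifuge` is hypothetical and not part of Kraken2 or Centrifuge.

```python
def kraken_seqid_to_centrifuge(line):
    # Hypothetical helper: turns a Kraken2-style sequence ID such as
    # "kraken:taxid|2183547|NZ_CP029346.1" into an "accession<TAB>taxid" line.
    # Returns None for lines that do not follow the assumed format.
    fields = line.split()
    if not fields:
        return None
    # Only the first whitespace-separated field is inspected; any further
    # columns on the line are ignored.
    parts = fields[0].split("|")
    if len(parts) != 3 or parts[0] != "kraken:taxid":
        return None
    taxid, accession = parts[1], parts[2]
    return f"{accession}\t{taxid}"


# Example using the entry quoted in this thread.
print(kraken_seqid_to_centrifuge("kraken:taxid|2183547|NZ_CP029346.1"))
```

Lines that are already in the two-column form return `None` here, so they could be passed through unchanged by the caller.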

LaraUrban commented 5 years ago

Hi @mourisl, thanks for this information. It still seems that the Centrifuge bacterial database does not contain the Pseudarcicella genus at all. Is that a bug? This is a problem for us if we want to include both Centrifuge and Kraken2 results in our manuscript, and I am not sure how to handle it if the databases are different. Many thanks for your help!

mourisl commented 5 years ago

I could not find that genus in taxonomy/names.dmp in the bacteria index either. It could be that it was not in the database when we created the pre-built index. Are you using the pre-built index or building a custom database?
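A lookup like the one described above can be sketched in Python. This is a rough sketch, assuming the standard NCBI `names.dmp` layout in which fields are separated by `|` with tabs around each delimiter. The function name `find_taxa_by_name` and the two example lines are illustrative only and were not taken from the actual index.

```python
def find_taxa_by_name(lines, query):
    # Hypothetical helper: returns (taxid, name) pairs from names.dmp-style
    # lines whose name contains `query` (case-insensitive substring match).
    needle = query.lower()
    hits = []
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        # Skip blank or malformed lines that lack a numeric taxid and a name.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        taxid, name = int(fields[0]), fields[1]
        if needle in name.lower():
            hits.append((taxid, name))
    return hits


# Illustrative input in the assumed names.dmp layout.
example_lines = [
    "2183547\t|\tPseudarcicella sp. HME7025\t|\t\t|\tscientific name\t|",
    "562\t|\tEscherichia coli\t|\t\t|\tscientific name\t|",
]
print(find_taxa_by_name(example_lines, "Pseudarcicella"))
```

For a real check, the same function could be given an open file handle for `taxonomy/names.dmp` instead of the example list. Because this is a plain substring test, the query has to be spelled the same way as the entry in the file.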

LaraUrban commented 5 years ago

Yes, we use the pre-built index for Centrifuge (p_compressed is the name of the database, and we ran centrifuge-inspect on the database files). We just checked again and it's definitely not in there. Unfortunately, if we can't resolve this we might have to drop our Centrifuge analyses and rely on Kraken2 alone, since this seems to be such an abundant genus in our data. Please let me know what you think!

LaraUrban commented 5 years ago

Or if you know of anyone else I could contact about this issue, @mourisl, please let me know.