DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

centrifuge-kreport values do not add up #276

Open pablorr24 opened 6 months ago

pablorr24 commented 6 months ago

Hi, I've been working with Centrifuge and using the centrifuge-kreport tool to convert my results to the Kraken 2 report format. I'm running:

centrifuge-kreport -x /.../database/db_prefix centrif_report.tsv > kreport.txt

I am getting a Kraken report, but its values differ from those in the Centrifuge report. In the screenshot, the values for Bacillus in the Centrifuge report (above) differ from those in the Kraken report (below). I'm not sure whether I'm interpreting something incorrectly or whether there is an issue with the code.

[Screenshot: Centrifuge report (above) and Kraken report (below), showing the Bacillus rows]

Best regards, Pablo R

mourisl commented 6 months ago

Indeed, the 5 reads listed under numUniqueReads should be reflected in the report. I'll check whether I can reproduce this issue on our data or on ERR3077553. Thank you for reporting it!
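
For anyone who wants to check the same thing on their own data, here is a minimal Python sketch that lists the taxa where numUniqueReads in the Centrifuge report differs from the directly assigned read count (third column) in the Kraken-style report. It assumes the usual tab-separated Centrifuge report header (name, taxID, taxRank, genomeSize, numReads, numUniqueReads, abundance) and the six-column Kraken report layout. The function names and the sample rows are made up for illustration, and whether these two columns should match exactly is the open question in this thread.

```python
import csv
import io


def read_centrifuge_report(handle):
    """Map taxID -> numUniqueReads from a Centrifuge report TSV with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["taxID"]: int(row["numUniqueReads"]) for row in reader}


def read_kreport(handle):
    """Map taxID -> reads assigned directly to that taxon (3rd column of a Kraken-style report)."""
    direct = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        direct[fields[4].strip()] = int(fields[2])
    return direct


def find_mismatches(unique_reads, direct_reads):
    """Return {taxID: (numUniqueReads, direct reads)} for taxa where the two numbers differ."""
    return {
        taxid: (n, direct_reads.get(taxid, 0))
        for taxid, n in unique_reads.items()
        if n != direct_reads.get(taxid, 0)
    }


# Made-up sample data (hypothetical taxa and counts, not taken from the screenshot).
CENTRIFUGE_TSV = (
    "name\ttaxID\ttaxRank\tgenomeSize\tnumReads\tnumUniqueReads\tabundance\n"
    "Taxon A\t100\tspecies\t0\t9\t5\t0.0\n"
    "Taxon B\t200\tspecies\t0\t3\t3\t0.0\n"
)
KREPORT_TXT = (
    " 50.00\t4\t1\tS\t100\t    Taxon A\n"
    " 37.50\t3\t3\tS\t200\t    Taxon B\n"
)

mismatches = find_mismatches(
    read_centrifuge_report(io.StringIO(CENTRIFUGE_TSV)),
    read_kreport(io.StringIO(KREPORT_TXT)),
)
for taxid, (unique, direct) in sorted(mismatches.items()):
    print(f"taxID {taxid}: numUniqueReads={unique}, kreport direct reads={direct}")
```

With real files, the two `io.StringIO(...)` arguments would be replaced by opened file handles for the report TSV and the kreport output.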

pablorr24 commented 4 months ago

Hi, do you have any updates on this?

mourisl commented 4 months ago

I think one reason could be the taxonomy tree structure. When a read has too many multi-mappings, Centrifuge tries to reduce the number of reported taxonomy IDs by promoting them to higher taxonomy levels. In the main program, the promotion goes to the standard taxonomy levels, like species, genus, family, and so on. In centrifuge-kreport, the promotion may go to non-standard taxonomy levels, like subgenus. I guess that in your case the other 4 reads may be uniquely promoted to ranks like superspecies or subgenus, so the number is a bit off.
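
To illustrate the effect described above, here is a toy Python sketch. It is not Centrifuge's code: it uses a plain lowest-common-ancestor step as a stand-in for promotion, and the taxonomy, node names, and reads are all hypothetical. It only shows how stopping at any rank versus stopping at standard ranks can put the same read under different taxa.

```python
from collections import Counter

# Ranks treated as "standard" in this sketch (an assumption for illustration).
STANDARD_RANKS = {"species", "genus", "family", "order", "class", "phylum", "superkingdom"}

# Hypothetical toy taxonomy: child -> parent, and node -> rank.
PARENT = {
    "species_A": "subgenus_S",
    "species_B": "subgenus_S",
    "species_C": "genus_G",
    "subgenus_S": "genus_G",
    "genus_G": "family_F",
    "family_F": None,
}
RANK = {
    "species_A": "species",
    "species_B": "species",
    "species_C": "species",
    "subgenus_S": "subgenus",
    "genus_G": "genus",
    "family_F": "family",
}


def lineage(node):
    """Return the path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_ancestor(nodes):
    """Return the deepest node shared by the lineages of all given nodes."""
    nodes = list(nodes)
    common = set(lineage(nodes[0]))
    for other in nodes[1:]:
        common &= set(lineage(other))
    for node in lineage(nodes[0]):  # walk upward; first shared node is the deepest
        if node in common:
            return node
    return None


def promote(hits, standard_only):
    """Collapse a read's hits to one taxon, optionally climbing to a standard rank."""
    node = lowest_common_ancestor(hits)
    if standard_only:
        while node is not None and RANK[node] not in STANDARD_RANKS:
            node = PARENT[node]
    return node


# Hypothetical reads, each with the set of taxa it maps to.
READS = [{"species_A"}, {"species_A", "species_B"}, {"species_B", "species_C"}]

standard_counts = Counter(promote(hits, standard_only=True) for hits in READS)
any_rank_counts = Counter(promote(hits, standard_only=False) for hits in READS)
print("standard ranks only:", dict(standard_counts))
print("any rank:", dict(any_rank_counts))
```

In this toy case the read hitting species_A and species_B lands on genus_G when only standard ranks are allowed, but on subgenus_S when any rank is allowed, so the per-taxon counts of the two summaries no longer line up.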