DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

centrifuge-kreport: process "no rank" assignments #109

Closed jsh58 closed 6 years ago

jsh58 commented 6 years ago

Currently, centrifuge-kreport ignores "no rank" taxonomic assignments, due to the way the centrifuge output is processed. This contributes to the discrepancy in read counts, which some users have noted (e.g. #96).

For example, this centrifuge output:

readID  seqID    taxID  score  2ndBestScore  hitLength  queryLength  numMatches
read1   no rank  1      3217   0             95         146          1
read2   no rank  1      64     0             23         150          1

currently leads to this centrifuge-kreport output, which makes it look as if 0 reads were processed:

  0.00  0   0   U   0   unclassified

The debugged version produces this:

  0.00  0   0   U   0   unclassified
100.00  2   2   -   1   root

mourisl commented 6 years ago

Thanks! Definitely we should use "\t" in the perl script to split the columns...
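
To illustrate why the column separator matters here, below is a minimal Python sketch. It is not the actual Perl code of centrifuge-kreport. The function names are hypothetical, and it assumes that the input is tab-separated and that the earlier behavior amounted to splitting on any whitespace, as the comment above suggests.

```python
# Illustration only: hypothetical helpers, not the centrifuge-kreport code.
# One tab-separated line from the example centrifuge output above.
LINE = "read1\tno rank\t1\t3217\t0\t95\t146\t1"


def tax_id_whitespace(line):
    # Splitting on any whitespace breaks the seqID "no rank" into two
    # fields, so the third field is "rank" rather than the taxID.
    return line.split()[2]


def tax_id_tab(line):
    # Splitting on tabs keeps "no rank" together as a single seqID field,
    # so the third field is the taxID.
    return line.rstrip("\n").split("\t")[2]


print(tax_id_whitespace(LINE))  # rank
print(tax_id_tab(LINE))         # 1
```

With whitespace splitting the taxID column is read as "rank", which is not a valid taxonomy ID, so a read like this would plausibly be dropped from the report. Splitting on "\t" recovers taxID 1 (root), consistent with the fixed output shown above.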