DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

Reference position for centrifuge hits? #170

jackwadden closed this issue 5 years ago

jackwadden commented 5 years ago

Is it possible to recover the reference locations from Centrifuge partial hits?

For every hit, I'm trying to print the reference ID (which is used to compute the taxonomy ID) and the bp offset into that reference. For a genome coordinate (Coord) I can call Coord.ref() and Coord.off(), but Coord.off() always returns zero. The coordinate offset appears to always be set to zero in the getCoords()->getGenomeIdx() function.

I've tried using ebwt.joinedToTextOff(). This looks like the function in the Bowtie2 source code that provides the reference ID and offset; however, it doesn't give correct results. I could be using it incorrectly, though.
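
For reference, the general idea behind converting an offset in a concatenated ("joined") reference text into a reference ID plus an offset within that reference can be sketched as a binary search over the start position of each reference. This is only an illustration under stated assumptions: joinedToRefOffset, refStarts, and totalLen are hypothetical names, and the code is not the Bowtie2 or Centrifuge implementation. It also only works if the per-reference start positions are available to begin with.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT part of Centrifuge or Bowtie2.
// refStarts[i] is the offset in the concatenated text at which reference i
// begins; it is assumed to be sorted ascending with refStarts[0] == 0.
// totalLen is the length of the whole concatenated text.
// Returns true and fills refId/refOff when joinedOff falls inside the text.
bool joinedToRefOffset(const std::vector<std::uint64_t>& refStarts,
                       std::uint64_t totalLen,
                       std::uint64_t joinedOff,
                       std::size_t& refId,
                       std::uint64_t& refOff) {
    if (refStarts.empty() || joinedOff >= totalLen) {
        return false;  // nothing to search, or offset past the end
    }
    // Find the first start that is strictly greater than joinedOff;
    // the reference containing joinedOff is the one just before it.
    auto it = std::upper_bound(refStarts.begin(), refStarts.end(), joinedOff);
    if (it == refStarts.begin()) {
        return false;  // joinedOff lies before the first reference
    }
    refId = static_cast<std::size_t>(it - refStarts.begin()) - 1;
    refOff = joinedOff - refStarts[refId];
    return true;
}
```

With three references starting at joined offsets 0, 100, and 250 in a text of length 400, a joined offset of 100 would map to reference 1 at offset 0, and 399 would map to reference 2 at offset 149.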

Does anyone have any suggestions for where and how to properly extract exact match hit positions in the code? Any help would be greatly appreciated.

Thanks in advance,

-Jack

mourisl commented 5 years ago

Centrifuge does not store coordinate information in the index, so this information cannot be recovered. Storing coordinates consumes a lot of memory and does not help read classification much, so we decided not to store them.

jackwadden commented 5 years ago

Understood; that makes sense. Appreciate the quick response!