DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

Could the output format be as the kraken? #133

Open yeli7068 opened 6 years ago

yeli7068 commented 6 years ago

Hi, thanks for your work. Could centrifuge produce output in the same format as kraken? The kraken output format is well suited to applications such as integration site detection, where it serves as a first step. Best regards, Yang

mourisl commented 6 years ago

The script centrifuge-kreport in the package can do the job.

yeli7068 commented 6 years ago

Thanks for your quick response. I was hoping for an output format like "562:13 561:4 A:31 0:1 562:3", where the first 13 k-mers mapped to taxonomy ID #562. With that format, a read spanning an integration site might look like "562:50 9606:50". As far as I understand, kraken and centrifuge assign a taxid to a read in similar ways, so this per-read information, or something like it, may already be available inside the program. It would be very helpful if you could output this format. I am making this request because 1) integration site detection requires a depth of at least 10X, which results in a large amount of data; and 2) jellyfish v1, which kraken requires to build its database index, has some limitations when counting k-mers.
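To make the request concrete, here is a minimal Python sketch of how a downstream script might read such a mapping string and flag a possible junction. It is not part of centrifuge or kraken: the function names `parse_kmer_mapping` and `candidate_junction` and the `min_kmers` threshold are hypothetical, chosen only for illustration. It assumes the kraken convention that "0" marks k-mers not in the database and "A" marks k-mers containing an ambiguous nucleotide.

```python
from typing import List, Tuple


def parse_kmer_mapping(field: str) -> List[Tuple[str, int]]:
    """Split a kraken-style mapping string into (label, k-mer count) runs, in read order."""
    runs = []
    for token in field.split():
        # Each token looks like "562:13": a label, a colon, and a run length.
        label, count = token.rsplit(":", 1)
        runs.append((label, int(count)))
    return runs


def candidate_junction(runs: List[Tuple[str, int]], min_kmers: int = 10) -> List[str]:
    """Return the well-supported taxa in read order if more than one is present.

    min_kmers is an arbitrary illustrative threshold, not a recommended value.
    """
    ordered: List[str] = []
    for label, count in runs:
        # Skip unclassified ("0") and ambiguous ("A") runs, and short runs.
        if label in ("0", "A") or count < min_kmers:
            continue
        # Collapse consecutive runs that carry the same taxid.
        if not ordered or ordered[-1] != label:
            ordered.append(label)
    return ordered if len(ordered) > 1 else []


if __name__ == "__main__":
    print(parse_kmer_mapping("562:13 561:4 A:31 0:1 562:3"))
    print(candidate_junction(parse_kmer_mapping("562:50 9606:50")))
```

With the two example strings above, the first read yields no candidate (only taxid 562 has a run of at least 10 k-mers), while "562:50 9606:50" yields the pair 562 and 9606, which is the kind of read I would like to pull out.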