DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

Issue with number of raw reads #206

Closed mggrami closed 3 years ago

mggrami commented 3 years ago

Hello!

I have a question regarding the number of reads. My dataset contains 103.425714 M paired-end reads. However, after running centrifuge + centrifuge-kreport:

centrifuge -x ../database_bacteria_archaea_2018/p_compressed_2018_4_15/p_compressed -1 out_BY15_DNA1_1.fq.gz -2 out_BY15_DNA1_2.fq.gz -S centrifuge_BY15_class_results --report-file centrifuge_report_BY15.tsv --threads 12 && centrifuge-kreport -x ../database_bacteria_archaea_2018/p_compressed_2018_4_15/p_compressed centrifuge_BY15_class_results

the report shows only half of the total number of reads: 51.712857 M.

My question is: is it not reading the second file, or is it combining the reads of each pair?

Sincerely, Michal

mourisl commented 3 years ago

Centrifuge combines the two mates of each pair and counts them as a single read, so the reported number is the number of pairs.
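To make the arithmetic concrete, here is a minimal Python sketch that checks the numbers from the question. It assumes the 103.425714 M figure counts both mate files together and that the FASTQ files use the standard four lines per record. The helper names `count_fastq_records` and `expected_pair_count` are hypothetical, written for illustration only, and are not part of Centrifuge.

```python
import gzip


def count_fastq_records(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def expected_pair_count(n_mate1, n_mate2):
    """Return the number of pairs, assuming each pair is reported once."""
    if n_mate1 != n_mate2:
        raise ValueError("mate files contain different numbers of reads")
    return n_mate1


# Numbers from the question: total individual reads across both mate files.
total_reads = 103_425_714
reads_per_file = total_reads // 2

pairs = expected_pair_count(reads_per_file, reads_per_file)
print(pairs)  # 51712857, i.e. the 51.712857 M shown in the report
```

To check real files instead of the quoted totals, one could call `count_fastq_records("out_BY15_DNA1_1.fq.gz")` and the same for the `_2` file, then pass both counts to `expected_pair_count`.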

mggrami commented 3 years ago

Thank you for the quick answer!