DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

sam file format #167

Closed: maxibor closed this issue 5 years ago

maxibor commented 5 years ago

I tried the "sam" format output, but it is missing the header, the CIGAR string, and the alignment tags (NM, MD, etc.). Is there a way to add these to the "sam" format output? Or am I misusing the centrifuge CLI?

Centrifuge version 1.0.4_beta

Command used:

centrifuge -x /path/to/centrifuge/abv -1 ERR1914888.pair1.trimmed.fastq -2 ERR1914888.pair2.trimmed.fastq --phred33 --threads 8 --out-fmt sam -k 1  --report ERR1914888.creport > ERR1914888.centrifuge.sam

Example "sam" output:

$ head ERR1914888.centrifuge.sam 
ERR1914888.2    0       0       0       0       *0      unclassified    0       251     TCTCAGAGAAGTCGAGAGTAACCTCTTCCACGCTCATTCTCAACGCTTCTGGTATCTTCATCAATGCCTTTATGACATCCTTTGCCTTAGTGCCCTTGACTACAGCAATAAGGCATCCCTTGCGA_CAATAGCCTCGAACAAGGATGCACATGGTCGCAAGGGATGCCTTATTGCTGTAGTCAAGGGCACTAAGGCAAAGGATGTCATAAAGGCATTGATGAAGATACCAGAAGCGTTGAGAATGAGCGTGG <<@BGGGGGGGGGGGGGGGGGGGGGCFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGG8_BCCBBGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGFFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
ERR1914888.1    0       0       0       0       *0      unclassified    0       250     CATATACCAGCTGTTTTAAATTTTTCAGAAAGAAGATGTAGTCTGGGAAAGGGATATATAGATTGACTGTTGGCGTATCGAAGCGCAATTTCAAATCATGATATATACATCCTGCCAAACAATTA_GTTGTATTTTGTATATTGGCTTTTGCTGTAGAAAAATCTTTATCATCATTGCCAGGAATTAATATTTCTTATTTTATGATATATGTTATTGTGTTTGTGAAATTCAATTACTTTTTTCTTGCAAA  =<BBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG_CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGBGEEEGGGDGFGGGGGGGGGG0DG00CFE
mourisl commented 5 years ago

Sorry for the late reply. The SAM format in Centrifuge only means that the order and content of each column follow the SAM format. Centrifuge does not perform real alignments, so many fields carry no information.
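You can check which of the features listed in the question (header, CIGAR, NM/MD tags) a given file actually contains with a small checker along these lines. This is a minimal Python sketch based only on the 11 mandatory columns defined by the SAM specification. The names `check_sam` and `SamCheck` are hypothetical, and the sketch assumes nothing about Centrifuge's own column contents beyond what this thread states. It counts a CIGAR only when the field is a syntactically valid, non-`*` CIGAR string, so a placeholder such as the `*0` in the output above is not counted.

```python
"""Minimal sketch: report whether a SAM-like file carries real alignment info.

Hypothetical helper; relies only on the SAM spec's 11 mandatory columns.
"""
import re
from dataclasses import dataclass, field
from typing import Iterable, Set

# The 11 mandatory SAM columns, in order, as defined by the SAM specification.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# A non-'*' CIGAR is one or more <length><operation> pairs.
CIGAR_RE = re.compile(r"(?:[0-9]+[MIDNSHPX=])+")


@dataclass
class SamCheck:
    header_lines: int = 0        # lines starting with '@' (@HD, @SQ, ...)
    records: int = 0             # non-header lines with >= 11 columns
    short_records: int = 0       # non-header lines with < 11 columns
    records_with_cigar: int = 0  # records whose CIGAR is a valid, non-'*' string
    tags_seen: Set[str] = field(default_factory=set)  # optional tags, e.g. NM, MD


def check_sam(lines: Iterable[str]) -> SamCheck:
    result = SamCheck()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            result.header_lines += 1
            continue
        cols = line.split("\t")
        if len(cols) < len(SAM_COLUMNS):
            result.short_records += 1
            continue
        result.records += 1
        cigar = cols[SAM_COLUMNS.index("CIGAR")]
        if CIGAR_RE.fullmatch(cigar):
            result.records_with_cigar += 1
        # Optional fields follow the mandatory ones as TAG:TYPE:VALUE.
        for opt in cols[len(SAM_COLUMNS):]:
            result.tags_seen.add(opt.split(":", 1)[0])
    return result
```

For example, `check_sam(open("ERR1914888.centrifuge.sam"))` would report the header-line count, how many records have a usable CIGAR, and which optional tags occur. A file from a real aligner typically shows header lines, CIGAR strings, and tags such as `NM` and `MD`. According to the answer above, Centrifuge's output is not expected to contain these.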

maxibor commented 5 years ago

OK, thanks for the answer. It is probably worth mentioning this in the documentation, as it is a bit misleading for the end user ;)