DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Sequence headers duplicated in output #780

Open zoey-rw opened 7 months ago

zoey-rw commented 7 months ago

About 10% of my paired-end sequences are being reported twice in the output files from Kraken2. The headers are not long enough to be truncated, and they are not duplicated in the input fastq.gz files. Sometimes, but not always, the two copies of a read are classified differently. For example, this is in the output file:

   classified_status                                sequenceID kraken_tax_id sequence_length                                classified_details
1:                 U NB551228:39:HWVCMBGX7:4:21408:20331:17416             0         150|150                      0:116 |:| 0:115 1568762216:1
2:                 C NB551228:39:HWVCMBGX7:4:21408:20331:17416    1568762216         150|150 0:18 1568762216:5 0:93 |:| 0:64 1568762216:5 0:47

Looking for any guidance on troubleshooting this, or deciding which of the duplicated lines to retain! Thanks.
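
To measure how widespread the duplication is, one option is to group the raw Kraken2 output by read ID and list every ID that appears more than once. The following is a minimal sketch, not a definitive diagnostic. It assumes the standard five-column, tab-delimited Kraken2 output (status, read ID, taxonomy ID, length, LCA mapping). The function name `find_duplicate_ids` and the file name `kraken_output.txt` are hypothetical, and the two sample lines are the records shown above rewritten as tab-delimited text.

```python
from collections import defaultdict


def find_duplicate_ids(lines):
    """Return {read_id: [records]} for read IDs that occur more than once.

    Assumes each line is a standard Kraken2 output record with five
    tab-separated fields: status, read ID, taxid, length, LCA mapping.
    """
    records = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        status, read_id, tax_id, length, lca = fields[:5]
        records[read_id].append((status, tax_id, length, lca.strip()))
    return {rid: recs for rid, recs in records.items() if len(recs) > 1}


# The two records from the example above, as tab-delimited lines.
sample = [
    "U\tNB551228:39:HWVCMBGX7:4:21408:20331:17416\t0\t150|150\t"
    "0:116 |:| 0:115 1568762216:1\n",
    "C\tNB551228:39:HWVCMBGX7:4:21408:20331:17416\t1568762216\t150|150\t"
    "0:18 1568762216:5 0:93 |:| 0:64 1568762216:5 0:47\n",
]

for read_id, recs in find_duplicate_ids(sample).items():
    statuses = {rec[0] for rec in recs}
    label = "conflicting" if len(statuses) > 1 else "consistent"
    print(read_id, len(recs), label)
```

On a real run the same function can take an open file handle, for example `with open("kraken_output.txt") as fh: dups = find_duplicate_ids(fh)`. Comparing the k-mer totals in the LCA column between the copies may also hint at whether the two records came from the same input sequence.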

jenniferlu717 commented 6 months ago

How did you run the classification command?