galaxyproject / dunovo

Reference-free duplex sequencing pipeline.

bowtie output sometimes lacks NM tag #27

Open NickSto opened 3 years ago

NickSto commented 3 years ago

Issue in barcode error correction (baralign.sh).

I ran into a case where bowtie output a SAM file in which many alignments lack an NM tag. It only happened on barcodes containing N's. Bowtie doesn't always omit NM when the sequence contains N's, but whenever NM is omitted, the sequence contains an N (in this dataset, at least). In this instance, only 15884 of 3113708 alignments (0.5%) lacked the NM tag.
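For anyone who wants to check their own SAM file for the same pattern, here is a rough sketch of how the counts could be reproduced. It is not part of dunovo; `summarize_missing_nm` is a hypothetical helper name. It assumes standard tab-delimited SAM with the read sequence in column 10 and optional tags from column 12 onward, and it skips unmapped records (flag bit 0x4), since those would not be expected to carry NM anyway.

```python
def summarize_missing_nm(sam_lines):
    """Return (alignments, alignments lacking NM, of those: how many have an N in SEQ).

    Hypothetical diagnostic helper, not part of dunovo.
    `sam_lines` is any iterable of SAM text lines (e.g. an open file).
    """
    total = missing = missing_with_n = 0
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith('@') or not line.strip():
            continue
        fields = line.rstrip('\r\n').split('\t')
        # A valid SAM record has at least 11 mandatory columns.
        if len(fields) < 11:
            continue
        # Skip unmapped reads (flag bit 0x4); they are not alignments.
        if int(fields[1]) & 4:
            continue
        total += 1
        # Optional tags start at column 12 and look like TAG:TYPE:VALUE.
        has_nm = any(tag.startswith('NM:i:') for tag in fields[11:])
        if not has_nm:
            missing += 1
            if 'N' in fields[9].upper():
                missing_with_n += 1
    return total, missing, missing_with_n
```

Passing an open file handle for the SAM file should work, since the function only iterates over lines. If the second and third numbers are equal, every alignment lacking NM has an N in its sequence, which is the pattern described above.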

Any alignment without an NM tag causes the current version (2.16) of correct.py to fail. I'll probably have to add an exception: maybe check whether the barcode contains an N and, if so, allow the missing tag.
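A minimal sketch of what that exception might look like, assuming the check is done per SAM line. This is not the actual correct.py code; `get_nm` and `nm_or_exempt` are hypothetical names, and what the caller should do with an exempted alignment (skip it, or treat the edit distance as unknown) is left open here.

```python
def get_nm(fields):
    """Return the NM value as an int, or None if the tag is absent.

    `fields` is a SAM record already split on tabs.
    """
    # Optional tags start at column 12 and look like TAG:TYPE:VALUE.
    for field in fields[11:]:
        if field.startswith('NM:i:'):
            return int(field[5:])
    return None


def nm_or_exempt(sam_line):
    """Return NM for an alignment, tolerating a missing tag only if SEQ has an N.

    Hypothetical helper: returns None when NM is missing but the sequence
    contains an N, and raises ValueError when NM is missing otherwise.
    """
    fields = sam_line.rstrip('\r\n').split('\t')
    nm = get_nm(fields)
    if nm is not None:
        return nm
    if 'N' in fields[9].upper():
        # Missing NM on an N-containing barcode: allowed, caller decides what to do.
        return None
    raise ValueError('Alignment for {!r} lacks an NM tag and has no N in its sequence.'
                     .format(fields[0]))
```

Keeping the error for sequences without an N means a missing NM tag in any other situation would still fail loudly rather than being silently ignored.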

This happened with bowtie 1.3.0, but it may also happen in 1.2.1.1.

Internal note: This occurred on 2020-12-04 when looking at the G4 Eta dataset. The bad SAM file is at duplex/g4/dunovo/eta/pb2/correct.2.sam.