amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

feature request: update @HD to v1.6 specification and add "GO:query" tag for unsorted output #160

Closed eboyden closed 1 year ago

eboyden commented 1 year ago

Unless I'm mistaken, paired-end alignments (without the -so option) inevitably produce unsorted but query-grouped output. Several downstream tools, such as fgbio, include functions that insist on name-sorted or query-grouped input and rely on the @HD line to confirm this. Without the GO:query tag, either the tag has to be added manually or the file has to be re-collated unnecessarily just to get the tag written.
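For anyone stuck on the manual route in the meantime, here is a minimal sketch of patching the @HD line of a SAM text header. The function name `set_group_order` and its defaults are hypothetical, not part of SNAP or fgbio, and it assumes the header line is tab-delimited `TAG:VALUE` fields as in the SAM specification. It only rewrites the header text; it does not check that the records really are grouped.

```python
def set_group_order(header_line: str, group_order: str = "query",
                    version: str = "1.6") -> str:
    """Return an @HD line with VN and GO set, keeping any other tags.

    Hypothetical helper for illustration only.
    """
    if not header_line.startswith("@HD"):
        raise ValueError("not an @HD header line")

    # Split "@HD<TAB>VN:1.4<TAB>SO:unsorted" into a tag -> value mapping.
    tags = {}
    for field in header_line.rstrip("\n").split("\t")[1:]:
        key, _, value = field.partition(":")
        tags[key] = value

    tags["VN"] = version               # bump the declared format version
    tags.setdefault("SO", "unsorted")  # keep an existing sort order if present
    tags["GO"] = group_order           # declare the grouping

    # Emit VN first (the usual convention), then the remaining tags in order.
    ordered = ["VN"] + [k for k in tags if k != "VN"]
    return "\t".join(["@HD"] + [f"{k}:{tags[k]}" for k in ordered])


print(set_group_order("@HD\tVN:1.4\tSO:unsorted"))
```

The rest of the header and all alignment records would be copied through unchanged, so this avoids re-collating the file but still means rewriting it once.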

bolosky commented 1 year ago

I believe that SNAP has the property that you describe.
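One way to sanity-check that property on an actual output file is to confirm that every read name occupies a single contiguous run of records. The sketch below is not SNAP code; `is_query_grouped` is a hypothetical helper, and it assumes the read names (the first column of each alignment line) have already been extracted in file order.

```python
from itertools import groupby


def is_query_grouped(read_names):
    """Return True if each read name appears in exactly one contiguous run."""
    seen = set()
    # groupby collapses consecutive identical names into one group, so a
    # name that shows up as a second group means its records were split.
    for name, _ in groupby(read_names):
        if name in seen:
            return False
        seen.add(name)
    return True


print(is_query_grouped(["r2", "r2", "r1", "r1"]))  # grouped, not sorted
print(is_query_grouped(["r1", "r2", "r1"]))        # r1 is split
```

Memory use grows with the number of distinct read names, so on a large file this is better suited to a spot check on a sample than to a full pass.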

I'd have to do some due diligence first, however, to make sure there aren't any other differences between SAM v1.4 and v1.6 that we'd need to observe.

bolosky commented 1 year ago

This is in 2.0.2.dev.9. Let me know if it does what you want.

bolosky commented 1 year ago

This is now in 2.0.2.