BenLangmead / bowtie2

A fast and sensitive gapped read aligner
GNU General Public License v3.0
Add --sam-omit-prim-seq #458

Closed: sfiligoi closed this 5 months ago

sfiligoi commented 5 months ago

Add --sam-omit-prim-seq, with the same semantics as --omit-sec-seq but operating on primary alignments.

Addresses #457

sfiligoi commented 5 months ago

@ch4rr0 Could you please review?

ch4rr0 commented 5 months ago

Hello Igor, I will take a look today.

sfiligoi commented 5 months ago

The resulting output file is huge, putting a lot of strain on the IO system. Reducing the IO cost at the source would be highly preferred.

sfiligoi commented 5 months ago

CC @wasade

wasade commented 5 months ago

Thanks, @sfiligoi!

@ch4rr0, shaving IO natively within bowtie2 would be pleasant.

ch4rr0 commented 5 months ago

@BenLangmead, thoughts?

sfiligoi commented 5 months ago

Just a reminder...

BenLangmead commented 5 months ago

I think this kind of straightforward postprocessing is best left to awk and similar tools. Otherwise we accumulate too many command-line options that make later changes trickier.

I know that this is in tension with the fact that Bowtie had the --suppress option for this purpose: https://bowtie-bio.sourceforge.net/manual.shtml#bowtie-options-suppress. But I think keeping it simple is key.

sfiligoi commented 5 months ago

Unfortunately, --suppress does not work with -S/--sam.

BenLangmead commented 5 months ago

Correct

wasade commented 5 months ago

Hi @BenLangmead, this option is valuable to our efforts with Qiita (https://qiita.ucsd.edu/). Qiita currently houses .sam output from 50-100k metagenomic samples, which are typically mapped against a few databases. The overall volume of data is large, and reprocessing occurs periodically. We currently post-process to reduce the storage burden, but avoiding the significant IO needed to stage .sam temporarily for filtering would be an appreciable runtime improvement.

BenLangmead commented 5 months ago

I appreciate your comments; awk, mawk, or a similar tool should be a good expedient, or feel free to use a fork with your change. We do not plan to integrate this feature into the master branch.
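
For readers weighing the postprocessing route, here is a minimal Python sketch of such a filter. It is not the code from this PR and not part of bowtie2. It assumes a "primary" record is any alignment line whose FLAG does not have the SAM secondary bit (0x100) set, which also covers unaligned reads, and that omitting the sequence means writing `*` into both SEQ and QUAL. The function names `omit_primary_seq` and `filter_sam` are hypothetical.

```python
import sys

SECONDARY = 0x100  # SAM FLAG bit marking a secondary alignment


def omit_primary_seq(line: str) -> str:
    """Return the SAM line with SEQ and QUAL set to '*' if it is a primary record.

    Assumption: "primary" means the 0x100 bit is not set in FLAG.
    Header lines and malformed lines are returned unchanged.
    """
    if line.startswith("@"):
        return line  # header line, nothing to strip
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return line  # fewer than the 11 mandatory SAM columns
    if not int(fields[1]) & SECONDARY:
        fields[9] = "*"   # SEQ column
        fields[10] = "*"  # QUAL column
    return "\t".join(fields) + "\n"


def filter_sam(inp, out) -> None:
    """Stream SAM text from inp to out, one line at a time."""
    for line in inp:
        out.write(omit_primary_seq(line))


# To use this in a pipe, a small wrapper script could call:
#     filter_sam(sys.stdin, sys.stdout)
```

Because the filter works line by line, it can sit directly on bowtie2's standard output (bowtie2 writes SAM to stdout when `-S` is not given), so the full-size .sam never has to be staged on disk. The alignment output still crosses a pipe, so this reduces disk IO rather than eliminating the extra data movement.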

wasade commented 5 months ago

Thanks, @BenLangmead! We appreciate the follow-up, and all of the incredible work that has gone, and continues to go, into bowtie2!