Is there a reason gap_open has to be > gap_extend and not >= gap_extend?

Daniel-Liu-c0deb0t / block-aligner

SIMD-accelerated library for computing global and X-drop affine gap penalty sequence-to-sequence or sequence-to-profile alignments using an adaptive block-based algorithm.

https://crates.io/crates/block_aligner

MIT License

124 stars 7 forks source link

Is there a reason gap_open has to be > gap_extend and not >= gap_extend? #7

Open tfenne opened 2 years ago

tfenne commented 2 years ago

If I try to use

let gap_penalties = Gaps { open: -1, extend: -1 };

I get an assertion error. I understand that the way scoring works that gap-open needs to include the gap extension, since a gap of length one is scored as just the open penalty. However, I would like to be able to do alignments where e.g. a 1bp gap costs -1 and a 2bp gap costs -2, etc. which is not possible with the current check.

Is there any reason to require that open > extend instead of open >= extend?

Daniel-Liu-c0deb0t commented 2 years ago

IIRC the reason is because it saves a couple instructions when computing tracebacks (we can make certain assumptions on the scores). I'll look into allowing open >= extend.

tfenne commented 2 years ago

Ah, I see. I would appreciate it. It's nice to be able to create alignments where the cost per bp of a gap and the cost of a mismatch are the same. I'm using block-aligner to align PacBio reads to other PacBio reads and my expectation is that 1-2bp indels are about as likely as mismatches.

Daniel-Liu-c0deb0t commented 2 years ago

This probably won't be implemented because I'm fairly certain adaptive-banding-like approaches don't work well when open == extend (this was measured in the original adaptive banding paper, and I think its probably the case for block aligner too). Also, it probably makes more sense to use a different approach for linear gap penalty alignments because block aligner has a lot of overhead that allows it to handle affine gap penalities.