broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.64k stars 580 forks source link

Consider restoring CigarUtils optimization for short-circuiting to M-only CIGARs. #7441

Open samuelklee opened 2 years ago

samuelklee commented 2 years ago

An optimization introduced in https://github.com/broadinstitute/gatk/pull/5466 was removed in https://github.com/broadinstitute/gatk/pull/6885. The latter exposed Smith-Waterman parameters, allowing them to be changed from their default values and thus to possibly violate conditions assumed by the former. We could restore the optimization if we added explicit checks of these conditions.

droazen commented 2 years ago

@samuelklee Is this at all related to this PR? https://github.com/broadinstitute/gatk/pull/6015

samuelklee commented 2 years ago

@droazen thanks for noticing that PR. Looks like there’s quite a bit more going on there than is covered by this issue, although I didn’t take a close look. We should certainly make sure that any assumptions on SW parameters, etc. there are checked as well, if it does end up going in—seems quite stale, no?

Also, hope you don’t mind if I unassign myself from this—what’s the point of punting on something if it just gets reassigned to you? ;) Happy to review a PR if someone else is convinced that the original optimization is worth restoring, though!