Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings

findPalindromes - min.looplength and allow.wobble #42

Closed digitalwright closed 3 years ago

digitalwright commented 3 years ago

The attached code makes the following modifications:

(1) min.looplength is now implemented. Previously, specifying a non-zero value threw an error saying "will be implemented very soon!". This had been the case for many years.

(2) I added a new argument, allow.wobble, that accepts G/T and T/G mismatches as matches. This makes the function more practical for finding hairpins, since wobble base pairs are frequent in real palindromes. The default is FALSE to keep the code backwards compatible.
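For readers who want to see how a minimum loop length and wobble pairing interact, here is a minimal brute-force sketch in Python. It is not the patch's R/C code and makes no claim about how Biostrings implements the search: the names `find_hairpins`, `pairs`, `min_armlength`, `min_looplength`, `max_looplength` and `allow_wobble` are hypothetical and only mirror the arguments discussed above. It also assumes plain A/C/G/T input and does not deduplicate nested hits.

```python
# Illustrative only: a naive hairpin search, not the Biostrings implementation.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
WOBBLE = {("G", "T"), ("T", "G")}  # mismatches accepted when allow_wobble is True


def pairs(left, right, allow_wobble=False):
    """Return True if the two bases can pair across the arms."""
    if COMPLEMENT.get(left) == right:
        return True
    return allow_wobble and (left, right) in WOBBLE


def find_hairpins(seq, min_armlength=4, min_looplength=0,
                  max_looplength=1, allow_wobble=False):
    """Return (start, end, arm_length) tuples; 0-based, end exclusive."""
    hits = []
    n = len(seq)
    for loop_start in range(n + 1):
        for loop_len in range(min_looplength, max_looplength + 1):
            left = loop_start - 1          # innermost base of the left arm
            right = loop_start + loop_len  # innermost base of the right arm
            arm = 0
            # Extend both arms outward while the facing bases still pair.
            while (left >= 0 and right < n
                   and pairs(seq[left], seq[right], allow_wobble)):
                arm += 1
                left -= 1
                right += 1
            if arm >= min_armlength:
                hits.append((left + 1, right, arm))
    return hits
```

In this sketch, `find_hairpins("GGTGAAAACGCC", min_looplength=4, max_looplength=4)` finds nothing because the T/G pair breaks the arm, while the same call with `allow_wobble=True` reports the full 4-base arms around the 4-base loop.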

digitalwright commented 3 years ago

Hi @hpages,

These changes should improve the Biostrings palindrome finding functions in the following ways:

(1) Fixes the bug described in the recent pull request from @twmcart.

(2) Implements a stack to track mismatches. This maintains the speed of the findPalindromes function even after the fix above.

(3) Implements min.looplength as described in the comment above.

(4) Adds an allow.wobble argument to treat wobble base pairing as a match, as described in the comment above.

Thanks for considering these changes.

Erik

digitalwright commented 3 years ago

Hi @hpages,

FYI, I realized today that the original palindromeArmLength function was impractically slow on XStringViews objects output by findPalindromes. This has been fixed in the latest version. I hope that's it for changes! 👍

Erik

hpages commented 3 years ago

Hi @digitalwright, @twmccart,

Thanks for the patch, and sorry for the delay. This is a great patch: it addresses some long-standing issues with the findPalindromes() code and adds some nice features. In hindsight, maybe I should have called the function findHairpins(), which is shorter and more descriptive of what's really going on with DNA sequences, especially with the notions of loop and arms.

@digitalwright I have a little feedback, mostly cosmetic. I hope it's ok:

Other minor coding style things (some of them for good reasons, others only to keep things more in line with the overall Biostrings style):

At the R level:

At the C level:

Thanks!

digitalwright commented 3 years ago

Hi @hpages,

Thank you very much for your thorough review of the suggested modifications to the palindrome finding functions. I made all of the changes you requested. Please let me know if you would like anything else before incorporating these changes into Biostrings.

Thanks, Erik

hpages commented 3 years ago

Hi Erik,

The latest round of changes introduces 2 regressions in palindromeArmLength():

You're welcome to add your name to the man page if you'd like.

Thanks again for all the changes and improvements.

H.

digitalwright commented 3 years ago

Hi @hpages,

Thanks for catching that. All fixed now.

Erik

hpages commented 3 years ago

Looks good. Thanks! H.