The new setPairs implementation is sometimes slower than the previous one because of the cost of property access. However, following the pattern of the previous approach would have caused a significant performance penalty for longer structures without higher-order pseudoknots, so I've put a fair amount of work into optimizing performance for several different situations. As a result, the code does not read as well as it could, since the more readable patterns each add their own performance penalty. Performance is important here because it is used on every fold (consider user scripts that may call fold many times in quick succession) as well as in the current implementation of pseudoknot filtering (an expensive part of layout recomputation). Some examples where this comes up include:
Using assignment in conditionals to avoid excess property accesses. This could have been done with additional conditional nesting instead, but my gut feeling is that this is clearer.
Note that if the left character is not found in the map but is a legal character, we assign to the local variable and then to both the left and right maps, again limiting property accesses.
Using a second object which maps to the same pair stacks instead of mapping to the paired character (yet another reduction in property access).
Checking the unpaired case first and the cut point case last. Admittedly I didn't really benchmark this. It's possible locality could make it better or about the same if the cut point check were moved earlier, but my rationale is that it should be very rare that we even need to check for a cut point at all.
Dynamically adding pair maps for additional characters as needed instead of preallocating them (preallocation is wasteful for structures that don't need them, which is the most common case).
Maintaining the "naive" option for when we know we have no pseudoknots
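A minimal TypeScript sketch of how several of the points above could fit together. This is not the actual setPairs code: the function name `parsePairs`, the `-1` unpaired marker, `&` as the cut point character, and the convention that uppercase letters open and lowercase letters close are all assumptions made for illustration.

```typescript
// Illustrative sketch only; NOT the actual setPairs implementation.
type PairStack = number[];
type StackMap = Record<string, PairStack | undefined>;

const UNPAIRED = -1; // assumed marker for an unpaired base

function parsePairs(structure: string): number[] {
    const pairs: number[] = [];
    for (let i = 0; i < structure.length; i++) pairs.push(UNPAIRED);

    // Opening characters map to a stack of unmatched opening indices.
    const left: StackMap = {'(': [], '[': [], '{': [], '<': []};
    // Closing characters map to the SAME stack objects, so closing a pair
    // needs one lookup rather than closer -> opener -> stack.
    const right: StackMap = {
        ')': left['('], ']': left['['], '}': left['{'], '>': left['<']
    };

    let open = 0; // number of currently unmatched opening characters
    for (let i = 0; i < structure.length; i++) {
        const ch = structure.charAt(i);
        let stack: PairStack | undefined;
        if (ch === '.') {
            // Unpaired base: the most common case, so it is checked first.
        } else if ((stack = left[ch])) {
            // Assignment in the conditional: one property access, reused below.
            stack.push(i);
            open++;
        } else if ((stack = right[ch])) {
            const j = stack.pop();
            if (j === undefined) {
                throw new Error(`Unmatched closing '${ch}' at index ${i}`);
            }
            pairs[i] = j;
            pairs[j] = i;
            open--;
        } else if (ch >= 'A' && ch <= 'Z') {
            // Legal opener seen for the first time: create its stack lazily
            // and register it in both maps (assumed: uppercase opens).
            stack = [i];
            left[ch] = stack;
            right[ch.toLowerCase()] = stack;
            open++;
        } else if (ch !== '&') {
            // Cut point ('&', assumed) is checked last since it is rare.
            // Anything else is an error instead of a silently dropped pair.
            throw new Error(`Unsupported or unmatched character '${ch}' at index ${i}`);
        }
    }
    if (open !== 0) throw new Error('Unbalanced structure: unmatched opening character');
    return pairs;
}
```

In this sketch, `parsePairs("(.A.).a")` pairs index 0 with 4 and index 2 with 6, while `parsePairs("(.")` throws rather than returning a table with the dangling base marked unpaired.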
Previously we did not raise an error on encountering structures we couldn't handle, but we do now. This could technically cause issues somewhere, but I feel it is more important to know that something has gone wrong than to have some pairs silently changed to unpaired bases.
Summary
Currently, our pseudoknot handling only supports up to 3rd-order pseudoknots, as the only supported sets of characters for pairs are (), [], {}, and <>. We now add pairs of lower and upper case letters and explicitly raise an error when encountering situations we can't handle.
This change was specifically spurred by the in-progress RibonanzaNet integration sometimes predicting these higher-order pseudoknots (https://forum.eternagame.org/t/preview-ribonanzanet-ss-in-eterna/4955/15).
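To make the notation concrete, here is a rough sketch of how a pair table could be written out in dot-bracket form, moving to the next bracket type whenever a pair crosses pairs already drawn with the lower ones. The function name `toDotBracket`, the `-1` unpaired marker, the greedy level assignment, and the choice of uppercase for opening and lowercase for closing are assumptions for illustration, not the project's actual code.

```typescript
// Illustrative sketch only; names and conventions here are assumptions.
const OPENERS = '([{<ABCDEFGHIJKLMNOPQRSTUVWXYZ';
const CLOSERS = ')]}>abcdefghijklmnopqrstuvwxyz';

function toDotBracket(pairs: number[]): string {
    const out: string[] = pairs.map(() => '.');
    // levels[k] holds the closing indices of pairs drawn with bracket type k.
    const levels: number[][] = [];

    for (let i = 0; i < pairs.length; i++) {
        const j = pairs[i];
        // Skip unpaired bases (-1) and the closing side of a pair.
        if (j <= i) continue;

        // Every pair already placed opened before i, so an earlier pair
        // closing at b crosses (i, j) exactly when i < b < j.
        let level = 0;
        while (level < levels.length && levels[level].some((b) => i < b && b < j)) {
            level++;
        }
        if (level >= OPENERS.length) {
            throw new Error('Structure needs more bracket types than are available');
        }
        if (level === levels.length) levels.push([]);
        levels[level].push(j);
        out[i] = OPENERS.charAt(level);
        out[j] = CLOSERS.charAt(level);
    }
    return out.join('');
}
```

For example, five mutually crossing pairs, `toDotBracket([5, 6, 7, 8, 9, 0, 1, 2, 3, 4])`, come out as `([{<A)]}>a`: the fifth pair no longer fits in the four bracket types and needs a letter pair.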
Implementation Notes
Testing
New and existing unit tests
Related Issues
Continues/stacked on #753, raised as part of #748