This replaces the implementation of SecStruct.getParenthesis with a more efficient implementation. This also improves the efficiency of filtering for only/no pseudoknots, which is one of the most expensive parts of layout recomputation.
Implementation Notes
Instead of turning a pair map into stems, then for each stem iterating over all bases between the start and end of the pair to find if a half-pair has previously been placed there, we now walk over each base only once, keeping track of any half-open pairs that have not been closed off yet. This should bring our time complexity from O(n^2) (both in the stem construction and in the string construction) to O(n). Doing a quick and dirty benchmark, I saw up to a 10x performance improvement in this method. It was slower in some cases, but those situation should already be very fast.
In earlier commits of the PR, I first simplified the existing implementation to wrap my head around it a bit better.
Testing
Added new unit tests, verifying against the previous implementation. As part of my benchmarking experimentation I also validated against the Ribonanza PDB test set.
@tkaragianes Beyond just high-level checking my work, hoping you can check if my comments are sufficient to make it clear how and why this works, and whether the test cases are sufficient
Summary
This replaces the implementation of SecStruct.getParenthesis with a more efficient implementation. This also improves the efficiency of filtering for only/no pseudoknots, which is one of the most expensive parts of layout recomputation.
Implementation Notes
Instead of turning a pair map into stems, then for each stem iterating over all bases between the start and end of the pair to find if a half-pair has previously been placed there, we now walk over each base only once, keeping track of any half-open pairs that have not been closed off yet. This should bring our time complexity from O(n^2) (both in the stem construction and in the string construction) to O(n). Doing a quick and dirty benchmark, I saw up to a 10x performance improvement in this method. It was slower in some cases, but those situation should already be very fast.
In earlier commits of the PR, I first simplified the existing implementation to wrap my head around it a bit better.
Testing
Added new unit tests, verifying against the previous implementation. As part of my benchmarking experimentation I also validated against the Ribonanza PDB test set.
Related Issues
Surfaced during work on #748