eternagame / EternaJS

Eterna game/RNA design interface
Other
12 stars 10 forks source link

Improve performance of generating dot-bracket strings with pseudoknots #753

Closed luxaritas closed 5 days ago

luxaritas commented 1 week ago

Summary

This replaces the implementation of SecStruct.getParenthesis with a more efficient implementation. This also improves the efficiency of filtering for only/no pseudoknots, which is one of the most expensive parts of layout recomputation.

Implementation Notes

Instead of turning a pair map into stems, then for each stem iterating over all bases between the start and end of the pair to find if a half-pair has previously been placed there, we now walk over each base only once, keeping track of any half-open pairs that have not been closed off yet. This should bring our time complexity from O(n^2) (both in the stem construction and in the string construction) to O(n). Doing a quick and dirty benchmark, I saw up to a 10x performance improvement in this method. It was slower in some cases, but those situation should already be very fast.

In earlier commits of the PR, I first simplified the existing implementation to wrap my head around it a bit better.

Testing

Added new unit tests, verifying against the previous implementation. As part of my benchmarking experimentation I also validated against the Ribonanza PDB test set.

Related Issues

Surfaced during work on #748

luxaritas commented 1 week ago

@tkaragianes Beyond just high-level checking my work, hoping you can check if my comments are sufficient to make it clear how and why this works, and whether the test cases are sufficient