ViennaRNA / forgi

An RNA manipulation library.
GNU General Public License v3.0
52 stars 31 forks source link

Improved performance of creating BulgeGraph #48

Closed domen13 closed 1 year ago

domen13 commented 1 year ago

Method forgi.graph._graph_construction.collapse() is the major performance bottleneck for long sequences. This is due to the combinatorial explosion: For every collapsed bulge, it re-traverses all possible pairs of bulges, having the complexity $\mathcal O(n^3)$ for $n$ bulges. It is enough to iterate through all combinations just once, making the algorithm $\mathcal O(n^2)$.

Reduction in calculation time is about 2-fold for 2500 nt sequences (~1 s → 0,5 s), and 5-fold for 11 000 nt sequences (~60 s → 12 s).

Bernhard10 commented 1 year ago

Great find.

Bernhard10 commented 1 year ago

@domen13 Now that I think about it, I'm wondering if a linear runtime could be achieved by iterating over the stems (defines starting with 'y' in this case) and checking the attached bulges instead of iterating over the bulges and checking if they are attached to the same stems (edges).