Closed domen13 closed 1 year ago
Great find.
@domen13 Now that I think about it, I'm wondering if a linear runtime could be achieved by iterating over the stems (defines starting with 'y' in this case) and checking the attached bulges instead of iterating over the bulges and checking if they are attached to the same stems (edges).
Method
forgi.graph._graph_construction.collapse()
is the major performance bottleneck for long sequences. This is due to the combinatorial explosion: For every collapsed bulge, it re-traverses all possible pairs of bulges, having the complexity $\mathcal O(n^3)$ for $n$ bulges. It is enough to iterate through all combinations just once, making the algorithm $\mathcal O(n^2)$.Reduction in calculation time is about 2-fold for 2500 nt sequences (~1 s → 0,5 s), and 5-fold for 11 000 nt sequences (~60 s → 12 s).