Weeks-UNC / RNAvigate

Very flexible tools for plotting and analyzing all kinds of RNA structure data
MIT License

filtering arc plot contact distance on regional plot runs forever #5

Closed: scottallen877 closed this issue 2 years ago

scottallen877 commented 2 years ago

Completes in about 5-10 seconds:

plot = DMS.plot_arcs(interactions="pairprob", region=[6070,6120], title=False, colorbar=False)

Never completes (>10 min):

plot = DMS.plot_arcs(interactions="pairprob", region=[6070,6120], title=False, colorbar=False, interactions_filter={"cdBelow":600})

This is a subset region of a 7 kb transcript. The region contains incomplete arcs that either begin or end within it, but not both. I'm guessing this case isn't accounted for in some loop, causing it to run forever?

The same holds true for other subset regions as well:

Completes in about 20 seconds:

plot = DMS.plot_arcs(interactions="pairprob", region=[5500,7000], title=False, colorbar=False)

Never completes:

plot = DMS.plot_arcs(interactions="pairprob", region=[5500,7000], title=False, colorbar=False, interactions_filter={"cdBelow":600})

Psirving commented 2 years ago

Thanks Scott! This is really helpful feedback. I will add a disclaimer to the documentation for the cdAbove/cdBelow filters.

To achieve what you're going for here, I highly recommend re-running partition with --maxdistance 600.
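For comparison, as I understand it, --maxdistance limits the sequence span of a base pair (j - i), which can be checked pair by pair without building any structure graph, unlike contact distance. Below is a minimal sketch of the equivalent post-hoc filter, assuming pairing probabilities are available as (i, j, probability) tuples and assuming the limit is inclusive. `filter_by_span` is a hypothetical helper, not part of RNAvigate or partition. It only drops long-range pairs after the fact; re-running partition with --maxdistance also recomputes the remaining probabilities under the constraint, which this cannot do.

```python
# Hypothetical sketch: a post-hoc approximation of the pair-span limit that
# partition's --maxdistance option imposes. NOT equivalent to re-running
# partition, which recomputes the probabilities under the constraint.
# Assumption: the limit is inclusive (span <= max_span).


def filter_by_span(pairs, max_span):
    """Keep pairs whose nucleotide span |j - i| is at most max_span.

    pairs: iterable of (i, j, prob) tuples with 1-based positions.
    Cost is linear in the number of pairs; no structure graph is needed.
    """
    return [(i, j, p) for i, j, p in pairs if abs(j - i) <= max_span]


if __name__ == "__main__":
    pairs = [(10, 50, 0.9), (100, 900, 0.4), (6080, 6110, 0.7)]
    print(filter_by_span(pairs, 600))  # → [(10, 50, 0.9), (6080, 6110, 0.7)]
```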

In the background, RNAvigate computes the pairwise contact-distance matrix for the entire RNA secondary structure. It does this with a breadth-first search from every possible starting node, which costs O(V(V+E)) in total, where V is the number of nodes (nucleotides) and E is the number of edges (base pairs and backbone). This is probably the most intensive computation RNAvigate does. I've tested it on the 18S rRNA, which takes about 1 minute on my computer. Because each nucleotide has at most one base pair, E grows linearly with V, so the runtime scales roughly quadratically with the length of your RNA (if 1,000 nt takes 1 min, 10,000 nt takes about 100 min). The good news is that once RNAvigate has done this computation for a given structure in a given sample, the result is stored, and any further filtering by contact distance is almost instant.
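A minimal Python sketch of the computation described above, assuming a single secondary structure given as a list of (i, j) base pairs with 1-based positions. The names `build_graph`, `bfs_distances`, and `ContactDistanceCache` are hypothetical and are not RNAvigate's API, and treating cdBelow as a strict "<" cutoff is an assumption. It runs one BFS per nucleotide (O(V(V+E)) in total) and stores the result, so later filtering is only lookups.

```python
from collections import deque


def build_graph(length, pairs):
    """Adjacency lists: nodes are nucleotides 1..length; edges are the
    backbone links (n, n+1) plus each base pair (i, j)."""
    adj = {n: [] for n in range(1, length + 1)}
    for n in range(1, length):
        adj[n].append(n + 1)
        adj[n + 1].append(n)
    for i, j in pairs:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def bfs_distances(adj, start):
    """Unweighted shortest-path (contact) distances from start: O(V + E)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


class ContactDistanceCache:
    """Hypothetical helper (not RNAvigate's API): computes the full
    contact-distance matrix once, then reuses it for every filter call."""

    def __init__(self, length, pairs):
        self.adj = build_graph(length, pairs)
        self._matrix = None

    def matrix(self):
        if self._matrix is None:
            # One BFS per starting nucleotide: O(V * (V + E)) in total.
            self._matrix = {n: bfs_distances(self.adj, n) for n in self.adj}
        return self._matrix

    def filter_below(self, candidate_pairs, cutoff):
        """Keep (i, j) whose contact distance is strictly below cutoff
        (strictness is an assumption about cdBelow)."""
        cd = self.matrix()
        return [(i, j) for i, j in candidate_pairs if cd[i][j] < cutoff]


if __name__ == "__main__":
    # Toy 10-nt hairpin with pairs 1-10 and 2-9.
    cache = ContactDistanceCache(10, [(1, 10), (2, 9)])
    print(cache.matrix()[3][8])  # → 3 (path 3-2-9-8)
    print(cache.filter_below([(3, 8), (1, 5), (4, 7)], 4))  # → [(3, 8), (4, 7)]
```

Storing the full matrix also takes O(V²) memory; a dict of dicts is used here only for readability, and a real implementation would more likely use a dense integer array.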

scottallen877 commented 2 years ago

No prob! And okay, I will do the filtering step on partition then. Thanks for the solution!

Cheers, Scott
