ViennaRNA / forgi

An RNA manipulation library.
GNU General Public License v3.0

Get neighbors of element in specified range #33

Closed bzudi closed 5 years ago

bzudi commented 5 years ago

Hi, I want to explore the microenvironment of a nucleotide. I saw how I can get the element that contains this nucleotide; let it be E1. I still need some help with:

1. How can I get all neighbors of E1 in a specified range, let's say 50 nt upstream and downstream?
2. How can I get the closest multiloops to E1, both upstream and downstream?

Thanks, Udi

Bernhard10 commented 5 years ago

1. How can I get all neighbors of E1 in a specified range, let's say 50 nt upstream and downstream?

1a: 10 nts up- and downstream, but not along other branches of junctions

If you want to get the microenvironment of nucleotide 58:

import forgi
import forgi.graph.bulge_graph as fgb
fx ="""
GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC
....((((((....((.......((((.((((.(((...(((((..........)))))...((.......))....)))......))))))))......))...)).))))......(((....((((((((...))))))))...)))......
"""
nuc_number = 58 # Nucleotide of interest
rna = fgb.BulgeGraph.from_fasta_text(fx)[0]
elems = set()
for i in range(nuc_number-10, nuc_number+11):
    elems.add(rna.get_elem(i))
print(", ".join(sorted(elems)))

This gives "h0, h1, m1, s6, s7", so you have two hairpins (h0 and h1), one multiloop segment (m1) and two stems (s6 and s7). To find out how the multiloop segments are connected, you can use the property rna.junctions (only in forgi 2.0, currently in the branch develop-2.0), which tells you that m0, m1 and m2 form a 3-way junction, while f0, m3 and t0 form the exterior loop.
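One thing to watch with the loop above: for a nucleotide close to either end of the sequence, or for a larger window such as the 50 nt from the question, nuc_number-10 or nuc_number+10 can fall outside the molecule. A minimal sketch of clamping the window follows. The helper name window_positions is hypothetical (not part of forgi), positions are assumed to be 1-based, and the sequence length is passed in explicitly.

```python
def window_positions(nuc_number, flank, seq_length):
    """Return the 1-based positions within `flank` nucleotides of
    `nuc_number`, clipped to the valid range 1..seq_length.

    Hypothetical helper for illustration; not a forgi function.
    """
    start = max(1, nuc_number - flank)
    stop = min(seq_length, nuc_number + flank)
    return range(start, stop + 1)


# A window in the middle of a 100-nt molecule is left unchanged ...
print(list(window_positions(58, 10, 100))[:3])   # first three positions
# ... while a window near the 5' end is clipped at position 1.
print(list(window_positions(5, 10, 100))[:3])
```

The returned range can then replace range(nuc_number-10, nuc_number+11) in the loop that calls rna.get_elem(i).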

1b: Follow basepairs as well.

Now we will consider the pairing partner of a nucleotide to have a distance of 1 to that nucleotide, and follow all branches up to a distance of 10. (Only in forgi 2.0 / branch develop-2.0.)

elems=set()
for elem in rna.defines:
    if rna.ss_distance(58, elem)<=10:
        elems.add(elem)
print(", ".join(sorted(elems)))

This gives "h0,h1,i3,m0,m1,m2,s4,s5,s6,s7".
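To make the distance convention from 1b concrete (backbone neighbors and the pairing partner are each one step away), here is a rough sketch of a breadth-first search on the dot-bracket string alone. It is not how forgi implements ss_distance, works on nucleotides rather than elements, and only handles plain "(", ")" and "." without pseudoknots. The names pair_table and nucleotides_within are hypothetical.

```python
from collections import deque


def pair_table(dotbracket):
    """Map each paired 1-based position to its pairing partner."""
    partner = {}
    stack = []
    for pos, char in enumerate(dotbracket, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            opening = stack.pop()
            partner[pos] = opening
            partner[opening] = pos
    return partner


def nucleotides_within(dotbracket, start, max_dist):
    """Return {position: distance} for all nucleotides reachable from
    `start` in at most `max_dist` steps, where a step is either one
    move along the backbone or one jump across a base pair."""
    partner = pair_table(dotbracket)
    n = len(dotbracket)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        pos = queue.popleft()
        if dist[pos] == max_dist:
            continue  # do not expand beyond the distance limit
        neighbors = [pos - 1, pos + 1]
        if pos in partner:
            neighbors.append(partner[pos])
        for nxt in neighbors:
            if 1 <= nxt <= n and nxt not in dist:
                dist[nxt] = dist[pos] + 1
                queue.append(nxt)
    return dist


# Small hairpin: positions 1-7 and 2-6 are paired.
print(sorted(nucleotides_within("((...))", 1, 1)))  # [1, 2, 7]
```

Passing each resulting position to rna.get_elem would turn the nucleotide set back into element names, although the result need not match the ss_distance output exactly.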

bzudi commented 5 years ago

Thanks! Regarding my second question: should I iterate over the upstream and downstream nucleotides until I encounter a multiloop?

Bernhard10 commented 5 years ago

For question 2, you can use

elem = rna.get_elem(58)
print(list(rna.iter_elements_along_backbone()))

Then elem is s6 and the elements along the backbone are ['f0', 's0', 's1', 'i0', 's2', 'i1', 's3', 'i2', 's4', 'i3', 's5', 'm0', 's6', 'h0', 's6', 'm1', 's7', 'h1', 's7', 'm2', 's5', 'i3', 's4', 's3', 'i1', 's2', 'i0', 's1', 'i4', 's0', 'm3', 's8', 'i5', 's9', 'h2', 's9', 'i5', 's8', 't0'], which means that m0 (upstream) and m1 (downstream) are the closest multiloop segments to the stem s6.
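Reading the closest multiloop segments off that list can also be automated. The sketch below works on the plain list of element names shown above, so it does not need forgi itself. The function name closest_multiloops is hypothetical, and it assumes that multiloop segments are exactly the names starting with "m". It scans upstream from the first occurrence of the element and downstream from its last occurrence.

```python
def closest_multiloops(backbone, elem):
    """Return (upstream, downstream): the nearest multiloop segment
    before the first occurrence of `elem` and after its last
    occurrence in the backbone-ordered element list. Either entry
    is None if no multiloop segment exists in that direction."""
    positions = [i for i, name in enumerate(backbone) if name == elem]
    if not positions:
        raise ValueError("element %r not found along the backbone" % elem)
    first, last = positions[0], positions[-1]
    upstream = next(
        (name for name in reversed(backbone[:first]) if name.startswith("m")),
        None,
    )
    downstream = next(
        (name for name in backbone[last + 1:] if name.startswith("m")),
        None,
    )
    return upstream, downstream


# Element order copied from the output quoted above.
backbone = ['f0', 's0', 's1', 'i0', 's2', 'i1', 's3', 'i2', 's4', 'i3',
            's5', 'm0', 's6', 'h0', 's6', 'm1', 's7', 'h1', 's7', 'm2',
            's5', 'i3', 's4', 's3', 'i1', 's2', 'i0', 's1', 'i4', 's0',
            'm3', 's8', 'i5', 's9', 'h2', 's9', 'i5', 's8', 't0']
print(closest_multiloops(backbone, 's6'))  # ('m0', 'm1')
```

With a BulgeGraph at hand, the list would come from list(rna.iter_elements_along_backbone()) as in the snippet above.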