BGSU-RNA / RNA-Structure-utils

Some utilities for dealing with RNA structure

Pseudoknots in alphabetic notation #7

Open AntonPetrov opened 12 years ago

AntonPetrov commented 12 years ago

At some point we'll need to deal with secondary structures that look like this: .AAA....<<<>>>

blakesweeney commented 12 years ago

Do we want to just run this through remove pseudoknots then parse or parse the pseudoknots right away?
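To make the first option concrete, here is a minimal sketch of "strip, then parse". It is not the RemovePseudoknots tool and not this repository's code: `strip_pseudoknot_letters` is a hypothetical helper that assumes pseudoknot pairs are written only as letters (uppercase opening, lowercase closing) and simply marks those positions as unpaired.

```python
def strip_pseudoknot_letters(structure):
    """Replace alphabetic pseudoknot characters with '.' (unpaired).

    Hypothetical helper for illustration: assumes every letter in the
    structure string marks a pseudoknotted position.
    """
    return ''.join('.' if ch.isalpha() else ch for ch in structure)


if __name__ == '__main__':
    # The nested <<<>>> helix is kept; the A/a pseudoknot is dropped.
    print(strip_pseudoknot_letters('AAA<<<...aaa>>>'))
```

The trade-off is that the pseudoknot pairs are lost entirely, whereas parsing them right away keeps them available to downstream code.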

AntonPetrov commented 12 years ago

Whichever you prefer. I only bring it up because ~100 Rfam families have secondary structures like this, so about 5% of the families, some of which are quite interesting, are skipped at the moment.

On Saturday, May 19, 2012 at 10:39 AM, Blake Sweeney wrote:

Do we want to just run this through remove pseudoknots then parse or parse the pseudoknots right away?



blakesweeney commented 12 years ago

Do you have an example of one of these groups I can use?

blakesweeney commented 12 years ago

Anton, I've added a wrapper around RemovePseudoknots. In addition, the parsers can now handle this notation. However, it turns out that Rfam secondary structure differs from all other dot-bracket notations in that it can use {} as a pair, so I've added a dialect argument to the constructor. It doesn't break any old code, but to parse Rfam you now need to do:

parser = DotBracket('AAA<<<...aaa>>>', dialect='rfam')

I've written an example of how to parse some Rfam secondary structure and then remove pseudoknots; it's in examples/remove_pseudoknots.py. It's only on the develop branch right now. Take a look and let me know if I should push it to master.
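For readers without the develop branch, the idea of a dialect-aware parser can be sketched roughly as follows. This is a standalone illustration, not the DotBracket class or its API: `parse_pairs` is a hypothetical function, and it assumes that letters pair uppercase-to-lowercase in every dialect, that {} counts as a pair only when dialect='rfam', and that any other character is unpaired.

```python
def parse_pairs(structure, dialect='default'):
    """Return sorted (open_index, close_index) pairs, 0-based.

    Hypothetical sketch: each bracket type and each letter gets its own
    stack, so pseudoknotted pairs (A...a) can cross nested pairs (<...>).
    """
    openers = {'(': ')', '[': ']', '<': '>'}
    if dialect == 'rfam':
        # Assumption from the thread: only Rfam uses {} as a pair.
        openers['{'] = '}'
    closers = {close: open_ for open_, close in openers.items()}

    stacks = {}
    pairs = []
    for index, char in enumerate(structure):
        if char in openers or char.isupper():
            stacks.setdefault(char, []).append(index)
        elif char in closers or char.islower():
            key = closers.get(char, char.upper())
            if not stacks.get(key):
                raise ValueError('unmatched %r at position %d' % (char, index))
            pairs.append((stacks[key].pop(), index))
        # Anything else ('.', ',', '_', ...) is treated as unpaired.

    unclosed = [k for k, stack in stacks.items() if stack]
    if unclosed:
        raise ValueError('unclosed symbols: %s' % ', '.join(sorted(unclosed)))
    return sorted(pairs)


if __name__ == '__main__':
    print(parse_pairs('AAA<<<...aaa>>>', dialect='rfam'))
```

Keeping one stack per symbol is what lets the A/a pairs interleave with the <> helix without being rejected as malformed.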