RNAcentral / R2DT

Visualise RNA secondary structure in consistent, reproducible and recognisable layouts
https://r2dt.bio
Apache License 2.0

Add a flag to run additional folding using consensus model structure as a constraint #48

Closed by AntonPetrov 1 year ago

AntonPetrov commented 3 years ago

Currently R2DT uses covariance models based on Rfam, RiboVision or other sources to predict the secondary structure. The covariance models represent the "consensus" structure that is common to all Rfam family members or a specific structure from a certain RiboVision template.

When R2DT uses these models to analyse an input sequence, the template structure is passed on to the sequence, which is the main idea of R2DT. However, the input sequence can have additional structural features that are not present in the template, and R2DT will treat such regions as unstructured, which misrepresents the sequence's actual structure.

A specific example of this issue is microRNAs. The Rfam family consensus includes the basepairs found in most instances, but individual sequences can have additional basepairs that are not part of the consensus (or may even be in conflict with the consensus). It is important to "zip-up" any missing basepairs at the ends of the microRNA helices using the consensus structure as a folding constraint. We need to make sure that basepairs are not missing from the microRNA structures due to imposing the family consensus on an individual sequence.

In technical terms, we need to run ViennaRNA or similar software to predict the secondary structure of the input sequence using the template structure as a constraint. Then instead of the template structure, R2DT would use the template structure with additional basepairs to generate an SVG.
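To make the "zip-up" idea concrete, here is a minimal pure-Python sketch. It is not ViennaRNA and not R2DT's implementation: it has no energy model and simply extends each template helix by one canonical pair (A-U, G-C, G-U) at a time while both positions are unpaired. The function names `parse_pairs` and `zip_up` and the `min_loop` parameter are assumptions made for illustration; a real implementation would hand the template to an energy-based folder as a constraint.

```python
# Simplified stand-in for constraint folding: keep every template pair and
# extend helices with adjacent canonical pairs. No thermodynamics involved.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(structure):
    """Return sorted (i, j) base pairs from a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.append((stack.pop(), pos))
    return sorted(pairs)


def zip_up(sequence, template, min_loop=3):
    """Extend template helices at both ends; return the new dot-bracket string."""
    if len(sequence) != len(template):
        raise ValueError("sequence and template must have the same length")
    seq = sequence.upper().replace("T", "U")
    partner = {}
    for i, j in parse_pairs(template):
        partner[i], partner[j] = j, i

    changed = True
    while changed:
        changed = False
        # Snapshot the current pairs so the dict can be updated while looping.
        for i, j in sorted((i, j) for i, j in partner.items() if i < j):
            # Candidate pairs: one step outward and one step inward.
            for a, b in ((i - 1, j + 1), (i + 1, j - 1)):
                if a < 0 or b >= len(seq):
                    continue
                if a in partner or b in partner:
                    continue
                if b - a - 1 < min_loop:  # keep hairpin loops at least min_loop long
                    continue
                if (seq[a], seq[b]) in CANONICAL:
                    partner[a], partner[b] = b, a
                    changed = True

    chars = ["."] * len(seq)
    for i, j in partner.items():
        if i < j:
            chars[i], chars[j] = "(", ")"
    return "".join(chars)


print(zip_up("GGGAAAUCCC", ".((....))."))  # prints (((....)))
```

In the example, the terminal G-C pair is added, while the inner A-U candidate is rejected because it would leave a two-nucleotide hairpin loop.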

Bonus points: there should be a way to tell which basepairs in the SVG come from the template and which ones are "extra" basepairs predicted by constraint folding.
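One way to tell the two kinds of basepairs apart is to compare the pair sets of the template and the constraint-folded structure. The sketch below assumes both structures are available as pseudoknot-free dot-bracket strings of equal length; `pairs_from_dot_bracket` and `classify_pairs` are hypothetical helper names, and how the labels would be turned into SVG styling is left open.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base pairs in a dot-bracket string."""
    stack, pairs = [], set()
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def classify_pairs(template, folded):
    """Label each pair as coming from the template or from constraint folding."""
    template_pairs = pairs_from_dot_bracket(template)
    folded_pairs = pairs_from_dot_bracket(folded)
    return {
        # Pairs inherited from the template and kept after folding.
        "template": sorted(template_pairs & folded_pairs),
        # Pairs that only appear after constraint folding.
        "extra": sorted(folded_pairs - template_pairs),
        # Template pairs missing after folding (should be empty if the
        # template was enforced as a hard constraint).
        "lost": sorted(template_pairs - folded_pairs),
    }


labels = classify_pairs(".((....)).", "(((....)))")
print(labels["extra"])  # prints [(0, 9)]
```

The "extra" list could then drive a distinct colour or CSS class for those basepairs in the diagram.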

This algorithm should be invoked with an optional flag that would only apply in some cases (like microRNAs). It would be undesirable to have this behaviour for snoRNAs or rRNAs with introns, for example.
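An opt-in flag could look like the following `argparse` sketch. The flag name `--constraint` and the positional `fasta_input` argument are assumptions for illustration and are not taken from R2DT's actual command-line interface; the point is only that constraint folding stays off unless explicitly requested.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical opt-in constraint-folding flag."""
    parser = argparse.ArgumentParser(
        description="Sketch of an opt-in constraint-folding flag"
    )
    parser.add_argument("fasta_input", help="input sequence file")
    parser.add_argument(
        "--constraint",
        action="store_true",  # defaults to False, so existing behaviour is unchanged
        help="fold unstructured regions using the template as a constraint",
    )
    return parser


args = build_parser().parse_args(["mirna.fasta", "--constraint"])
print(args.constraint)  # prints True
```

Keeping the default off means snoRNAs or rRNAs with introns are drawn exactly as before unless the caller asks for the extra folding step.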

AntonPetrov commented 2 years ago

Step by step instructions