Currently, R2DT uses covariance models based on Rfam, RiboVision, or other sources to predict the secondary structure. A covariance model represents either the "consensus" structure that is common to all members of an Rfam family or the specific structure of a particular RiboVision template.
When R2DT uses these models to analyse an input sequence, the template structure is transferred to the sequence, which is the main idea of R2DT. However, the input sequence can have additional structural features that are not present in the template, and R2DT treats such regions as unstructured, so the resulting diagram can show nucleotides as unpaired even though they are likely to form basepairs.
A specific example of this issue is microRNAs. The Rfam family consensus includes the basepairs found in most instances, but individual sequences can have additional basepairs that are not part of the consensus (or may even conflict with it). It is important to "zip up" any missing basepairs at the ends of the microRNA helices using the consensus structure as a folding constraint. We need to make sure that basepairs are not missing from the microRNA structures because the family consensus was imposed on an individual sequence.
In technical terms, we need to run ViennaRNA or similar software to predict the secondary structure of the input sequence using the template structure as a constraint. Then instead of the template structure, R2DT would use the template structure with additional basepairs to generate an SVG.
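A minimal sketch of the constrained folding step, assuming the ViennaRNA `RNAfold` executable is on the `PATH` and that the template structure has already been mapped onto the input sequence as a dot-bracket string of the same length. The function names are hypothetical and are not part of R2DT. The sketch uses `RNAfold -C --enforceConstraint`, which reads the sequence followed by a constraint line from standard input; whether the template pairs should be strictly enforced is a design choice left open here.

```python
import subprocess


def build_rnafold_input(sequence: str, template_structure: str) -> str:
    """Return the text fed to RNAfold: the sequence, then a constraint line.

    Template basepairs are kept as '(' and ')'. Every other position becomes
    '.', which means "no constraint", so RNAfold is free to add pairs there.
    """
    if len(sequence) != len(template_structure):
        raise ValueError("sequence and template structure differ in length")
    constraint = "".join(c if c in "()" else "." for c in template_structure)
    return f"{sequence}\n{constraint}\n"


def parse_rnafold_output(stdout: str) -> str:
    """Extract the dot-bracket string from RNAfold output.

    Without a FASTA header, the first output line is the sequence and the
    second is 'structure (energy)'.
    """
    lines = stdout.strip().splitlines()
    return lines[1].split()[0]


def fold_with_constraint(sequence: str, template_structure: str) -> str:
    """Run RNAfold with the template as a constraint (hypothetical helper)."""
    result = subprocess.run(
        ["RNAfold", "-C", "--enforceConstraint", "--noPS"],
        input=build_rnafold_input(sequence, template_structure),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_rnafold_output(result.stdout)
```

Calling `fold_with_constraint(sequence, template_structure)` would return a structure containing the template basepairs plus any pairs that RNAfold adds in the unconstrained regions. That structure could then replace the template structure when generating the SVG.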
Bonus points: there should be a way to tell which basepairs in the SVG come from the template and which ones are "extra" basepairs predicted by constraint folding.
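One possible way to separate the two kinds of basepairs is to compare the pair sets of the template and the constrained fold. The sketch below assumes both structures are plain dot-bracket strings of equal length without pseudoknots; the function names are hypothetical, and how the labels reach the SVG (for example, as a colour or a CSS class) is not covered.

```python
def parse_pairs(structure: str) -> set:
    """Return the set of 0-based (i, j) basepairs in a dot-bracket string."""
    stack = []
    pairs = set()
    for index, char in enumerate(structure):
        if char == "(":
            stack.append(index)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            pairs.add((stack.pop(), index))
    if stack:
        raise ValueError("unbalanced opening bracket")
    return pairs


def classify_basepairs(template: str, folded: str) -> dict:
    """Label each basepair of the folded structure by its origin.

    'template' holds pairs already present in the template structure;
    'extra' holds pairs that only appear after constraint folding.
    """
    if len(template) != len(folded):
        raise ValueError("structures differ in length")
    template_pairs = parse_pairs(template)
    folded_pairs = parse_pairs(folded)
    return {
        "template": sorted(folded_pairs & template_pairs),
        "extra": sorted(folded_pairs - template_pairs),
    }


labels = classify_basepairs("..((....))..", ".(((....))).")
print(labels)
```

In this toy example the outermost pair of the folded helix is reported as "extra", while the two inner pairs are attributed to the template.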
This algorithm should be invoked with an optional flag that would only apply in some cases (like microRNAs). It would be undesirable to have this behaviour for snoRNAs or rRNAs with introns, for example.
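As an illustration of the opt-in behaviour, the sketch below adds a boolean switch that defaults to off. The flag name `--constraint`, the positional argument, and the use of `argparse` are assumptions made for this example only; R2DT's actual command-line interface may be organised differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with an opt-in constraint folding switch."""
    parser = argparse.ArgumentParser(description="Toy R2DT-like command line")
    parser.add_argument("fasta_input", help="FASTA file with input sequences")
    parser.add_argument(
        "--constraint",  # hypothetical flag name
        action="store_true",
        help="add basepairs predicted by constraint folding (off by default)",
    )
    return parser


# Default: template structure only, as is appropriate for snoRNAs or rRNAs.
default_args = build_parser().parse_args(["input.fasta"])
# Opt in, for example when drawing microRNAs.
mirna_args = build_parser().parse_args(["input.fasta", "--constraint"])
print(default_args.constraint, mirna_args.constraint)
```

Keeping the switch off by default means existing diagrams do not change unless the user asks for constraint folding.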
[x] Run RNAfold with the constraints file and the original sequence
[x] Integrate RNAfold output into the rest of the visualise_rfam function
[ ] Test on microRNAs (for example, this sequence should have fewer unpaired nucleotides). If successful, adapt the code to run in the RiboVision visualise function