BackofenLab / IntaRNA

Efficient target prediction incorporating accessibility of interaction sites
https://backofenlab.github.io/IntaRNA/

IntaRNAhelix parameters #198

Closed golendraite closed 2 years ago

golendraite commented 2 years ago

Hello,

I need to predict which bacterial sRNAs might interact with one mRNA. While reading the IntaRNA documentation, I found that IntaRNAhelix improves bacterial sRNA target prediction. The publication cited in the documentation (Gelhausen et al., 2019) states that the best parameters are maximal helix length = 11 and helix energy threshold = -7.5. However, I found another publication, "IntaRNAhelix - composing RNA–RNA interactions from stable inter-molecular helices boosts bacterial sRNA target prediction", which reports the best parameters as maximal helix length = 10 and helix energy threshold = 0. So my question is: which parameters are better to use?

I have noticed that setting the maximal helix length to either 10 or 11 makes little difference compared with the results I get from IntaRNA with default parameters (the top 10 hits are almost the same, only their order differs). Setting the overall helix energy threshold, however, changes the results considerably (almost all top 10 hits are different), and the change grows as the threshold is lowered.

I would really appreciate a recommendation. I use the full sequences of both the sRNAs and the mRNA, and mainly focus on predictions near the RBS.

martin-raden commented 2 years ago

Hi @golendraite

Good question, and thanks for spotting the issue (will fix it right away).

The more recent JCB publication provides the updated and corrected results after a bug in the implementation was fixed (if I recall correctly). That is why IntaRNA's defaults are helixMaxE=0 and helixMaxBP=10, as suggested by that publication.

And yes, the energy bound has a much stronger impact on the prediction results.

When using the values from the publication, I would recommend against helixFullE, since it includes several energy penalties (such as E_init) and will therefore filter even more stringently unless the bounds are adapted.

Using the full sequences is good. You can constrain the region of interest for your prediction (e.g. to RBS ±200) using either tRegion or seedTRange for the mRNA, which will also speed up computation and shorten the list of results.
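As a rough illustration of the RBS ±200 suggestion, here is a minimal Python sketch that computes a clamped index range around an RBS position and assembles a command line without running it. Several things here are assumptions rather than anything stated in this thread: the `IntaRNAhelix` executable name, the `-q`/`-t` input options, the `--name=value` spelling of the parameters, the 1-based `from-to` range format, and the file names and positions used in the example. Please check them against `IntaRNA --help` for your installed version.

```python
def region_around(center, flank, seq_len):
    """Return a 'from-to' range of +-flank around center, clamped to [1, seq_len].

    Assumes 1-based sequence positions (an assumption; verify for your setup).
    """
    if not 1 <= center <= seq_len:
        raise ValueError("center must lie within the sequence")
    start = max(1, center - flank)
    end = min(seq_len, center + flank)
    return f"{start}-{end}"


def build_command(query_fasta, target_fasta, rbs_pos, target_len,
                  flank=200, helix_max_bp=10, helix_max_e=0,
                  executable="IntaRNAhelix"):
    """Assemble (but do not run) a hypothetical IntaRNA call as an argument list.

    helix_max_bp=10 and helix_max_e=0 mirror the defaults named in the reply;
    the option spellings are assumptions to be checked against --help.
    """
    region = region_around(rbs_pos, flank, target_len)
    return [
        executable,
        "-q", query_fasta,          # sRNA sequences (assumed option name)
        "-t", target_fasta,         # mRNA sequence (assumed option name)
        f"--helixMaxBP={helix_max_bp}",
        f"--helixMaxE={helix_max_e}",
        f"--tRegion={region}",      # restrict the target to RBS +-flank
    ]


if __name__ == "__main__":
    # Hypothetical inputs: RBS at position 350 of a 1200 nt mRNA.
    cmd = build_command("srnas.fasta", "mrna.fasta", rbs_pos=350, target_len=1200)
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` once the option names are confirmed. Clamping matters when the RBS lies closer than 200 nt to either end of the mRNA.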

I hope that helps. Please close the issue when satisfied or keep on asking. ;)

Best, Martin

golendraite commented 2 years ago

Thank you!