Closed domen13 closed 4 months ago
Great, thanks for this issue! We're looking into it, and we'll let you know about a fix soon. This might to be related to an issue that Contrafold had and has since fixed, so hopefully simple to resolve.
Tagging @rhiju @HWaymentSteele.
@tkaragianes, thanks for your feedback! We are trying to use EternaFold to compare the results with another secondary structure prediction software, and we are wondering whether there has been any updates regarding this issue.
@domen13 Thanks again for bringing this to our attention.
My lab hasn't been able to attend to this, but we can provide some quick advice.
Some time ago, @HWaymentSteele and I had noticed a similar bug in the LinearPartition implementation of Contrafold. The code's authors eventually fixed the issue by using double instead of float and exp()
instead of FastExp()
. This is the commit, I think: https://github.com/LinearFold/LinearPartition/commit/94bca573107237636b4cbee39560917a9b70faeb
Here's a suggestion -- have you tried to run EternaFold with LinearPartition following these instructions? If so, do you still see probability of pairing greater than 1?
If LinearPartition-E is OK, then you could potentially just use it for your tests. If you have the interest, it would be helpful then to also patch up the (non-linearized) ContraFold code that we have in the EternaFold repository -- and that could be a simple matter of updating float to double. Could you help us explore that with your test cases?
@rhiju, thank you a lot for your response and suggestion!
I have now tested my sequence (along with a set of random sequences) on EternaFold and LinearPartition (either with ContraFold, ViennaRNA or EternaFold parameters), and it seems that your suggestion works! Neither LinearPartition (ContraFold) nor LinearPartition (EternaFold) give corrupted probabilities. You can read a more detailed report here.
Then, I changed <float>
to <double>
in Contrafold.cpp:121
, and it looks like this solves the issue for EternaFold.
I have made a [PR]() to add USE_DOUBLE_IN_PREDICT
in the configuration file.
Expected Behavior
Probability of pairing $pi$, calculated as a sum of all pairing probabilities $p{ij}$, should not exceed unity, $p_i = \sumj p{ij} \leq 1$.
Current Behavior
For longer sequences, probability of pairing for certain nucleotides exceeds 1, the difference being significantly greater than one would expect from numerical instabilities.
Reproduction Steps
Fold the following MS2 sequence (MS2_short.txt) with default parameters:
Looking at the obtained pair probability matrix (MS2_bps.txt), one observes the following two lines:
The total pairing probability for nucleotides 571 and 572 therefore exceeds 1: $p{571} = 1.00134$, $p{572}= 1.000792$.
Environment
No response
Other information
No response