Lattice-Automation / seqfold

nucleic acid folding
MIT License
79 stars 12 forks source link

Possible bug #10

Closed deprekate closed 2 years ago

deprekate commented 2 years ago

Hi, I really like the package. A good alternative to Vienna (which isn't on pypi).

But when verifing your example sequence output against Vienna's RNAfold, it looks like seqfold misses two pair bonds? the lowercased g-c and a-u

         GGGAGGUCgUUACAUCUGGGUAAcaCCGGUACUGAUCCGGuGACCUCCC
RNAfold  (((((((((((((......)))))(((((.......))))))))))))) (-25.10)
seqfold  ((((((((.((((......))))..((((.......)))).)))))))) (-25.5)
jjti commented 2 years ago

Hey there and thanks for pointing this out!

I poked around and I think that it's from a diff in the underlying energy functions and penalties applied to multibranching/bifurcations:

$ seqfold GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC --celcius 32 --dot-bracket --sub-structures
GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC
((((((((.((((......))))..((((.......)))).))))))))
   i    j    ddg  description
   0   48   -1.9  STACK:GG/CC
   1   47   -1.9  STACK:GG/CC
   2   46   -1.4  STACK:GA/CT
   3   45   -1.4  STACK:AG/TC
   4   44   -1.9  STACK:GG/CC
   5   43   -1.6  STACK:GT/CA
   6   42   -1.4  STACK:TC/AG
   7   41   -0.5  BIFURCATION:4n/3h
   9   22   -1.1  STACK:TT/AA
  10   21   -0.7  STACK:TA/AT

At the BIFURCATION line it's saying that in its minimum free energy structure, it's seeing 4 free nucleotides as being the best choice. Working backwards, that came from this paper: https://pubmed.ncbi.nlm.nih.gov/15139820/

DNA_MULTIBRANCH: MultiBranch = (2.6, 0.2, 0.2, 2.0)
"""a, b, c, d in a linear multi-branch energy change function.

Inferred from:
Supplemental Material: Annu.Rev.Biophs.Biomol.Struct.33:415-40
doi: 10.1146/annurev.biophys.32.110601.141800
The Thermodynamics of DNA Structural Motifs
SantaLucia and Hicks, 2004
"""

In other words the energy functions and algo I adapted from those papers applies a greater penalty to bulge inbetween the branches than RNAFold:

Screen Shot 2022-03-26 at 8 40 41 PM

I'm not saying that seqfold is right or that seqfold is wrong, it's just got a slightly diff approach, esp with regard to the energy penalties. I purposefully did not look at the RNAseq energies (or source code) to avoid running afoul of its license. This is all adapted from the public stuff. There's a chance seqfold is right and RNASeq is wrong... but in this particular case they're more similar that usual, I'd say

I hope this helps for context. I'm thinking it might help folks if I write up a pinned issue summarizing the list of differences that I'm aware, of which this is one

jjti commented 2 years ago

Also I appreciate this:

Hi, I really like the package. A good alternative to Vienna (which isn't on pypi).

I hope the package remains useful