eternagame / EternaJS

Eterna game/RNA design interface

Oligo substructure energies appear broken in some situations #658

Closed: luxaritas closed this issue 2 years ago

luxaritas commented 2 years ago

Test puzzles currently available at https://eternadev.org/labs/10387842. These were derived from https://eternagame.org/game/browse/6116601/?filter1=Id&filter1_arg1=6142990&filter1_arg2=6142990 and https://eternagame.org/game/browse/6296745/?filter1=Id&filter1_arg1=6315780&filter1_arg2=6315780

[screenshot]

The AU pair is causing a 100k kcal penalty, but even when that pair is broken, the rest of the substructure energies all come out as zero. Interestingly, when binding to the other strand:

[screenshot]

There is no penalty, and more energies are filled in, but that loop is still zero:

[screenshot]

I don't know whether this is due to the fundamental implementation of oligo binding in NUPACK, or whether it is an actual bug (either in NUPACK or in our bindings).

Also, for comparison's sake, here is the result of setting a target mode structure in the original puzzle using the magic glue tool:

[screenshot]

LunarFawn commented 2 years ago

I did some testing on a branch I made and found that the code does not care whether the oligo is passed as the sequence and the main long sequence as the oligo. I did find that FullEval is creating nodes that have negative values for node positions, which I still need to figure out. The oligo switch with dot-plot test design returns three sets of nodes that look like this:

Nodes:

(68) [-2, 409, -1, -140, 58, 210, 57, -170, 56, -110, 55, -100, 54, -140, 53, -180, 52, -90, 51, -180, 50, -210, 50, -110, 49, -170, 48, -230, 47, -90, 46, -110, 45, -50, 44, -230, 43, -200, 42, -140, 41, -40, 40, -100, 39, -90, 38, -230, 10, -20, 9, -230, 8, -90, 7, -180, 6, -290, 5, -210, 4, -90, 3, -180, 2, -340, 1, -200]

(2) [-1, 0]

(62) [-1, -120, 48, 210, 47, -170, 46, -110, 45, -100, 44, -140, 43, -180, 42, -90, 41, -180, 40, -210, 39, -110, 38, -170, 37, -230, 36, -90, 35, -110, 34, -50, 33, -230, 32, -200, 31, -140, 30, -40, 29, -100, 28, -90, 27, -230, 12, 350, 11, -190, 10, -210, 9, -50, 8, -80, 6, 80, 5, -210, 4, -90]

LunarFawn commented 2 years ago

I think the problem is in PoseEditMode and how it handles the extra "-1" in the pairs, which, if not handled correctly, will be interpreted as an actual nucleotide. After multifold, PoseEdit passes the secondary structure (with an &) to the logic below, and you can see that it does not handle or account for the & at all. I am unsure how to address this properly, but I think the problem may be pervasive, so we probably need to massage the data before it leaves multifold. Maybe remove the extra -1 at position 10 for the oligo with dot-paren lab.

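```typescript
// Compare the previous best pairs with the new best pairs, base by base
for (let ii = 0; ii < bestPairs.length; ii++) {
    if (lastBestPairs.pairingPartner(ii) === bestPairs.pairingPartner(ii)) {
        pairsDiff[ii] = 0; // partner unchanged
    } else if (!bestPairs.isPaired(ii) && lastBestPairs.isPaired(ii)) {
        pairsDiff[ii] = -1; // an existing pair was broken
    } else if (bestPairs.pairingPartner(ii) > ii) {
        if (!lastBestPairs.isPaired(ii)) {
            pairsDiff[ii] = 1; // a new pair formed where the base was unpaired
        } else {
            pairsDiff[ii] = 2; // the base switched to a different partner
        }
    } else {
        pairsDiff[ii] = 0; // counted once, from the lower-index side only
    }
}
```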
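The idea of massaging the data before it leaves multifold could look roughly like the sketch below: drop the cut placeholder from the pair table and shift every later partner index down by one. This is a minimal sketch, assuming a plain number[] pair table where -1 means unpaired and the "&" occupies exactly one known index; stripCutPlaceholder is a hypothetical helper, not an existing EternaJS function.

```typescript
/**
 * Hypothetical helper: remove the "&" placeholder at `cutIndex` from a pair
 * table (-1 = unpaired) and re-index every partner that sat after the cut.
 */
function stripCutPlaceholder(pairs: number[], cutIndex: number): number[] {
    if (pairs[cutIndex] !== -1) {
        throw new Error(`position ${cutIndex} is paired, so it cannot be a cut placeholder`);
    }
    // Indices after the removed position move one step to the left.
    const remap = (idx: number): number => (idx > cutIndex ? idx - 1 : idx);
    const result: number[] = [];
    for (let ii = 0; ii < pairs.length; ii++) {
        if (ii === cutIndex) continue; // drop the placeholder itself
        const partner = pairs[ii];
        result.push(partner < 0 ? -1 : remap(partner));
    }
    return result;
}

// "((&))": bases 0 and 1 pair with 4 and 3; index 2 is the cut.
const stripped = stripCutPlaceholder([4, 3, -1, 1, 0], 2);
// stripped is [3, 2, 1, 0], i.e. "(())" with no placeholder
```

Stripping the placeholder only helps if every downstream consumer also stops expecting it, so this would have to be applied consistently at the multifold boundary.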

luxaritas commented 2 years ago

The & is consistently handled as an unpaired, invisible base throughout the code, up until the point where it is passed to Pose2D as a base sequence plus separate oligos. Regardless, the logic you're looking at is used purely to determine whether to show the "great pairing" effect; it has no impact on the energy display.
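To make that convention concrete, here is a minimal sketch, assuming a plain dot-bracket string in which "&" separates strands; parseDotBracket and splitStrands are hypothetical helpers, not EternaJS APIs. The cut takes up one position in the pair table and is recorded as unpaired, and the sequence is only split into separate strands at the end.

```typescript
/** Result of parsing a multistrand dot-bracket string (hypothetical shape). */
interface ParsedStructure {
    pairs: number[]; // partner index per position, -1 if unpaired ("&" included)
    cuts: number[]; // positions occupied by "&"
}

function parseDotBracket(structure: string): ParsedStructure {
    const pairs = new Array<number>(structure.length).fill(-1);
    const cuts: number[] = [];
    const stack: number[] = [];
    for (let ii = 0; ii < structure.length; ii++) {
        const ch = structure[ii];
        if (ch === '(') {
            stack.push(ii);
        } else if (ch === ')') {
            const open = stack.pop();
            if (open === undefined) throw new Error(`unmatched ')' at ${ii}`);
            pairs[open] = ii;
            pairs[ii] = open;
        } else if (ch === '&') {
            cuts.push(ii); // stays -1: an unpaired, invisible base
        } else if (ch !== '.') {
            throw new Error(`unexpected character '${ch}' at ${ii}`);
        }
    }
    if (stack.length > 0) throw new Error("unmatched '('");
    return { pairs, cuts };
}

/** Split a sequence on its cut markers, e.g. for per-strand display. */
function splitStrands(sequence: string): string[] {
    return sequence.split('&');
}

const parsed = parseDotBracket('((..&..))');
// parsed.pairs is [8, 7, -1, -1, -1, -1, -1, 1, 0] and parsed.cuts is [4]
const strands = splitStrands('GGAA&AACC');
// strands is ['GGAA', 'AACC']
```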

LunarFawn commented 2 years ago

OK, looking closer, I wonder whether it might be related to there being two node 50s in the cache.node array that is filled by the FullFold code. I think we need to follow that, and the answer should be there. I'm curious how Pose2D is handling the data. This is the node array created when folding the oligo and main sequence together; notice that there are two node 50s. I think this node data corresponds to each nucleotide pair in the structure, and the free energy is the large value divided by 100, so that a decimal can be stored in an integer array. The NUPACK code always creates a -1 and a -2 node no matter what, and I believe those are special overall-structure energy values. After that, the node numbers run from largest to smallest, and the number following each node in the array is the energy of that node. So the energy of node 58 should be 2.10 and the energy of node 57 should be -1.70. Perhaps the number of nodes does not match some check, or the indexing is off somewhere, and that is why this causes a problem?

(68) [-2, 409, -1, -140, 58, 210, 57, -170, 56, -110, 55, -100, 54, -140, 53, -180, 52, -90, 51, -180, 50, -210, 50, -110, 49, -170, 48, -230, 47, -90, 46, -110, 45, -50, 44, -230, 43, -200, 42, -140, 41, -40, 40, -100, 39, -90, 38, -230, 10, -20, 9, -230, 8, -90, 7, -180, 6, -290, 5, -210, 4, -90, 3, -180, 2, -340, 1, -200]

Edit: My main point is that I believe we are generating the data, but it is not being interpreted correctly by some function somewhere. Note that the node data itself may not be correctly formed, since I am suspicious of the two 50s in the node list.
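The encoding proposed above can be checked mechanically. A minimal sketch, assuming the array really is a flat list of [node, energy × 100] entries with -1 and -2 as the special overall entries (that layout is the reading proposed in this comment, not a confirmed spec); decodeNodes and findDuplicateIndices are hypothetical helpers.

```typescript
/** One decoded entry of the flat node array (hypothetical shape). */
interface ScoreNode {
    index: number; // node number as emitted; -1 and -2 are the special overall entries
    energy: number; // kcal/mol, i.e. the stored integer divided by 100
}

/** Decode [index, energy*100, index, energy*100, ...] into node records. */
function decodeNodes(flat: number[]): ScoreNode[] {
    if (flat.length % 2 !== 0) {
        throw new Error(`expected index/energy pairs, got odd length ${flat.length}`);
    }
    const nodes: ScoreNode[] = [];
    for (let ii = 0; ii < flat.length; ii += 2) {
        nodes.push({ index: flat[ii], energy: flat[ii + 1] / 100 });
    }
    return nodes;
}

/** Report node numbers that appear more than once (special entries excluded). */
function findDuplicateIndices(nodes: ScoreNode[]): number[] {
    const seen = new Set<number>();
    const dupes = new Set<number>();
    for (const node of nodes) {
        if (node.index < 0) continue; // skip the -1 / -2 overall entries
        if (seen.has(node.index)) dupes.add(node.index);
        seen.add(node.index);
    }
    return Array.from(dupes);
}

// Prefix of the oligo + main sequence array above.
const decoded = decodeNodes([-2, 409, -1, -140, 58, 210, 57, -170, 51, -180, 50, -210, 50, -110, 49, -170]);
// decoded[2] is { index: 58, energy: 2.1 }; decoded[3].energy is -1.7
const dupes = findDuplicateIndices(decoded);
// dupes is [50]
```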

LunarFawn commented 2 years ago

This is the node list for the longer sequence alone, at the same location as in the node list for the long sequence and oligo together. What were nodes 39 and 40 are now nodes 50 and 50...

Just main seq: 41, -180, 40, -210, 39, -110, 38, -170, 37, -230,

Oligo and main seq: 51, -180, 50, -210, 50, -110, 49, -170, 48, -230
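Lining the two slices up makes the collision explicit: the energies match entry for entry, but the node numbers do not shift by a constant amount. A minimal sketch, assuming the two slices are aligned (which the identical energies suggest); nodeOffsets is a hypothetical helper.

```typescript
/**
 * Hypothetical check: pair up node numbers from two flat [node, energy*100, ...]
 * slices that are assumed to line up entry for entry, and report how far
 * each node number moved.
 */
function nodeOffsets(before: number[], after: number[]): number[] {
    if (before.length !== after.length) {
        throw new Error('slices must have the same length');
    }
    const offsets: number[] = [];
    for (let ii = 0; ii < before.length; ii += 2) {
        if (before[ii + 1] !== after[ii + 1]) {
            throw new Error(`energies differ at entry ${ii / 2}; slices may not be aligned`);
        }
        offsets.push(after[ii] - before[ii]);
    }
    return offsets;
}

const mainOnly = [41, -180, 40, -210, 39, -110, 38, -170, 37, -230];
const withOligo = [51, -180, 50, -210, 50, -110, 49, -170, 48, -230];
const offsets = nodeOffsets(mainOnly, withOligo);
// offsets is [10, 10, 11, 11, 11]
```

The shift changes from 10 to 11 between old nodes 40 and 39, which is exactly where the two 50s come from.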

LunarFawn commented 2 years ago

The private scoreTreeRecursive method in Pose2D takes the node data and computes the scores.

LunarFawn commented 2 years ago

As a note, the code specifically modifies that node 49 to make it a 50, based on the presence and number of cuts, during multifold's score-structures step.

LunarFawn commented 2 years ago

[screenshot]

LunarFawn commented 2 years ago

See the image above for a debug screenshot of the third step into the node list. Each node now lists a value of zero, even though the free-energy node list has all non-zero values. I think it is an issue with how rootNode gets passed to this function call, or possibly with how rootNode is built before it is passed.

LunarFawn commented 2 years ago

While looking at the root nodes and how they are built, I just found this function: "private generateScoreNodesRecursive".

LunarFawn commented 2 years ago

RNALayout, which makes the nodes for Pose2D to consume, gets its pairs list from the failing fold routine in the multifold function. Somewhere, this function first folds the oligo and main structure together, then the main structure by itself, and then the oligo by itself. The resulting structures from these last two individual runs are concatenated for some reason, and if that combination has lower energy than the oligo and main folded together, it is supposed to be kept; otherwise it is supposed to be tossed. I don't think it is getting tossed...

[ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 93, 92, 91, 90, -1, -1, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, -1, -1, -1, -1, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, -1, -1, -1, -1, 35, 34, 33, 32, -1, -1 ]
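The keep-or-toss rule described above, plus a way to tell which candidate actually survived, can be sketched as follows. This is a minimal sketch, assuming each candidate is a pair table (-1 = unpaired, the cut included) with a free energy in kcal/mol; FoldResult, chooseStructure and countIntermolecularPairs are hypothetical names, and the energies in the usage are made up for illustration.

```typescript
/** Hypothetical fold result: pair table (-1 = unpaired) plus free energy in kcal/mol. */
interface FoldResult {
    pairs: number[];
    energy: number;
}

/**
 * Keep whichever candidate has the lower free energy: the co-folded
 * complex, or the concatenation of the two separately folded strands.
 */
function chooseStructure(coFolded: FoldResult, separate: FoldResult): FoldResult {
    return separate.energy < coFolded.energy ? separate : coFolded;
}

/** Count pairs that span the strand boundary (i before the cut, j after it). */
function countIntermolecularPairs(pairs: number[], cutIndex: number): number {
    let count = 0;
    for (let ii = 0; ii < cutIndex; ii++) {
        if (pairs[ii] > cutIndex) count++;
    }
    return count;
}

// Toy structures with made-up energies, for illustration only.
const coFolded: FoldResult = { pairs: [8, 7, -1, -1, -1, -1, -1, 1, 0], energy: -3.0 }; // "((..&..))"
const separate: FoldResult = { pairs: [3, -1, -1, 0, -1, 8, -1, -1, 5], energy: -1.0 }; // "(..)&(..)"
const kept = chooseStructure(coFolded, separate);
// kept is coFolded, which has 2 intermolecular pairs; separate has 0
```

Running the second check on a pairs list like the one above, with the real cut position, shows whether the kept structure actually binds the two strands or is just the two separate folds placed side by side.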

LunarFawn commented 2 years ago

The scoreTree function in generateScoreNodes is using the -1 that represents the "&" symbol as part of the normal structure. It gets a (-1, -1) nucleotide pair with a -230 energy, and the overall energy of the node is 0 even though every node, including the (-1, -1) one, has a value. I need to look further to find the code that is actually causing this, but I think I am getting closer.

[screenshot]
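A guard of the kind that would keep the cut out of the pair list can be sketched like this. It is a minimal sketch, assuming the same number[] pair table (-1 = unpaired) and a known set of cut positions; collectBasePairs is a hypothetical helper, not the actual scoreTree code.

```typescript
/**
 * Hypothetical helper: list each base pair once as [i, j] with i < j,
 * skipping unpaired positions (-1) and any position occupied by a cut ("&").
 */
function collectBasePairs(pairs: number[], cuts: Set<number>): Array<[number, number]> {
    const result: Array<[number, number]> = [];
    for (let ii = 0; ii < pairs.length; ii++) {
        const jj = pairs[ii];
        // Never treat -1 or the "&" position as a nucleotide.
        if (jj < 0 || cuts.has(ii) || cuts.has(jj)) continue;
        if (ii < jj) result.push([ii, jj]);
    }
    return result;
}

const pairsWithCut = [8, 7, -1, -1, -1, -1, -1, 1, 0]; // "((..&..))", cut at index 4
const basePairs = collectBasePairs(pairsWithCut, new Set([4]));
// basePairs is [[0, 8], [1, 7]]; no (-1, -1) entry can appear
```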