UC-Davis-molecular-computing / scadnano-python-package

Python scripting library for generating designs readable by scadnano.
https://scadnano.org
MIT License
13 stars 7 forks source link

automatically set Helix rolls based on crossover locations ("relax" the rolls) #257

Closed dave-doty closed 1 year ago

dave-doty commented 1 year ago

Unlike "set helix coordinates based on crossovers", this would not adjust helix positions, only their rolls.

Keep each helix in the same position, but adjust the roll to minimize "something" about the strain. One way to do this is to calculate how far is each crossover from being perfect, e.g., if a crossover goes from helix i to j, but instead of the backbone angle pointing straight at j, it is 20 degrees off, then the "error" is 20 degrees. We could minimize the sum of squared errors for each crossover, for instance. This should be a straightforward linear algebra calculation to compute this roll for each helix.

See also https://github.com/UC-Davis-molecular-computing/scadnano/issues/843

dave-doty commented 1 year ago

After some discussions, I think the correct thing to do here is the following:

We can model the various crossover angles as unit vectors. If all of them are on one side of a line (e.g., all with a positive x-coordinate), then take their average, interpret it as an angle, and subtract it from all others.

If the n vectors do not lie on one side of a line, there's a more complex algorithm that involves measuring an average for each of n choices... need to think about this more.

JoshPetrack commented 1 year ago

Suppose you have crossovers at angles ${\theta_i}$ and let $x$ be the angle you're going to adjust the helix's roll by. The goal is to find x that minimizes the sum of squared errors,

$$f(x) = \sum_{i} (x - \theta_i)^2$$

where $(x - \theta_i)$ is measured to always be at most 180 degrees.

$f(x)$ is differentiable everywhere except at points that are directly opposite a crossover - that is, points that look like $\theta_i + 180\degree$. if you restrict the domain of f(x) to remove these points, you get a set of intervals on which it's differentiable, so you can minimize $f(x)$ on each of these intervals separately by setting $f'(x) = 0$ and solving, then choose the interval giving the smallest local minimum (i think these intervals will always have a unique local minimum but maybe not).

I believe (though it should be verified) that if you do the math, finding the local minimum on one of these intervals is equivalent to the following process:

1) Cut the circle at the point opposite one of the $\theta_i$ 2) Treat the cut circle as an interval of real numbers, and simply take the average of all the points on the interval.

dave-doty commented 1 year ago

I wanted to justify a bit why minimizing the sum of squared errors is the right thing to do to minimize crossover strain. Two arguments seem to arrive at the same conclusion, which makes me more confident in it.

In both cases, we model the crossovers connecting the current helix H1 another helix H2 as a torsion (rotional) spring. The zero angle is defined to be the angle where H2 is relative to H1. (In reality there are crossovers to more than one other helix, which are at different angles; to account for this define their "0" angle to be the angle at which the other helix is). We can set the whole helix at some angle; this in turn strains each of the springs by an amount proportional to their angle after the twisting. The displacement $d_c$ of crossover $c$ is defined to be this angle (more precisely it is a nonnegative number between 0 and 180, so unlike a linear spring or conventional torsion spring, as you twist it past 180, the displacement then lowers back towards 0).

  1. Minimum Energy argument: The energy stored in a spring with displacement $d_c$ is proportional to $d_c^2$. So the total energy is minimized by minimizing $\sum_c d_c^2$.

  2. Balanced Forces argument: The helix will want to twist in the direction of net angular forces imparted by the various crossovers. So it will stop twisting when this net angular force is 0. TODO: finish this argument.

Raj-Bapat commented 1 year ago

Hi Dr. Doty and Josh, I had to do some angle manipulation in order to split the circle, get the correct resulting angles, and use the shortest distances across the circle. Im also making the assumption that if θ is in A, then the point directly opposite θ is not in A (θ-180). Here is the pseudocode for the algorithm described in Josh's, as I understand it:

// A is an array containing angles, there are n angles
// the angle with the minimum squared distance is minX, the minimum squared distance itself is minSquaredDist

minX = inf 
minSquaredDist = inf
for i from 1...n
    converted[1..n] = array of size n; 
    for j from 1...n
        newAngle = A[j]-A[i]
        if newAngle < 0 // normalizing the angle to [0, 360)
            newAngle+=360
        if newAngle > 180 // left side of the number line --> we adjust the distance and make it negative
            newAngle-=360
        converted[j] = newAngle
    X = 0
    squaredDist = 0
    for j from 1...n // finding the average angle for the current domain
        X+=converted[j]
    X/=n // X is now the average angle (on the orientation of the number line)
    X+=A[i] // X is now the average angle on the standard orientation
    if X < 0 // normalizing the angle to [0, 360)
        X+=360
    if X >= 360 
        X-=360
    for j from 1...n // finding the squared distance
        dist1 = X-A[j]
        if dist1 < 0 // normalizing to [0, 360)
            dist1+=360
        dist2 = A[j]-X
        if dist2 < 0
            dist2+=360
        squaredDist+=min(dist1, dist2)*min(dist1, dist2) 
    if squaredDist < minSquaredDist
        minSquaredDist = squaredDist
        minX = X

I'd be happy to explain in detail how it works. Quite a bit of it is just adjusting angle values (+=360) to make the computations accurate, which could make this confusing. Given some tests I could write it in c++ and test this out. Also note that the correctness of this approach also depends on if Josh's algorithm is correct :P.