Closed HanatoK closed 1 month ago
@giacomofiorin After digging into I still don't understand how distanceDir
should work. The dist2_lgrad
and dist2_rgrad
do not multiply by 2.0. Why?
cvm::real colvar::distance_dir::dist2(colvarvalue const &x1,
colvarvalue const &x2) const
{
return (x1.rvector_value - x2.rvector_value).norm2();
}
colvarvalue colvar::distance_dir::dist2_lgrad(colvarvalue const &x1,
colvarvalue const &x2) const
{
return colvarvalue((x1.rvector_value - x2.rvector_value), colvarvalue::type_unit3vectorderiv);
}
colvarvalue colvar::distance_dir::dist2_rgrad(colvarvalue const &x1,
colvarvalue const &x2) const
{
return colvarvalue((x2.rvector_value - x1.rvector_value), colvarvalue::type_unit3vectorderiv);
}
If they are multiplied by 2.0, then -force_tang / dist_v.norm()
and force_tang / dist_v.norm()
should be used for group1 and group2 forces, respectively. You can try the following pytorch script and compare what Colvars does:
#!/usr/bin/env python3
import torch
group1 = torch.tensor([1, 2, 3], dtype=torch.float64, requires_grad=True)
group2 = torch.tensor([-1, -2, -3], dtype=torch.float64, requires_grad=True)
dist_v = group2 - group1
dist_dir = dist_v / torch.linalg.norm(dist_v)
center = torch.tensor([-0.5, 1.0, 0.6], dtype=torch.float64)
center /= torch.linalg.norm(center)
# Force from pytorch (-dV/dx)
energy = 0.5 * 10.0 * torch.sum(torch.square(dist_dir - center))
energy.backward()
print('force1:\n', -group1.grad)
print('force2:\n', -group2.grad)
# What Colvars does
cv_force = -10.0 * (dist_dir - center)
iprod = torch.sum(cv_force * dist_dir)
force_tang = cv_force - iprod * dist_dir
print('force_tang1:\n', -force_tang)
print('force_tang2:\n', force_tang)
# These should match force1 and force2
print('force_tang1 / dist_v.norm():\n', -force_tang / (torch.linalg.norm(dist_v)))
print('force_tang2 / dist_v.norm():\n', force_tang / (torch.linalg.norm(dist_v)))
@giacomofiorin I think the force projection on distanceDir can be illustrated as follows:
force_tang
is actually the force on the original vector $\vec{v}$, and to project the force further to $\vec{v}/|\vec{v}|$, you have to scale the force too.
Hi @giacomofiorin and @jhenin , I am trying to fix this issue but then I just find I may misunderstand what unit vectors in Colvars are. According to the code in colvarvalue.h
:
https://github.com/Colvars/colvars/blob/e1220bd61c3508eaac7f9d44bd4e16e95aed2142/src/colvarvalue.h#L738-L741
The distance between two unit 3D vectors are the angle between them, but I don't know why it is defined in such way. In general, people may expect that two unit 3D vectors are just two 3D vectors, and the distance between them are
$|\mathbf{u}_1 - \mathbf{u}_2|$.
If the Colvars defines the unit vector in a different definition that is uncommon, then it should be documented to avoid people misunderstand what distanceDir
is.
Hi @giacomofiorin and @jhenin , I am trying to fix this issue but then I just find I may misunderstand what unit vectors in Colvars are. According to the code in
colvarvalue.h
:The distance between two unit 3D vectors are the angle between them, but I don't know why it is defined in such way. In general, people may expect that two unit 3D vectors are just two 3D vectors, and the distance between them are | u 1 − u 2 | .
If the Colvars defines the unit vector in a different definition that is uncommon, then it should be documented to avoid people misunderstand what
distanceDir
is.
Should I conclude that the Colvars definition of distance between two unit 3D vectors is the square of the angles between them? Because then the derivative dist2_grad
in colvarvalue.cpp
doesn't make sense to me:
https://github.com/Colvars/colvars/blob/e1220bd61c3508eaac7f9d44bd4e16e95aed2142/src/colvarvalue.cpp#L603-L610
I compute the gradient with respect to x2
as follows:
$$ d^2 = \arccos^2 \left(\mathbf{u}_1\cdot\mathbf{u}_2\right) $$
$$ \begin{split} \frac{\partial d^2}{\partial \mathbf{u}_2}&=\frac{\partial}{\partial \mathbf{u}_2} \arccos^2 \left(\mathbf{u}_1\cdot\mathbf{u}_2\right)\ &=2\arccos \left(\mathbf{u}_1\cdot\mathbf{u}_2\right) \left(-\frac{1}{\sqrt{1-(\mathbf{u}_1\cdot\mathbf{u}_2)^2}}\right)\frac{\partial (\mathbf{u}_1\cdot\mathbf{u}_2)}{\partial \mathbf{u}_2}\ &=2\arccos \left(\mathbf{u}_1\cdot\mathbf{u}_2\right) \left(-\frac{1}{\sqrt{1-(\mathbf{u}_1\cdot\mathbf{u}_2)^2}}\right)\mathbf{u}_1 \end{split} $$
I write the above formula in python and compare it with what Colvars does and the pytorch symbolic gradient:
#!/usr/bin/env python3
import torch
import numpy as np
np.random.seed(1)
# Two random vectors
v1 = np.random.normal(size=(3))
v2 = np.random.normal(size=(3))
# Two unit vectors in pytorch
u1 = torch.tensor(v1 / np.linalg.norm(v1), requires_grad=True)
u2 = torch.tensor(v2 / np.linalg.norm(v2), requires_grad=True)
print("u1 = ", u1, " u1 norm = ", u1.norm())
print("u2 = ", u2, " u2 norm = ", u2.norm())
# How Colvars calculates dist2
dist2 = torch.acos(u1.dot(u2)) * torch.acos(u1.dot(u2))
# Reference symbolic gradient
dist2.backward(retain_graph=True)
print("Reference:\n", u2.grad)
# My derivation of dist2_grad
def dist2_grad_me(u1, u2):
dot_prod = u1.dot(u2)
return 2.0 * torch.acos(dot_prod) * (-1.0 / (torch.sqrt(1 - dot_prod * dot_prod))) * u1
print("My derivation of dist2_grad:\n", dist2_grad_me(u1, u2))
# How Colvars calculates dist2_grad
def dist2_grad_colvars(u1, u2):
cos_t = u1.dot(u2)
return 2.0 * (cos_t * u1 - u2)
print("Colvars implementation of dist2_grad:\n", dist2_grad_colvars(u1, u2))
# Colvars 2013
def dist2_grad_colvars_2013(u1, u2):
cos_t = u1.dot(u2)
sin_t = torch.sqrt(1.0 - cos_t * cos_t)
dx = 2.0 * sin_t * ((-1.0) * sin_t * u2[0] + cos_t / sin_t * (u1[0] - cos_t * u2[0]))
dy = 2.0 * sin_t * ((-1.0) * sin_t * u2[1] + cos_t / sin_t * (u1[1] - cos_t * u2[1]))
dz = 2.0 * sin_t * ((-1.0) * sin_t * u2[2] + cos_t / sin_t * (u1[2] - cos_t * u2[2]))
return torch.stack((dx, dy, dz))
print("Colvars (2013) implementation of dist2_grad:\n", dist2_grad_colvars_2013(u1, u2))
Running the code above, I get:
u1 = tensor([ 0.8953, -0.3372, -0.2911], dtype=torch.float64, requires_grad=True) u1 norm = tensor(1.0000, dtype=torch.float64, grad_fn=<LinalgVectorNormBackward0>)
u2 = tensor([-0.3999, 0.3226, -0.8579], dtype=torch.float64, requires_grad=True) u2 norm = tensor(1.0000, dtype=torch.float64, grad_fn=<LinalgVectorNormBackward0>)
Reference:
tensor([-3.2828, 1.2364, 1.0674], dtype=torch.float64)
My derivation of dist2_grad:
tensor([-3.2828, 1.2364, 1.0674], dtype=torch.float64,
grad_fn=<MulBackward0>)
Colvars implementation of dist2_grad:
tensor([ 0.4112, -0.4988, 1.8422], dtype=torch.float64,
grad_fn=<MulBackward0>)
Colvars (2013) implementation of dist2_grad:
tensor([ 0.4112, -0.4988, 1.8422], dtype=torch.float64,
grad_fn=<StackBackward0>)
So I have no idea about what Colvars really want to compute. @giacomofiorin and @jhenin Do you have any ideas?
Update: I have just checked out the initial Colvars code (in https://github.com/Colvars/colvars/blob/0c530e2df0a7658476aaecb3fb578180199b3db5/src/colvarvalue.h#L674-L681) but that is still not what I expect...
Hi @HanatoK! I think that your conclusion about the gradients is correct, i.e. they are missing a normalization.
This makes sense by observing that the force applied onto a distanceDir
CV is a torque. Therefore, when the distance between group1 and group2 increases, but the torque remains the same, the force applied onto group1 and group2 should decrease accordingly.
I looked at very old local code snapshots, preceding the initial commit in this repo. Unlike for the quaternion bug (https://github.com/Colvars/colvars/pull/713#issuecomment-2346452299) I could not find a specific commit that introduced the error, or notes detailing the formal derivation. However, redoing the derivation anew and checking the physical dimensions of the force being applied to the centers of mass groups confirms what you found.
(The current code takes a number that has the unit of energy or torque and applies it as a force: your correction divides it by a length, thus recovering a unit of force.)
This is a good example of the fact that regression tests alone cannot prevent all bugs. This is especially true for very old code that was also used less frequently... Neither @jhenin or I came across a project with a strong motivation to compute a PMF along on this kind of variable, which would have required implementing grids on the unit sphere.
As you also found later, the distance between two distanceDir
values is the angle between them (geodesic distance between two points on Riemannian manifold). This could be documented for clarity, along with double-checking the gradient of that angle. That said, I would also be in favor of adding a component that allows that angle to be used directly as a CV?
distanceDir
Originally the distance between two distanceDir
should be the angle, but later in commit 604bc14946f8f0e495a4f08ee13ec4081addd4a8 you added dist2
for it, which is $||\mathbf{v}_1-\mathbf{v}_2||^2$, but then the dist2_lgrad
and dist2_rgrad
didn't have the factor 2.0. I could only guess that you found something wrong and wanted to fix it, but I could not infer any details (what you actually encountered and wanted to fix) from the commit itself...
distanceDir
Originally the distance between two
distanceDir
should be the angle, but later in commit 604bc14 you addeddist2
for it, which is | | v 1 − v 2 | | 2 , but then thedist2_lgrad
anddist2_rgrad
didn't have the factor 2.0. I could only guess that you found something wrong and wanted to fix it, but I could not infer any details (what you actually encountered and wanted to fix) from the commit itself...
This was probably related to a different bug, reported by JC Gumbart and Anthony Hazel. The bug made the distance_dir function revert to the metric functions of the base class, which are defined for scalar numbers and therefore the code would crash like this:
colvars: Trying to assign a colvar value with type "scalar number" to one with type "3-dimensional unit vector"
Clearly, my fix was done in too much of a hurry.
distanceDir
Originally the distance between two
distanceDir
should be the angle, but later in commit 604bc14 you addeddist2
for it, which is | | v 1 − v 2 | | 2 , but then thedist2_lgrad
anddist2_rgrad
didn't have the factor 2.0. I could only guess that you found something wrong and wanted to fix it, but I could not infer any details (what you actually encountered and wanted to fix) from the commit itself...This was probably related to a different bug, reported by JC Gumbart and Anthony Hazel. The bug made the distance_dir function revert to the metric functions of the base class, which are defined for scalar numbers and therefore the code would crash like this:
colvars: Trying to assign a colvar value with type "scalar number" to one with type "3-dimensional unit vector"
Clearly, my fix was done in too much of a hurry.
Oh, I see! I saw them at the NAMD Workshop in Dalian in 2017. I asked them about the GB1 system in their paper (https://doi.org/10.1063/1.5025951), and wanted to use the same system and CHARM22* with meta-eABF to show the strength of meta-eABF (unfortunately I failed so it was not appeared in the paper). During the workshop they told me that they kept the reversible unfolding of GB1 only along the z direction, so that they could reduce the size of the water box, and then made the simulation faster since the number of atoms was smaller. To do this they used a distanceDir
CV between the end-to-end distance, and restrained it to the z-axis (the following was what I used but I believe they used something similar):
colvar {
name restraintZ
distanceDir {
group1 {atomNumbers {9}}
group2 {atomNumbers {238}}
}
}
harmonic {
colvars restraintZ
outputEnergy on
forceConstant 5.0
centers (0.0, 0.0, 1.0)
}
If the force is incorrect then the dynamics may be affected, though.
Oh, I see! I saw them at the NAMD Workshop in Dalian in 2017. I asked them about the GB1 system in their paper (https://doi.org/10.1063/1.5025951), and wanted to use the same system and CHARM22* with meta-eABF to show the strength of meta-eABF (unfortunately I failed so it was not appeared in the paper). During the workshop they told me that they kept the reversible unfolding of GB1 only along the z direction, so that they could reduce the size of the water box, and then made the simulation faster since the number of atoms was smaller. To do this they used a
distanceDir
CV between the end-to-end distance, and restrained it to the z-axis (the following was what I used but I believe they used something similar):...
If the force is incorrect then the dynamics may be affected, though.
Ah, good to know where they used this feature. I had previously thought that they used in that work a distanceXY
variable between the two chain ends (i.e. force their XY coordinates to match each other).
If the distance_dir
gradients were lacking the division by the length (per your correction) then the restraint forces along the XY plane would have maintained their original magnitude (i.e. without decreasing) as the peptide chain unfolded. So essentially the effect of the restraint would have been very similar to that of distanceXY
after all?
Oh, I see! I saw them at the NAMD Workshop in Dalian in 2017. I asked them about the GB1 system in their paper (https://doi.org/10.1063/1.5025951), and wanted to use the same system and CHARM22* with meta-eABF to show the strength of meta-eABF (unfortunately I failed so it was not appeared in the paper). During the workshop they told me that they kept the reversible unfolding of GB1 only along the z direction, so that they could reduce the size of the water box, and then made the simulation faster since the number of atoms was smaller. To do this they used a
distanceDir
CV between the end-to-end distance, and restrained it to the z-axis (the following was what I used but I believe they used something similar):...
If the force is incorrect then the dynamics may be affected, though.
Ah, good to know where they used this feature. I had previously thought that they used in that work a
distanceXY
variable between the two chain ends (i.e. force their XY coordinates to match each other).If the
distance_dir
gradients were lacking the division by the length (per your correction) then the restraint forces along the XY plane would have maintained their original magnitude (i.e. without decreasing) as the peptide chain unfolded. So essentially the effect of the restraint would have been very similar to that ofdistanceXY
after all?
I am not sure. What they did was using distance as a CV for the PMF calculation, and then restraining the direction of the same distance vector to z axis.
@HanatoK The description of this issue contains an extra 1/2 at the denominator compared to the one that matches the PyTorch automatic differentiation right after (link).
The latter is what I get if I differentiate independently: (the second term can be neglected if it is multiplied by a "force" on the unit vector, which is tangential to the unit sphere and thus orthogonal to the vector itself). Does it check out for you?
@HanatoK The description of this issue contains an extra 1/2 at the denominator compared to the one that matches the PyTorch automatic differentiation right after (link).
The latter is what I get if I differentiate independently: (the second term can be neglected if it is multiplied by a "force" on the unit vector, which is tangential to the unit sphere and thus orthogonal to the vector itself). Does it check out for you?
Actually there are two slightly different issues. The first one is the length of the vector missing in the denominator. The second one is the 2.0 factor missing in the dist2_lgrad
and dist2_rgrad
functions. Originally I only noticed the first one, so I thought there should be a 0.5 in the denominator, but latter I noticed the missing 2.0 in the dist2_lgrad
and dist2_rgrad
functions, so 0.5 is not necessary.
I think your derivation and the fix in https://github.com/Colvars/colvars/pull/724 should be correct.
While testing https://github.com/Colvars/colvars/pull/713 I find that the forces in
distance_dir::apply_force
seems incorrect (I compare it to pytorch). The numerically correct results seems to beI know the forces are calculated by some sorts of projection. I don't know why the modified code above gives me the correct numerical results, but the missing of
dist_v.norm()
in the function looks a bit strange. I need some time to figure it out.