Convergence problems with few atoms

kjelljorner commented 4 years ago

Great project! I'm currently incorporating spyrmsd in my conformer generation code. Everything is fine except when there are fewer number of heavy atoms when I seem to run into some convergence problems related to the `minimize=True' option for rmsd and symmrmsd. I have managed to create a reproducible example.


import spyrmsd
from spyrmsd import rmsd
import numpy as np
elements = np.array([6, 1, 1, 1, 1])
connectivity_matrix = np.array([[0, 1, 1, 1, 1],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0]])
coordinates_1 = np.array([[ 2.416690358222e-09, -2.979634307864e-08, -9.782357562900e-09],
       [-6.669665490935e-01, -6.162099127861e-01, -6.069106091317e-01],
       [-4.258088724928e-01,  9.999808728518e-01,  1.078170393201e-01],
       [ 1.178459756093e-01, -4.538168967035e-01,  9.864390766805e-01],
       [ 9.749294435604e-01,  7.004596643409e-02, -4.873454970866e-01]])
coordinates_2 = np.array([[-2.118450971480e-07,  2.238951108509e-07,  1.839989120690e-07],
       [-5.297519571039e-01, -4.011375110922e-01,  8.668054003529e-01],
       [-5.107749001064e-01,  8.975573096842e-01, -3.555275589573e-01],
       [ 1.644944812511e-02, -7.486078704316e-01, -7.951194721576e-01],
       [ 1.024077620930e+00,  2.521878479445e-01,  2.838414467631e-01]])
rmsd = spyrmsd.rmsd.symmrmsd(
            coordinates_1,
            coordinates_2,
            elements,
            elements,
            connectivity_matrix,
            connectivity_matrix,
            center=True, minimize=True
        )
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-45ddca0a9f55> in <module>
      6             connectivity_matrix,
      7             connectivity_matrix,
----> 8             center=True, minimize=True
      9         )

~/miniconda3/envs/xtb/lib/python3.7/site-packages/spyrmsd/rmsd.py in symmrmsd(coordsref, coords, atomicnumsref, atomicnums, amref, am, center, minimize, cache)
    283             center=center,
    284             minimize=minimize,
--> 285             isomorphisms=None,
    286         )
    287 

~/miniconda3/envs/xtb/lib/python3.7/site-packages/spyrmsd/rmsd.py in _rmsd_isomorphic_core(coords1, coords2, atomicnums1, atomicnums2, am1, am2, center, minimize, isomorphisms)
    187         else:
    188             # Compute minimized RMSD using QCP
--> 189             result = qcp.qcp_rmsd(c1i, c2i)
    190 
    191         min_result = result if result < min_result else min_result

~/miniconda3/envs/xtb/lib/python3.7/site-packages/spyrmsd/qcp.py in qcp_rmsd(A, B)
    248     c2, c1, c0 = coefficients(M, K)
    249 
--> 250     l_max = lambda_max(Ga, Gb, c2, c1, c0)
    251 
    252     s = Ga + Gb - 2 * l_max

~/miniconda3/envs/xtb/lib/python3.7/site-packages/spyrmsd/qcp.py in lambda_max(Ga, Gb, c2, c1, c0)
    199     x0 = (Ga + Gb) / 2.0
    200 
--> 201     return optimize.newton(P, x0, fprime=dP)
    202 
    203 

~/miniconda3/envs/xtb/lib/python3.7/site-packages/scipy/optimize/zeros.py in newton(func, x0, fprime, args, tol, maxiter, fprime2, x1, rtol, full_output, disp)
    359         msg = ("Failed to converge after %d iterations, value is %s."
    360                % (itr + 1, p))
--> 361         raise RuntimeError(msg)
    362 
    363     return _results_select(full_output, (p, funcalls, itr + 1, _ECONVERR))

RuntimeError: Failed to converge after 50 iterations, value is 1.5905174512308715.

RMeli commented 4 years ago

Hi @kjelljorner, thank you for reporting this issue! Unfortunately I can't reproduce it locally; I added your snippet to the test suite and I do get an RMSD of 0.0 and no RuntimeError.

Which version of numpy and scipy are you using? I tested it with a fresh Python 3.7 environment using numpy 1.19.2 and scipy 1.5.2 (and with Python 3.8 using numpy 1.18.1 and scipy 1.4.1).

RMeli commented 4 years ago

Travis-CI Build #740 does pick up the problem, inconsistently. Interestingly, Job #740.7 on macOS fails with Python 3.7, numpy 1.19.2 and scipy 1.5.2 (which is the configuration I used locally).

kjelljorner commented 4 years ago

My setup seems fairly similar to yours. Installed spyrmsd=0.3.5 from conda-forge on Windows Subsystem Linux (Ubuntu). Get the same result with running the code on Windows.

import sys
print(sys.version)
3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0]
import scipy
print(scipy.__version__)
1.5.2
numpy.__version__
import numpy
print(numpy.__version__)
1.19.1

RMeli commented 4 years ago

This looks like a floating point error. I managed to reproduce it locally by shifting the initial guess for the Newton method by 1e-6 to the right: x0 = (Ga + Gb) / 2.0 + 1e-6 rises the RuntimeError: Failed to converge after 50 iterations, value is 1.5905195794288944. exception.

I tried using different shifts to the left but this was not solving the problem consistently (a few more CI builds passed, but not all of them). I now added the Halley’s method as a fallback; this seems to work more robustly without changing the initial guess (all CI builds pass now, Build #746). The fix (?) is available on the issue35 branch; @kjelljorner does this appear robust for your use cases as well?

kjelljorner commented 4 years ago

I tried it in my workflow and it works for the specific example that I gave above, but I quickly found another example where it fails again. It only appeared when I used 16 decimals for the numpy arrays. My use case is to prune conformers from a conformer generator which has quite a lot of redundancy so RMSD is really the only way to prune the ensemble and you don't know beforehand it two molecules are almost identical or not.

import spyrmsd
from spyrmsd import rmsd
import numpy as np
elements = np.array([6, 6, 1, 1, 1, 1, 1, 1])
connectivity_matrix = np.array([[0, 1, 1, 1, 1, 0, 0, 0],
       [1, 0, 0, 0, 0, 1, 1, 1],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0]])
coordinates_1 = np.array([[-0.7513646148641476,  0.0212153090068301,  0.0811236469164399],
       [ 0.7513646329421021, -0.0212139873589844, -0.0811234544788992],
       [-1.1912097695762844, -0.9520810297276773, -0.1560519813078418],
       [-1.0197601491429782,  0.2790338755490671,  1.1099648032811764],
       [-1.1891251185534542,  0.769045487934088 , -0.5868099187409093],
       [ 1.1891224079259752, -0.7690462686978631,  0.5868095572711365],
       [ 1.0197615786233758, -0.2790344505405097, -1.1099636171164908],
       [ 1.191211032645396 ,  0.9520810638350493,  0.1560509641753798]])
coordinates_2 = np.array([[ 0.752989230839992 ,  0.0675001892809206, -0.0055472904918074],
       [-0.7529892426611152, -0.0675008847728476,  0.0055489067798057],
       [ 1.0506759954349485,  1.0138756198991818, -0.4668239152871582],
       [ 1.2080371182091196, -0.7501873566674166, -0.5724157781772267],
       [ 1.148732484392816 ,  0.0417675099494674,  1.0141322717252037],
       [-1.050678518706957 , -1.0138763219763327,  0.4668256362472835],
       [-1.148731454954795 , -0.0417698443724041, -1.0141315673229787],
       [-1.208035612554004 ,  0.7501910886594293,  0.5724117365268774]])
mask = elements != 1
rmsd = spyrmsd.rmsd.symmrmsd(
            coordinates_1[mask],
            coordinates_2[mask],
            elements[mask],
            elements[mask],
            connectivity_matrix[mask,:][:,mask],
            connectivity_matrix[mask,:][:,mask],
            center=True, minimize=True
        )

RMeli commented 4 years ago

Thank you for reporting this additional problematic case @kjelljorner ! Luckily, I can reproduce this one locally (and it fails consistently in CI as well, Build #749), which will facilitate debugging.

It only appeared when I used 16 decimals for the numpy arrays.

This seems to confirm my fears. I'm not sure what is happening with scipy.optimize.newton, but it does not appear to be very robust. I tried to use scipy.optimize.fsolve instead (different algorithm, from MINPACK) and all tests pass locally and in CI (Build #750). Would you mind trying this out (from branch issue35) and seeing if it is actually more robust on your use cases?

If it isn't more robust, I'll investigate further now that I can reproduce the problem locally. Thank you for bearing with me.

kjelljorner commented 4 years ago

Seems much more stable now. Haven't been able to reproduce the previous error. However, there is a numpy warning that seems to stem from very small negative values of s. Sometimes it seems to lead to a freeze.

RuntimeWarning: invalid value encountered in sqrt
  rmsd = np.sqrt(s / N)

I think updating the cutoff to maybe 10^-10 instead of 10^-12 might be good. Empirically I would get maybe -1.3*10^-12 when I tried some experiments.

    if abs(s) < 1e-12:  # Avoid numerical errors when Ga + Gb = 2 * l_max
        rmsd = 0

RMeli commented 4 years ago

Seems much more stable now. Haven't been able to reproduce the previous error.

I fear the improved robustness is only apparent and due to the fact that scipy.optimize.fsolve does not fail as cleanly as scipy.optimize.newton given that is based on an external library. If I check the convergence explicitly, the first test case fails to converge (now on my local tests as well). As a temporary workaround I reverted to scipy.optimize.newton (which is fast, safe and seemed to work well for most cases) and added a fallback to an explicit calculation of the maximum eigenvalue l_max. This will be slower, but should guarantee correctness (or safe failures with a LinAlgError). I hope this problem does not happen that often. Please let me know if this new fallback works for your use cases (CI builds pass) and if it is not too slow.

I'll keep investigating this issue and try to find a better (and faster) workaround.

However, there is a numpy warning that seems to stem from very small negative values of s.

I updated the cutoff to 1e-9 as default. However I recall I wanted to expose such parameter from the API, so I did that as well; you can now control this parameter explicitly with atol.

kjelljorner commented 4 years ago

It seems to be working now! 👍 I had some problems initially with the result of symmrmsd on multiple coordinates given 0 as a float instead of an ndarray with zero as the only element. but can't seem to reproduce it. I think that my kernel had maybe not updated to the new installation of spyrmsd. Might be worth looking in to anyway if you are aware of any of the changes affecting this behavior.

>>> ce.get_rmsd(method="spyrmsd")
array([[0, array([0.04105192]), array([0.03902591])

RMeli commented 4 years ago

I'm glad to hear it is now stable. I'll add some more checks and create a new release soon. Thank you again for reporting the issue and providing some reproducible examples!

I had some problems initially with the result of symmrmsd on multiple coordinates given 0 as a float instead of an ndarray with zero as the only element.

I'll look at the issue you mention. I think now that the code is more robust for structures that superimpose perfectly the condition if abs(s) < atol: is not hit very often; that's where the 0.0 float was probably coming from. I'll change the code to return the results in a consistent format.

RMeli commented 4 years ago

@kjelljorner FYI, the new fixes are now on the main release (PyPI and conda-forge channel).

RMeli / spyrmsd

Convergence problems with few atoms #35