biocore / mds-approximations

Multidimensional scaling algorithms for microbiology-ecology datasets.

Handle edge case of eigsh not converging #40

Closed HannesHolste closed 6 years ago

HannesHolste commented 6 years ago

Problem: running PCoA with eigsh (reducing to 3 dimensions) on a subsample of a randomly generated distance matrix can raise an exception because not all eigenvectors converge. So far this has occurred on only one dataset, the subsampled random one, but the new benchmarks are still running on the cluster, so there may be more cases. (The problem matrix was generated with the skbio randdm function at an original dimension of 4096 and subsampled to 3072.)

Since eigsh is unlikely to be the PCoA method we finally choose, I decided for now to simply catch the exception and continue with the benchmarks. The results may not be 'correct', though: some eigenvectors may be missing.
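For illustration, a minimal sketch of the catch-and-continue approach. The function name `eigsh_best_effort` is hypothetical and not taken from the repository; it relies on the fact that scipy's `ArpackNoConvergence` exception carries the eigenpairs that did converge in its `eigenvalues` and `eigenvectors` attributes.

```python
import numpy as np
from scipy.sparse.linalg import eigsh, ArpackNoConvergence


def eigsh_best_effort(matrix, k, **eigsh_kwargs):
    """Run eigsh; on non-convergence, return only the converged eigenpairs.

    Returns (eigenvalues, eigenvectors, converged). `converged` is False
    when ARPACK gave up before all k eigenpairs were found.
    Hypothetical helper, not the repository's actual code.
    """
    matrix = np.asarray(matrix, dtype=float)
    try:
        eigenvalues, eigenvectors = eigsh(matrix, k=k, **eigsh_kwargs)
        return eigenvalues, eigenvectors, True
    except ArpackNoConvergence as error:
        # The exception holds whatever converged before ARPACK stopped,
        # so fewer than k eigenpairs may be returned here.
        return error.eigenvalues, error.eigenvectors, False
```

With the failure in the traceback below (2/3 eigenvectors converged), a helper like this would return two eigenpairs and `converged=False`, so the caller can flag the benchmark result as incomplete.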

What are your thoughts, @wasade @antgonza?

  File "/home/-/mds-approximations/mdsa/pcoa.py", line 87, in pcoa
    eigenvectors, eigenvalues = algorithm.run(centered_dm, num_dimensions_out)
  File "/home/-/mds-approximations/mdsa/algorithms/eigsh.py", line 18, in run
    k=num_dimensions_out)
  File "/home/-/miniconda2/envs/mdsapprox/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1589, in eigsh
    params.iterate()
  File "/home/-/miniconda2/envs/mdsapprox/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 561, in iterate
    self._raise_no_convergence()
  File "/home/-/miniconda2/envs/mdsapprox/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 367, in _raise_no_convergence
    raise ArpackNoConvergence(msg % (num_iter, k_ok, self.k), ev, vec)

scipy.sparse.linalg.eigen.arpack.arpack.ArpackNoConvergence: ARPACK error -1: No convergence (30721 iterations, 2/3 eigenvectors converged)

Some related discussions:

coveralls commented 6 years ago

Coverage decreased (-0.2%) to 86.745% when pulling f9fdf330ee7f66a96c83cc54b323578b14c64d24 on eigsh-fix into 6a38b9d21bc97320c3d7ad4987951299ea306eb9 on master.

HannesHolste commented 6 years ago

@antgonza: that was very insightful, thank you. I configured eigsh to use shift-invert mode and re-ran the failed PCoA benchmarks successfully, with no convergence errors.
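For reference, a rough sketch of what a shift-invert call can look like. The exact settings used in the benchmarks are not shown in this thread, so the choice of `sigma` below is an assumption: it is placed just above an upper bound on the largest eigenvalue (the infinity norm of the matrix), so that `which='LM'` in shift-invert mode picks the eigenvalues closest to `sigma`, i.e. the largest ones. The function name is hypothetical.

```python
import numpy as np
from scipy.sparse.linalg import eigsh


def top_eigenpairs_shift_invert(matrix, k):
    """Return the k algebraically largest eigenpairs via shift-invert mode.

    Hypothetical helper; the sigma heuristic is an assumption, not the
    setting used in the repository.
    """
    matrix = np.asarray(matrix, dtype=float)
    # The infinity norm (max absolute row sum) bounds the spectral radius,
    # so a sigma slightly above it lies outside the spectrum and
    # (matrix - sigma * I) is nonsingular.
    bound = np.abs(matrix).sum(axis=1).max()
    sigma = 1.01 * bound + 1e-12
    # In shift-invert mode ARPACK works on (matrix - sigma * I)^-1, and
    # which='LM' then selects the eigenvalues of `matrix` nearest sigma.
    eigenvalues, eigenvectors = eigsh(matrix, k=k, sigma=sigma, which='LM')
    # Sort explicitly so the largest eigenvalue comes first.
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]
```

Shift-invert requires factorizing the shifted matrix, which costs more per call than the default mode, so whether it pays off depends on the matrix size.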

But just in case, this PR also adds code to handle convergence errors: missing eigenvalues and eigenvectors are imputed as NaN. A unit test is included too.
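Roughly, the NaN imputation can be sketched as below. This is an illustration only: the helper name `pad_with_nan` and its signature are hypothetical and may differ from the code in the PR. It takes the partial output (for example, the eigenpairs attached to an `ArpackNoConvergence` exception) and pads it to the requested number of dimensions.

```python
import numpy as np


def pad_with_nan(eigenvalues, eigenvectors, k):
    """Pad partial eigensolver output to k dimensions, filling gaps with NaN.

    Hypothetical helper. `eigenvalues` has shape (found,) and
    `eigenvectors` has shape (n, found), with found <= k.
    """
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    eigenvectors = np.asarray(eigenvectors, dtype=float)
    n = eigenvectors.shape[0]
    found = eigenvalues.shape[0]
    # Start from all-NaN outputs, then copy in what did converge.
    padded_values = np.full(k, np.nan)
    padded_vectors = np.full((n, k), np.nan)
    padded_values[:found] = eigenvalues
    padded_vectors[:, :found] = eigenvectors
    return padded_values, padded_vectors
```

Keeping the output shape fixed means downstream benchmark code does not need a special case for partial results, while the NaN entries make the missing dimensions obvious.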

Please review and merge if OK.