choderalab / msm-pipeline

A pipeline for MSMs.
GNU Lesser General Public License v3.0

Assertion Error (expecting symmetric matrix) only happening for some MSMs? #12

Open sonyahanson opened 8 years ago

sonyahanson commented 8 years ago

So far this has only happened in one of the three MSMs I'm building right now:

        super(StreamingTransformer, self).estimate(X, **kwargs)
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/_base/estimator.py", line 348, in estimate
        self._model = self._estimate(X)
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/transform/tica.py", line 297, in _estimate
        self._diagonalize()
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/transform/tica.py", line 308, in _diagonalize
        eigenvalues, eigenvectors = eig_corr(self.cov, self.cov_tau, self.epsilon)
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/util/linalg.py", line 151, in eig_corr
        assert np.allclose(C0.T, C0), 'C0 is not a symmetric matrix'
    AssertionError: C0 is not a symmetric matrix

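The assertion that fails is NumPy's `np.allclose(C0.T, C0)` with its default tolerances. One thing worth keeping in mind when debugging it: `np.allclose` returns False whenever the matrix contains a NaN (its `equal_nan` flag defaults to False), so a single NaN feature value would produce the same "not a symmetric matrix" message as a genuinely asymmetric matrix. Whether that is what happens here is not established in this thread. The sketch below is a hypothetical debugging helper (`diagnose_symmetry` is our own name, not a PyEMMA function) that separates the two cases, assuming you can get hold of the covariance matrix as a NumPy array:

```python
import numpy as np


def diagnose_symmetry(C0, rtol=1e-05, atol=1e-08):
    """Explain why np.allclose(C0.T, C0) passes or fails.

    Hypothetical debugging helper, not part of PyEMMA.
    rtol/atol default to the same values np.allclose uses.
    """
    C0 = np.asarray(C0, dtype=np.float64)
    # only compare mirror entries where both C0[i, j] and C0[j, i] are finite
    finite = np.isfinite(C0) & np.isfinite(C0.T)
    asym = np.abs(C0 - C0.T)[finite]
    return {
        "passes_allclose": bool(np.allclose(C0.T, C0, rtol=rtol, atol=atol)),
        "n_nan": int(np.isnan(C0).sum()),
        "n_inf": int(np.isinf(C0).sum()),
        # largest |C0[i, j] - C0[j, i]| over entries where both are finite
        "max_finite_asymmetry": float(asym.max()) if asym.size else 0.0,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = np.cov(rng.normal(size=(1000, 5)), rowvar=False)
    print(diagnose_symmetry(C))  # passes, no NaN/inf
    C[2, 3] = np.nan             # one bad entry
    print(diagnose_symmetry(C))  # fails, but only because of the NaN
```

If `n_nan` or `n_inf` is nonzero while `max_finite_asymmetry` is tiny, the matrix is not really asymmetric; the bad values came from the input features.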
jchodera commented 8 years ago

Can you report this on the pyemma issue tracker?

sonyahanson commented 7 years ago

Getting this error now for both the CK2 and SYK projects, even when running on trajectories copied to a local folder...

jchodera commented 7 years ago

Can you paste more of that error? It looks like you've snipped out the part of the stack trace that starts in our code; all of those lines are in pyemma.

sonyahanson commented 7 years ago

    Running pipeline
    Finding respairs_that_changed...
    Number of contacts that changed: 13728
    Total number of possible contacts: 53956
    calculate mean+cov: 100% (1591/1591) [#############################] eta 00:00 |
    Traceback (most recent call last):
      File "pipeline.py", line 190, in <module>
        run_pipeline(fnames, project_name = project_name)
      File "pipeline.py", line 101, in run_pipeline
        pipeline = pyemma.coordinates.pipeline(stages, chunksize = 1000)
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/api.py", line 435, in pipeline
        p.parametrize()
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/pipelines.py", line 146, in parametrize
        element.estimate(element.data_producer, stride=self.param_stride)
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/transform/tica.py", line 257, in estimate
        return super(TICA, self).estimate(X, **kwargs)
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/transform/transformer.py", line 190, in estimate
        super(StreamingTransformer, self).estimate(X, **kwargs)
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/_base/estimator.py", line 348, in estimate
        self._model = self._estimate(X)
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/transform/tica.py", line 297, in _estimate
        self._diagonalize()
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/transform/tica.py", line 308, in _diagonalize
        eigenvalues, eigenvectors = eig_corr(self.cov, self.cov_tau, self.epsilon)
      File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/util/linalg.py", line 151, in eig_corr
        assert np.allclose(C0.T, C0), 'C0 is not a symmetric matrix'
    AssertionError: C0 is not a symmetric matrix

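For context on why the check exists: the failing call is `eig_corr(self.cov, self.cov_tau, self.epsilon)`, i.e. tICA's generalized eigenvalue problem C_tau r = λ C0 r. The usual way to solve it (whiten with a symmetric eigendecomposition of C0, then diagonalize) only makes sense for a symmetric C0, and since a covariance matrix is symmetric by construction, the assertion failing suggests something went wrong while C0 was being accumulated rather than in the solver itself. The sketch below is a plain-NumPy version of that computation under our own assumptions: `tica_eig` is a hypothetical name, it symmetrizes C_tau, and it drops C0 eigenvalues below `epsilon`, which is our reading of what the `epsilon` argument is for. It is not a copy of PyEMMA's `eig_corr`.

```python
import numpy as np


def tica_eig(C0, Ct, epsilon=1e-6):
    """Solve Ct r = lam * C0 r for symmetric C0 (hypothetical sketch).

    Returns eigenvalues in descending order and eigenvectors as columns.
    """
    C0 = np.asarray(C0, dtype=np.float64)
    Ct = np.asarray(Ct, dtype=np.float64)
    # eigh assumes a symmetric input; this mirrors the assertion in the traceback
    if not np.allclose(C0.T, C0):
        raise ValueError("C0 is not a symmetric matrix")
    Ct = 0.5 * (Ct + Ct.T)  # symmetrized time-lagged covariance
    s, V = np.linalg.eigh(C0)
    keep = s > epsilon  # assumption: discard near-singular directions of C0
    L = V[:, keep] / np.sqrt(s[keep])  # whitening transform: L.T @ C0 @ L = I
    lam, W = np.linalg.eigh(L.T @ Ct @ L)
    order = np.argsort(lam)[::-1]  # slowest (largest eigenvalue) first
    return lam[order], L @ W[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, lag = 5000, 3, 10
    X = np.zeros((n, d))
    for t in range(1, n):  # toy autocorrelated data
        X[t] = 0.95 * X[t - 1] + rng.normal(size=d)
    X -= X.mean(axis=0)
    X0, Xt = X[:-lag], X[lag:]
    C0 = X0.T @ X0 / len(X0)
    Ct = X0.T @ Xt / len(X0)
    lam, R = tica_eig(C0, Ct)
    print(lam)
```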
jchodera commented 7 years ago

The relevant calling code is just:

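    import pyemma

    # fnames (trajectory files passed to run_pipeline) and feat (a pyemma featurizer)
    # are assumed to be defined earlier in run_pipeline
    # do featurization + tICA by streaming over size-1000 "chunks"
    source = pyemma.coordinates.source(fnames, features = feat)
    # lag is in frames; keep enough components for 95% cumulative kinetic variance
    tica = pyemma.coordinates.tica(lag = 10, kinetic_map = True, var_cutoff = 0.95)
    stages = [source, tica]
    pipeline = pyemma.coordinates.pipeline(stages, chunksize = 1000)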
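The "calculate mean+cov" progress bar in the log above is this streaming step: the covariance is accumulated chunk by chunk rather than from one big in-memory array. A minimal sketch of the idea, using naive running sums and covering only the instantaneous covariance C0 (PyEMMA's actual estimator, including its time-lagged C_tau, is not reproduced here; `iter_chunks` and `streaming_mean_cov` are hypothetical names). Each chunk contributes a symmetric outer-product sum, so the accumulated matrix stays symmetric up to rounding unless a NaN or inf gets into one of the chunks:

```python
import numpy as np


def iter_chunks(X, chunksize):
    """Yield consecutive row blocks of X, like a chunked trajectory reader would."""
    for start in range(0, len(X), chunksize):
        yield X[start:start + chunksize]


def streaming_mean_cov(chunks):
    """Mean and biased (1/N) covariance accumulated over an iterable of chunks.

    Naive running sums: fine for illustration, but they can lose precision
    when feature means are large relative to their spread.
    """
    n = 0
    s = ss = None
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float64)
        if s is None:
            d = chunk.shape[1]
            s, ss = np.zeros(d), np.zeros((d, d))
        n += chunk.shape[0]
        s += chunk.sum(axis=0)
        ss += chunk.T @ chunk  # symmetric contribution from this chunk
    mean = s / n
    cov = ss / n - np.outer(mean, mean)
    return mean, cov


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2500, 4))
    mean, C0 = streaming_mean_cov(iter_chunks(X, chunksize=1000))
    print(np.allclose(C0, np.cov(X, rowvar=False, bias=True)))  # True
    print(np.allclose(C0.T, C0))  # True
```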

I don't think there's any reason to be getting an error like that from our end, though the number of features (13728) is quite large.

Can you report this to the pyemma issue tracker?

maxentile commented 7 years ago

> though the number of features (13728) is quite large.

Quick note: this doesn't actually ask PyEMMA to compute a tICA projection using all ~14k contact features, just the top max_respairs=1000 of them.
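To put that in numbers: tICA holds at least two d-by-d matrices (the `self.cov` and `self.cov_tau` passed to `eig_corr` in the traceback). Assuming dense float64 storage at 8 bytes per entry, a back-of-the-envelope estimate for all 13728 changed contacts versus the top 1000 (`cov_bytes` is a hypothetical helper for this arithmetic only):

```python
def cov_bytes(n_features, n_matrices=2, bytes_per_entry=8):
    """Memory for n_matrices dense n_features x n_features arrays."""
    return n_matrices * n_features ** 2 * bytes_per_entry


for d in (13728, 1000):
    print(f"{d:>6} features: {cov_bytes(d) / 1e9:.3f} GB")
# prints:
#  13728 features: 3.015 GB
#   1000 features: 0.016 GB
```

Restricting to 1000 features shrinks the covariance storage by a factor of about 190 (13728² / 1000² ≈ 188.5).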

jchodera commented 7 years ago

Ah, thanks!