bhmm / legacy-bhmm-force-spectroscopy-manuscript

Bayesian hidden Markov models for analysis of single-molecule trajectory data
GNU Lesser General Public License v3.0
2 stars 3 forks source link

Issue with stationary / initial distributions #37

Closed jchodera closed 9 years ago

jchodera commented 9 years ago

There seems to be an issue with the machinery added to handle initial / stationary distributions of HMM models.

When examining the force spectroscopy test system, the stationary and initial distributions are reported to be uniform, whereas the transition matrix should be producing a highly non-uniform stationary probability:

>>> import bhmm.util
f>>>from bhmm.util import testsystems
>>> hmm = testsystems.force_spectroscopy_model()
hmm.Pi
>>> hmm.Pi
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'HMM' object has no attribute 'Pi'
>>> hmm.stationary_distribution
array([ 0.33333333,  0.33333333,  0.33333333])
>>> hmm.initial_distribution
array([ 0.33333333,  0.33333333,  0.33333333])

Any idea what is going on here?

jchodera commented 9 years ago

Seems that this is due to a malfunction in pyemma.msm.stationary_distribution():

>>> from pyemma.msm import analysis as msmana
>>> msmana.stationary_distribution(hmm._Tij)
array([ 0.33333333,  0.33333333,  0.33333333])
>>> print hmm._Tij
[[ 0.989  0.01   0.001]
 [ 0.01   0.94   0.05 ]
 [ 0.001  0.05   0.949]]

The transition matrix is row-stochastic; could pyemma be assuming it is column-stochastic?

jchodera commented 9 years ago

The stationary eigenvector should be something like this:

>>> np.linalg.eig(hmm._Tij)
(array([ 1.        ,  0.98417743,  0.89382257]), array([[ 0.57735027,  0.81343001,  0.0706985 ],
       [ 0.57735027, -0.34548831, -0.73980031],
       [ 0.57735027, -0.4679417 ,  0.66910181]]))
>>> [eigs, eigv] = np.linalg.eig(hmm._Tij)
>>> eigv[0,:]
array([ 0.57735027,  0.81343001,  0.0706985 ])
>>> eigv[0,:] / eigv[0,:].sum()
array([ 0.39504526,  0.55658011,  0.04837463])
jchodera commented 9 years ago

This is using pyemma version 1.2:

>>> print pyemma.__version__
1.2
franknoe commented 9 years ago

No, if you want the stationary dist you have to compute the left eigenvalues, so you have to compute np.linalg.eig on the transposed matrix (which here doesn't matter because the matrix is symmetric), and then get the first column

np.linalg.eig(hmm._Tij.T)[:,0]

which is indeed 1/3, 1/3, 1/3 for this example. It's also obvious because the matrix is symmetric.

Here's the proof:

Am 18/05/15 um 01:47 schrieb John Chodera:

The stationary eigenvector should be something like this:

|>>> np.linalg.eig(hmm._Tij) (array([ 1. , 0.98417743, 0.89382257]), array([[ 0.57735027, 0.81343001, 0.0706985 ], [ 0.57735027, -0.34548831, -0.73980031], [ 0.57735027, -0.4679417 , 0.66910181]]))

[eigs, eigv] = np.linalg.eig(hmm._Tij) eigv[0,:] array([ 0.57735027, 0.81343001, 0.0706985 ]) eigv[0,:] / eigv[0,:].sum() array([ 0.39504526, 0.55658011, 0.04837463]) |

— Reply to this email directly or view it on GitHub https://github.com/bhmm/bhmm/issues/37#issuecomment-102871009.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

jchodera commented 9 years ago

Ah! I sliced up the eigenvector matrix incorrectly. And you're right, I should have recognized that the matrix was symmetric!