ag-csw / LDStreamHMMLearn

1 stars 0 forks source link

Determine relative error of implied time scales and stationary distribution #32

Closed greenTara closed 7 years ago

greenTara commented 7 years ago

In addition to calculating the error of the estimated error of the transition matrix as a whole, we should break down that error into meaningful parts. These include- the implied timescales and the stationary distribution. For the implied timescales, these are a vector of size nstates -1. See the paper for the definition of implied timescale. They should be ordered by increasing size, and then the relative error calculated in comparison to the implied timescales of the model, and average over the nstates - 1 values.

If we don't have it already, there should be a method to calculate the implied timescales of a transition matrix.

For the stationary distribution, it is the row eigenvector corre sponding to the eigenvector of value 1, which should be in the first row of the .transu matrix. This vector should sum to 1 ( if not, we need to fix that - let's make it a unit test for samples from an mm family.) Calculate the Euclidean norm of the difference between the estimated and actual stationary distribution (square root of sum of squares).

Let's display these as heat maps for the stationary case - one for the timescale error and one for the stationary distribution error.

alexlafleur commented 7 years ago

I don't know how to proceed in this issue. Please provide me a pseudocode on how to calculate the implied timescales.

"Calculate the Euclidean norm of the difference between the estimated and actual stationary distribution.." actual stationary distribution = first row of sample_basis result? estimated (error?) = the error that we calculated so far (using the count matrix and np.linalg.norm)?

greenTara commented 7 years ago

For a Spectral_MM,

ev = np.diag(self.transD)

produces a vector of the eigenvalues. You want to throw away the first one (should have value = 1) and calculate

impliedtimescale[i-1] = -lag / ln(ev[i])

(ln is natural logarithm) for i = 1, 2, ..., nstates -1

We don't have a field for lag in the Spectral_MM, so for creating a method that returns the implied timescale, it needs to pass the lag as a parameter.

alexlafleur commented 7 years ago

Issue #34 is following up on that, so this issue might be closed?

greenTara commented 7 years ago

Superceded by #34