deeptime-ml / deeptime

Python library for analysis of time series data including dimensionality reduction, clustering, and Markov model estimation
https://deeptime-ml.github.io/
GNU Lesser General Public License v3.0
754 stars 84 forks source link

[Bug] At least one of the given states is not contained in this model #275

Closed DanielWicz closed 1 year ago

DanielWicz commented 1 year ago

When looking at your code of the method submodel in deeptime/markov/msm/_markov_state_model.py, there is a check of passed states in the line 293: https://github.com/deeptime-ml/deeptime/blob/9a532859fdeabffbdfc713b4aad2c2da852c4917/deeptime/markov/msm/_markov_state_model.py#L293C9-L293C9

        if np.any(states >= self.n_states):
            raise ValueError("At least one of the given states is not contained in this model "
                             "(n_states={}, max. given state={}).".format(self.n_states, np.max(states)))

The problem is fallowing: assume that we have 10 states, we remove state 0 from the available states but state 9 remains. Even though the state is still available in our MSM mode, the function will raise ValueError (9 >= self.n_states; self.n_states = 8 now). In summary the function should check if it is available in current states (some list/np.array), not if it is greater/equal from the n_state counter.

Basically it should look something like this:

  if not np.in1d(states, self.state_symbols(self.current_model)).all():
            raise ValueError("At least one of the given states is not contained in this model "
                             "(n_states={}, max. given state={}).".format(self.n_states, np.max(states)))

Then apparently, the same modification have to be applied in: https://github.com/deeptime-ml/deeptime/blob/9a532859fdeabffbdfc713b4aad2c2da852c4917/deeptime/markov/_transition_counting.py#L351C1-L351C1

But when I tried to modify as in the previous example, it gives error:

deeptime/util/matrix.py", line 24, in submatrix
    C_cc = C_cc[sel, :]

Apparently states internally are from 0 ... N and the state numbers are not tracked.

clonker commented 1 year ago

Hi @DanielWicz

I took a look at this issue and implemented a test (PR #276) which should demonstrate and ensure that everything works as expected. Internally, the MSM has "states" which are always counted from zero. So if you have a MSM with 10 states {0, ..., 9} and submodel it by restricting to states {1,...,9}, then the submodel has states {0,...,8} but symbols {1,...,9} (in conjunction with a count model at least).

Does that clear it up?

Cheers, Moritz

clonker commented 1 year ago

closing this now, but feel free to reopen if something is unclear