markovModel(lag,n + 1) - Githubissues

noeliaferruz commented 8 years ago

Hello!

Question. It seems that sometimes lately when I try to build n macrostates I get n-1, is this an error?

Thanks, Noelia

model=Model(dataTica)
#model.plotTimescales(lags=range(1,100,5))
model.markovModel(50,5)
model.viewStates(protein="protein and name CA")

2016-06-07 12:58:55,952 - htmd.model - INFO - 24.4% of the data was used
2016-06-07 12:58:55,959 - htmd.model - INFO - Number of trajectories that visited each macrostate:
2016-06-07 12:58:55,960 - htmd.model - INFO - [5 4 5 2]
2016-06-07 12:58:55,962 - htmd.model - INFO - Take care! Macro 3 has been visited only in 2 trajectories:
2016-06-07 12:58:55,962 - htmd.model - INFO - id = 15
parent = None
input = []
trajectory = ['./filtered/e2s6_e1s8p0f473/output.filtered.xtc']
molfile = ./filtered/filtered.pdb
2016-06-07 12:58:55,963 - htmd.model - INFO - id = 16
parent = None
input = []
trajectory = ['./filtered/e2s7_e1s8p0f331/output.filtered.xtc']
molfile = ./filtered/filtered.pdb
[Parallel(n_jobs=1)]: Done   1 tasks       | elapsed:    6.8s
[Parallel(n_jobs=1)]: Done   2 tasks       | elapsed:   13.9s
[Parallel(n_jobs=1)]: Done   3 tasks       | elapsed:   20.5s
[Parallel(n_jobs=1)]: Done   4 tasks       | elapsed:   27.2s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:   27.2s finished

stefdoerr commented 8 years ago

Getting n-1 is normal. Has to do with the implementation of PCCA.

The real problem here seems to me that your data is extremely disconnected and at a lag of 50 frames you are throwing away 75% of your data, having only essentially 16 trajectories of data left.

stefdoerr commented 8 years ago

p.s. there is a new option units= where you can provide units for your lag time. So instead of 50 you can do model.markovModel(5, 5, units='ns') so that if you later change your fstep you won't accidentally do the analysis on a different lag time.

noeliaferruz commented 8 years ago

Thanks,

I like the units= option!

It just happened in another system where I actually have a lot of data and 100.0% of the data was used, is it an indicative of a bad model although the implied timescales are good, or it's nothing at all to worry about?

stefdoerr commented 8 years ago

Well, it depends how you interpret it. I would not trust kinetics of the macrostate with has only occurred in 2 trajectories. Bootstrapping for example would give you huge errors.

The way I usually think about it is: Does this state look interesting?

yes: restart simulations from there to gather statistics
no: reduce the number of macrostates and it will be joined into another macrostate with more statistics

The more macrostates you add the more macrostates you will get with bad statistics.

Also look at your timescales. If you only have one slow process, the correct thing to do would be to make 2 macrostates for example.

Thinking about it, I would love to have a feature like

ace = AcemdLocal()
ace.submit(model, macro=2)

which starts simulations from macrostate 2 to improve sampling!

noeliaferruz commented 8 years ago

ok, thanks! Yes that function would be cool!

Acellera / htmd

markovModel(lag,n + 1) #55