Closed yongwangCPH closed 7 years ago
Both papers that came out introducing TICA said it doesn't affect the results much. From our experience and apparently yours it does seem to affect them. I have not seen any comment on it from Frank's group or Pande's. Maybe Guillermo @gph82 has some new ideas on it? This is more a discussion than an issue so I'll close it but we can keep talking here.
Actually now I remember Guillermo showing some heuristic for picking the lagtime based on autocorrelations. But the recent publications I saw focused more on finding the best number of dimensions than the best lag-time.
This remains a question hard to answer in one line. There are definitively wrong lagtimes (too short or too long), but everything in between is essentially equally valid, as long as you arrive at an MSM/HMM that's robust, reproducible and helpful. @yongwangCPH , can you clarify what "very dependent" means?
Thanks a lot for your comments. @gph82 The MFPT could be different in one or two orders of magnitude when I used different TICA lag times (e.g. 2ns, 50ns). Anyway, it is good to know it is still a kind of open question.
I am sorry but that sentence alone is not really helpful. You could be in a sort of "oranges vs. apples" situation. Varying the tica lagtime by one order of magnitude certainly might lead to different subspaces, particularly if you're keeping few TICs. How many TICs are you keeping at lagtime? What does the spectrum look like?
On 23.01.2017 22:52, yongwangCPH wrote:
Thanks a lot for your comments. @gph82 https://github.com/gph82 The MFPT could be different in one or two orders of magnitude when I used different TICA lag times (e.g. 2ns, 50ns). Anyway, it is good to know it is still a kind of open question.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Acellera/htmd/issues/230#issuecomment-274629102, or mute the thread https://github.com/notifications/unsubscribe-auth/AHK3NN_s3NKlfuS_9WpZC50BbVYkPBXOks5rVSEQgaJpZM4LrOCN.
@gph82 Here is an example from Adria of how the lagtime affects the timescales. Tested with 2, 3 and 5ns lagtimes. http://nbviewer.jupyter.org/gist/stefdoerr/aa937f2f8e513d78d3482db841fa76ba
Admittedly the simulations are 14ns so he could have chosen a larger lagtime at the beginning. Still, as we can see, finding the best lagtime is a bit of a trial and error procedure.
Hi, it's a bit confusing with all the warnings and the output, but I guess you want me to compare the its plots for TICA done at 20, 30, 50., right?
What I don't get is...why do tica at 20, 30, 50 if you're computing MSMs between 0 and 15? Are this the same units?
No, sorry for the units. We have improved this in new versions. The timescale plots are in ns units and the TICA arguments are in simulation steps. So 30 tica lag = 3ns lag
So I would probably build the model at lag 4ns = 40 frames
ok so we have tica at 2, 3, 5 ns and its between 1 and 15 ns?
yes
can you produce a tica-its plot? I suspect there's a process missing between 30 and 40. I'd say 40 and 50 are similar. Plus, very big point...who says 3 projections are enough? How does the TICA-spectrum look?
TICA-spectrum as in the kinetic variance? I will check that. How do you do TICA its in pyemma?
Every tica object has an .timescales atttribute. Stack them and plot them
Ok, I'll report back when I have those :)
I only did the first one but is this what you meant? I plotted the cumvar.
However for timescales they are only computed at the given lagtime. I would need to compute TICA for multiple lag-times to make a TICA its plot right? Any method for that? I could not find a timescales_tica
function in PyEMMA. I can do it manually I guess. But TICA is quite slow so it would take lots of time in any case.
http://nbviewer.jupyter.org/gist/stefdoerr/17b5cafde1e20b808225b3fbe4018152
By the way this is what I pretty much always see with the cumvar. That I need hundreds to thousands of dimensions to reach 95% variance.
Well you had already 3 tica objects so you could do it. Plus, can you rescale the plots in cells 12, 20 to be xlim([0,50])
Nah I had restarted the kernel. Ok will do
there are some issues with the kinetic variance criterion always being useful, particularly in high-dimensional spaces. This discussion essentially boils down to: Is there a timescale gap? And there isn't...we can fiddle around all day, but this is a hard problem...could I get an .npy with the orginal features and make my point?
Why no timescale gap? I see a very big gap between the first eigenvalue and the second.
http://pub.htmd.org/FAAH_features.zip Here you go. It's 500MB zipped
Sorry that the gap was not totally visible. Should have fixed the xlim to [-1, 50] instead of [0, 50]. But you can see it in the plots if you look close.
thanks for the file. I'll take a look and see if I can make my point and perhaps help. It's been day #2 of the workshop and I'm out-pythoned
Hahah sure, take your time. Good luck with the workshop.
Guillermo was on fire today with his new projX module :p
On Feb 21, 2017 17:23, "Stefan Doerr" notifications@github.com wrote:
Hahah sure, take your time. Good luck with the workshop.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Acellera/htmd/issues/230#issuecomment-281394754, or mute the thread https://github.com/notifications/unsubscribe-auth/AKTzQlbdklcYBiXmDhFqr2pXZ6XELmZPks5rew-YgaJpZM4LrOCN .
Hi, I came across this thread and was wondering if you have any advice on choosing the lag time for TICA. What exactly is meant by timescale gap? Is there any criteria I can use to decide whether my TICA lag time is acceptable?
On the question of timescale gap: If you calculate the eigenvalues of the transition probability matrix built at one lag time you usually see a separation between some large eigenvalues and some small ones. The large values correspond to the slow processes and the small ones to fast processes. Each eigenvalue gives you the timescale of the corresponding process. This gap between the eigenvalues or timescales is what is called the timescale gap. If you do the typical ITS (implied timescale) plot where you calculate the matrix at multiple lag times and plot the eigenvalues as lines you can see that there is a separation of 2-3 slow processes in this plot and the rest which are very fast.
As for choosing the TICA lagtime I don't have much of a tip to give you other than:
Hi! I want to perform dimensionality reduction on an MD simulation using TICA. My goal is to generate a free energy surface using the first three TICA components, similar to how PCA is often used. Since I'm not building an MSM, could you suggest a good starting point for the lag time value and explain the factors I should consider when choosing it?
http://docs.markovmodel.org/lecture_tica.html Especially the references at the end of the tutorial
The conclusion at least in the original paper was that the selected lag time did not affect strongly the results. We usually set it to something like 2ns
I found the final MSM is very dependent on the choice of the TICA lag time. In the tutorials, I noticed that you used 2ns for both binding and folding cases. Any reason to chose this value? Any comments are appreciated.