metadynamics with MSMs - Githubissues

liyigerry commented 6 years ago

@MoleOrbitalHybridAnalyst

你好，请问怎么使用pyemma处理metadynamics轨迹？参考了你的tram2d.py，但是没怎么看懂，能给点说明？

filenames[thermo_index]是什么meta文件？不太理解tram中的ttraj和dtraj，怎么从meta文件中提取这两个序列？

我要分析的轨迹在这里meta_pyemma_example

谢谢

MoleOrbitalHybridAnalyst commented 6 years ago

tram2d.py 是用来处理umbrella sampling数据的，不能直接用来做metadynamics. meta文件包含各个umbrella window的cv time series文件的index. 每个umbrella window对应一个Hamiltonian因此被称为一个thermostate，ttraj为按时间顺序排列的各个frame的thermostate，即thermostate的time series. 如果要做metadynamics，首先要明白每个gaussian加入都意味着hamiltonian变了，也就是一个新的thermostate，一个自然的想法就是把已经加的gaussian数当作thermostate的index. 弄好ttraj后，需要算出每个frame在i-th thermostate中的bias energy，你的CVS的meta.bias列即为该frame在当前thermostate下的bias energy，你需要算出其他thermostate下的bias energy. dtraj是最简单的，是把cv space离散化后的Markov states的index. 你需要的信息很简单，settings in METAD(PACE=?, SIGMA=?, HEIGHT=?)再加上CVS就可以了.

liyigerry commented 6 years ago

谢谢回复，帮助很大。

按照你说的，尝试做了下，仍有问题，没有怎么理解tram，能帮忙看一下？在这里tram.ipynb

在模拟过程中，快照是5000积分步输出一次，高斯bias是500步添加一次，也就说每个快照都可以视作一个thermostate。所以，直接生成ttraj。

用二面角、相对于两个结构的RMSD对轨迹做了MSMs，得到了dtraj。

然后，我将bias的PACE设为1，用plumed driver了一下轨迹，生成了新的HILLS_reweight文件，这样就得到了每个快照对应的height，组合了ttraj后，得到bias。

MoleOrbitalHybridAnalyst commented 6 years ago

是你的bias的问题，For every simulation frame seen in trajectory i and time step t, btrajs[i][t, k] is the reduced bias energy of that frame evaluated in the k’th thermodynamic state.. TRAM需要知道每个frame在每个thermostate中的bias energy来计算reweighting factor，如果你想通过driver来算的话，需要运行K次(where K = # of thermostates)计算每个frame的bias. 理解TRAM可以参考这篇文章https://doi.org/10.1073/pnas.1525092113 也可以先理解dTRAM，https://doi.org/10.1063/1.4902240

liyigerry commented 6 years ago

好的，谢谢，我仔细看看，不懂再问你。

uitb commented 6 years ago

tram2d.py 是用来处理umbrella sampling数据的，不能直接用来做metadynamics. meta文件包含各个umbrella window的cv time series文件的index. 每个umbrella window对应一个Hamiltonian因此被称为一个thermostate，ttraj为按时间顺序排列的各个frame的thermostate，即thermostate的time series. 如果要做metadynamics，首先要明白每个gaussian加入都意味着hamiltonian变了，也就是一个新的thermostate，一个自然的想法就是把已经加的gaussian数当作thermostate的index. 弄好ttraj后，需要算出每个frame在i-th thermostate中的bias energy，你的CVS的meta.bias列即为该frame在当前thermostate下的bias energy，你需要算出其他thermostate下的bias energy. dtraj是最简单的，是把cv space离散化后的Markov states的index. 你需要的信息很简单，settings in METAD(PACE=?, SIGMA=?, HEIGHT=?)再加上CVS就可以了.

@MoleOrbitalHybridAnalyst 你好，请问这个“其他thermostate下的bias energy”该怎样计算？

我理解的是，在@liyigerry 的这个代码里面，ttrajs是一个长度为T的一维矩阵，liyigerry前面也提到，“在模拟过程中，快照是5000积分步输出一次，高斯bias是500步添加一次，也就说每个快照都可以视作一个thermostate。所以，直接生成ttraj”，也就是他这个T就等于K，bias就应该是一个T*T的矩阵。

你有说到，“你的CVS的meta.bias列即为该frame在当前thermostate下的bias energy”，没弄明白剩下的T-1个bias energy该怎样计算，想请教一下！

谢谢！

MoleOrbitalHybridAnalyst commented 6 years ago

Thermo-state是Hamiltonian的一一映射， @liyigerry 的例子中每个snapshot的确是处在不同的thermo-state中。Plumed的输出文件只计算了每个snapshot在当前的thermo-state下的bias energy, 没有处在其他thermo-state下的bias energy。在metad的语境下，就是每个frame在（没加gaussian，加了一个gaussian，加了2个gaussian，...）的时候的bias energy都是需要的. 这个frame可能是出现在模拟中加了k个gaussian时，这时的bias energy是已经由plumed输出了的，再算出其他的即可。Pratically, 可以用sum_hills来生成K个grid file，再跑K次driver，每次读不同的grid file.

uitb commented 6 years ago

@MoleOrbitalHybridAnalyst 非常感谢你能这么快的回复，解释的非常详细，我也基本上搞懂了。不过最后一点，driver这，还想再请教一下怎么读生成的grid file文件？仔细看了plumed官网的driver相关介绍、也google搜索了很久，还是没找到相关的资料。

再次感谢！！！

MoleOrbitalHybridAnalyst commented 6 years ago

用METAD的GRID_RFILE，再把PACE设很长即可

uitb commented 6 years ago

谢谢，昨天我也这样做过，这样的话，plumed会提示没有meta.bias这一列参数？

+++ message: assertion failed ifile.FieldExist( funcl ), no column labelled meta.bias in in grid input

我的plumed.dat METAD是这样写的

METAD ... ARG=cm2lig,cm2unlig PACE=9999999 HEIGHT=20 SIGMA=1,1 FILE=HILLS_driver BIASFACTOR=10 TEMP=300.0 LABEL=meta GRID_MIN=0,0 GRID_MAX=141,50 GRID_RFILE=fes_0.dat ... METAD

生成grid file 用的命令是：plumed sum_hills --hills HILLS --stride 10

grid file 是这样的

! FIELDS cm2lig cm2unlig file.free der_cm2lig der_cm2unlig

! SET min_cm2lig -3.063

! SET max_cm2lig 46.2591

! SET nbins_cm2lig 141

! SET periodic_cm2lig false

! SET min_cm2unlig 4.98979

! SET max_cm2unlig 22.1382

! SET nbins_cm2unlig 50

! SET periodic_cm2unlig false

-3.063000000 4.989790000 -0.000000000 -0.000000000 -0.000000000 -2.710699286 4.989790000 -0.000000000 -0.000000000 -0.000000000 -2.358398571 4.989790000 -0.000000000 -0.000000000 -0.000000000 -2.006097857 4.989790000 -0.000000000 -0.000000000 -0.000000000 -1.653797143 4.989790000 -0.000000000 -0.000000000 -0.000000000 -1.301496429 4.989790000 -0.000000000 -0.000000000 -0.000000000 -0.949195714 4.989790000 -0.000000000 -0.000000000 -0.000000000 -0.596895000 4.989790000 -0.000000000 -0.000000000 -0.000000000 -0.244594286 4.989790000 -0.000000000 -0.000000000 -0.000000000 0.107706429 4.989790000 -0.000000000 -0.000000000 -0.000000000 ......

你看看这些有什么不对吗？

MoleOrbitalHybridAnalyst commented 6 years ago

我没试过用METAD读sum_hills生成的grid file. 你可以试试用METAD读HILLS，现在plumed的behavior应该是从读的HILLS文件自己生成grid，所以应该可行。

uitb commented 6 years ago

好的，非常感谢，我再试试去。

uitb commented 6 years ago

@MoleOrbitalHybridAnalyst 你能否帮我看下下边这个脚本有什么问题，弄了好久，还是有点没太弄明白。谢谢！

# coding: utf-8
import pandas as pd
import numpy as np
import pyemma.coordinates as coor
import pyemma.msm as msm
import pyemma.plots as mplt
from pyemma.thermo import tram

############  dtraj  ###########

top_un_file = 'm_3681_4nco.pdb'
top_li_file = 'm_3681_3j70.pdb'
traj_list = ['pro.xtc']

feat = coor.featurizer(top_un_file)
feat.add_minrmsd_to_ref(top_un_file, ref_frame=0)
feat.add_minrmsd_to_ref(top_li_file, ref_frame=0)
feat.add_backbone_torsions(cossin=True)
print feat.dimension()

inp = coor.source(traj_list, feat)
#print('number of trajectories = ',inp.number_of_trajectories())
#print('trajectory length = ',inp.trajectory_length(0))
#print('trajectory time step = ',500.0 / (inp.trajectory_length(0)-1),'ns')
#print('number of dimension = ',inp.dimension())

n_clusters = 10
clustering = coor.cluster_kmeans(inp.get_output(),k=n_clusters)

# dtrajs: index of Markov states
dtraj = clustering.dtrajs

############  dtraj  ###########

its = msm.timescales_msm(dtraj)
mplt.plot_implied_timescales(its, ylog=False, units='steps', linewidth=2)

############  bias   ###########
# K次driver生成了一系列的colvar文件，读取meta.bias列
colvar_files = ['driver_bias_03_02/COLVAR_driver_bias_{}'.format(i) for i in range(0,2392)]
bias = []
for f in colvar_files:
    colvar = pd.read_csv(f,sep=r'\s*', header=None, comment='#')
    colvar.columns = ['time','cm2unlig','cm2lig','meta.bias','meta.work']
    bias.append(colvar['meta.bias'][:])

############  bias   ###########

#下面这是参考你的[tram2d.py](https://github.com/MoleOrbitalHybridAnalyst/mdtools/blob/master/tram2d.py)以及pyemma给出的这个[example](http://www.emma-project.org/v2.2.5/api/generated/thermo-api/pyemma.thermo.tram.html)整理出的ttrajs、dtrajs、bias

ttrajs = [np.array([i for j in range(len(bias))]) for i in range(len(colvar_files))]
dtrajs = [np.array(dtraj[0]) for i in range(len(colvar_files))]
btrajs = [np.array(bias).T]*len(colvar_files)

tram_obj = tram(ttrajs, dtrajs, btrajs, 1,maxiter=10000)

MoleOrbitalHybridAnalyst commented 6 years ago

Your ttrajs and dtrajs are not consistent with your btrajs. Assume there are T frames in total in your metad trajectory，then ttrajs is supposed to be a T-dimensional array, dtrajs should also be T-dimensional and btrajs is T\times K dimensional. Your btrajs looks fine but the ttrajs should be np.array([[i for j in range(T/K)] for i in range(K)]).flatten(). The dtraj can be directly used as your dtrajs, I guess. I never use kmeans in pyemma. Don’t get confused by my tram2d.py. It is for umbrella sampling and thus I have multiple trajectories instead of 1. That’s the reason I have a list of thermo-trajectories for ttrajs instead of just one single array.

On Mar 7, 2018, at 8:27 PM, uitb notifications@github.com wrote:

@MoleOrbitalHybridAnalyst https://github.com/moleorbitalhybridanalyst 你能否帮我看下下边这个脚本有什么问题，弄了好久，还是有点没太弄明白。谢谢！

`# coding: utf-8 import pandas as pd import numpy as np import pyemma.coordinates as coor import pyemma.msm as msm import pyemma.plots as mplt from pyemma.thermo import tram

############ dtraj ###########

top_un_file = 'm_3681_4nco.pdb' top_li_file = 'm_3681_3j70.pdb' traj_list = ['pro.xtc']

feat = coor.featurizer(top_un_file) feat.add_minrmsd_to_ref(top_un_file, ref_frame=0) feat.add_minrmsd_to_ref(top_li_file, ref_frame=0) feat.add_backbone_torsions(cossin=True) print feat.dimension()

inp = coor.source(traj_list, feat)

print('number of trajectories = ',inp.number_of_trajectories())

print('trajectory length = ',inp.trajectory_length(0))

print('trajectory time step = ',500.0 / (inp.trajectory_length(0)-1),'ns')

print('number of dimension = ',inp.dimension())

n_clusters = 10 clustering = coor.cluster_kmeans(inp.get_output(),k=n_clusters)

dtrajs: index of Markov states

dtraj = clustering.dtrajs

############ dtraj ###########

its = msm.timescales_msm(dtraj) mplt.plot_implied_timescales(its, ylog=False, units='steps', linewidth=2)

############ bias ###########

K次driver生成了一系列的colvar文件，读取meta.bias列

colvar_files = ['driver_bias_03_02/COLVAR_driverbias{}'.format(i) for i in range(0,2392)] bias = [] for f in colvar_files: colvar = pd.read_csv(f,sep=r'\s*', header=None, comment='#') colvar.columns = ['time','cm2unlig','cm2lig','meta.bias','meta.work'] bias.append(colvar['meta.bias'][:])

############ bias ###########

下面这是参考你的tram2d.py https://github.com/MoleOrbitalHybridAnalyst/mdtools/blob/master/tram2d.py以及pyemma给出的这个example http://www.emma-project.org/v2.2.5/api/generated/thermo-api/pyemma.thermo.tram.html整理出的ttrajs、dtrajs、bias

ttrajs = [np.array([i for j in range(len(bias))]) for i in range(len(colvar_files))] dtrajs = [np.array(dtraj[0]) for i in range(len(colvar_files))] btrajs = [np.array(bias).T]*len(colvar_files)

tram_obj = tram(ttrajs, dtrajs, btrajs, 1,maxiter=10000)

`

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MoleOrbitalHybridAnalyst/mdtools/issues/1#issuecomment-371356975, or mute the thread https://github.com/notifications/unsubscribe-auth/AY-2JVr4o76xkHTIyIEMCx_081lehRc2ks5tcJb7gaJpZM4SBwYL.

liyigerry commented 6 years ago

@MoleOrbitalHybridAnalyst

Thanks for your help.

In my case, there is only one trajectory whose each frame belongs to a separated thermostate. So, ttrajs should be the range from 0 to the longth of trajectory and dtrajs can be obtained from kmean cluster based on the characterization of trajectory as your suggestion.

How to understand the btrajs?

In this docs of pyemma, the btrajs[i][t, k] is the reduced bias energy in trajectory i and time step t for the k’th thermostate.

How can I understand the bias at step t, or frame t in my case, for the thermostate of k? For example, step 0/ frame 0 for 0 thermostate, and step 1/ frame 1 for 0 thermostate.

Waiting for your help.

Thanks.

MoleOrbitalHybridAnalyst / mdtools

metadynamics with MSMs #1

! FIELDS cm2lig cm2unlig file.free der_cm2lig der_cm2unlig

! SET min_cm2lig -3.063

! SET max_cm2lig 46.2591

! SET nbins_cm2lig 141

! SET periodic_cm2lig false

! SET min_cm2unlig 4.98979

! SET max_cm2unlig 22.1382

! SET nbins_cm2unlig 50

! SET periodic_cm2unlig false

print('number of trajectories = ',inp.number_of_trajectories())

print('trajectory length = ',inp.trajectory_length(0))

print('trajectory time step = ',500.0 / (inp.trajectory_length(0)-1),'ns')

print('number of dimension = ',inp.dimension())

下面这是参考你的tram2d.py https://github.com/MoleOrbitalHybridAnalyst/mdtools/blob/master/tram2d.py以及pyemma给出的这个example http://www.emma-project.org/v2.2.5/api/generated/thermo-api/pyemma.thermo.tram.html整理出的ttrajs、dtrajs、bias