Acellera / htmd

HTMD: Programming Environment for Molecular Discovery
https://software.acellera.com/docs/latest/htmd/index.html
Other
253 stars 58 forks source link

ligand binding analysis from the command line #67

Closed marionatf closed 8 years ago

marionatf commented 8 years ago

Hello, We are trying to follow the tutorial "ligand binding analysis" as a python script, but we can't use the methods that create plots (plotTrajSizes, plotTimescales, getRates): when we call them, an empty window appears, without any plot in it. Is there a way to obtain these plots without using python notebook? Thanks, Mariona

stefdoerr commented 8 years ago

Yes, remove the first command %pylab inline. That command tells notebooks to not open the plots and instead plot them "inline".

marionatf commented 8 years ago

It's already removed

stefdoerr commented 8 years ago

You might need to restart the python session if you called it before.

stefdoerr commented 8 years ago

How are you running the tutorial?

marionatf commented 8 years ago

I'm running it as the following python script:

from htmd import *

sets = glob('datasets/*/')
sims = []
for s in sets:
    fsims = simlist(glob(s + '/filtered/*/'), 'datasets/1/filtered/filtered.pdb')
    sims = simmerge(sims, fsims)

metr = Metric(sims)
metr.projection(MetricDistance('protein and name CA', 'resname MOL and noh', metric='contacts'))
data = metr.project()
data.fstep = 0.1

data.plotTrajSizes()
data.dropTraj()

tica = TICA(data, 20)
dataTica = tica.project(3)

dataBoot = dataTica.bootstrap(0.8)
dataBoot.cluster(MiniBatchKMeans(n_clusters=1000), mergesmall=5)

model = Model(dataBoot)
model.plotTimescales()
model.markovModel(50, 5)
model.viewStates(ligand='resname MOL and noh')

kin = Kinetics(model, temperature=298, concentration=0.0037)
r = kin.getRates()
print(r)
kin.plotRates()
kin.plotFluxPathways()
j3mdamas commented 8 years ago

You should be getting the plots. Does matplotlib work fine on your computer, @marionatf?

If this script works, so should plotRates() and plotFluxPathways

from matplotlib import pylab as plt
plt.figure()
plt.plot([1,2,3,4])
plt.show()
nesilin commented 8 years ago

We still have problems to visualize plots from a python script as @marionatf reported. We have tried with the following changes in the script:

from htmd import *
import time
from IPython import get_ipython

get_ipython().magic('pylab auto')
htmd.config(viewer='vmd')

sims = simlist(glob('input/*/'), glob('input/*/structure.pdb'), glob('input/*/'))
fsims = simfilter(sims, './filtered/', filtersel='not water')
sims = simmerge(sims, fsims)

# Calculating metrics
metr = Metric(sims)
metr.projection(MetricDistance('protein and name CA', 'resname MOL and noh', metric='contacts'))
data = metr.project()
data.fstep = 0.1

# Removing trajectories
data.plotTrajSizes() #not working

# TICA
tica = TICA(data, 20)
dataTica = tica.project(3)

# Bootstrapping
dataBoot = dataTica.bootstrap(0.8)

# Clustering conformaitons
dataBoot.cluster(MiniBatchKMeans(n_clusters=1000), mergesmall=5)

# Building the Markov model
model = Model(dataBoot)
model.markovModel(50, 5)
model.plotTimescales() #not working

# Visualizing the states
model.viewStates(ligand='resname MOL and noh') # This gives problems

# Calculating the kinetics
kin = Kinetics(model, temperature=298, concentration=0.0037)
r = kin.getRates()
print(r)
kin.plotRates() #not working
time.sleep(6000)

However, plots are still imposible to visualize. A part from pylab 'auto' we also have tried with' tk' and did not work either.

Then, we have tried to run the script as a jupyter notebook from the GPU machines provided. Plots only work from jupyter notebook if a macbook opens the notebook in the GPU machines . Non of the other PC can open the jupyter from there. But at least in this case we could visualize the plots. However, the plot looked awful. So it seems that we also have a problem regarding our data (the trajectories that we have generated). Here I post a screenshot of the plot of the timescales and the error reported just after it. 1- Data Issue: Any idea of what is happening with the timescale plot? 2- Visualization Issue: The error reported below the plot is the same we have obtained when we were executing the program as a python script. At the beginning of the execution track seems that matplotlib might have something to do with our visualization issues but we have tried the commands from @j3mdamas latest post and they are working fine. 13515340_10209982952875011_1837780752_n Error reported when executing as a python script:

Using matplotlib backend: Qt4Agg
Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['random']
`%matplotlib` prevents importing * from pylab and numpy
2016-06-22 13:52:20,286 - htmd.simlist - INFO - Starting listing of simulations.
Creating simlist: 100% (8/8) [###################################################################################################################################################] eta 00:01 -
2016-06-22 13:52:20,288 - htmd.simlist - INFO - Finished listing of simulations.
2016-06-22 13:52:21,097 - htmd.simlist - INFO - Starting filtering of simulations.
[Parallel(n_jobs=-2)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=-2)]: Batch computation too fast (0.0037s.) Setting batch_size=108.
[Parallel(n_jobs=-2)]: Done   9 out of   8 | elapsed:    0.0s remaining:   -0.0s
[Parallel(n_jobs=-2)]: Done   9 out of   8 | elapsed:    0.0s remaining:   -0.0s
[Parallel(n_jobs=-2)]: Done   9 out of   8 | elapsed:    0.0s remaining:   -0.0s
[Parallel(n_jobs=-2)]: Done   9 out of   8 | elapsed:    0.0s remaining:   -0.0s
[Parallel(n_jobs=-2)]: Done   9 out of   8 | elapsed:    0.0s remaining:   -0.0s
[Parallel(n_jobs=-2)]: Done   9 out of   8 | elapsed:    0.0s remaining:   -0.0s
[Parallel(n_jobs=-2)]: Done   9 out of   8 | elapsed:    0.0s remaining:   -0.0s
[Parallel(n_jobs=-2)]: Done   8 out of   8 | elapsed:    0.0s finished
2016-06-22 13:52:21,295 - htmd.simlist - INFO - Finished filtering of simulations
2016-06-22 13:52:21,295 - htmd.projections.metric - INFO - Metric: Starting projection of trajectories.
[Parallel(n_jobs=-2)]: Done   1 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-2)]: Done  16 out of  16 | elapsed:   26.5s finished
2016-06-22 13:52:47,933 - htmd.projections.metric - INFO - Finished projecting the trajectories.
2016-06-22 13:52:47,934 - htmd.projections.metric - INFO - Frame step 0.0ns was read from the trajectories. If it looks wrong, redefine it by manually setting the MetricData.fstep property.
getting output of TICA: 100% (16/16) [###########################################################################################################################################] eta 00:00 |/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1279: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, init_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1300: RuntimeWarning: init_size=300 should be larger than k=1000. Setting it to 3*k
  init_size=init_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:630: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, init_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:630: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, init_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:630: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, init_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1328: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, self.batch_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1328: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, self.batch_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1328: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, self.batch_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1328: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, self.batch_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1328: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, self.batch_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1328: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, self.batch_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1328: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, self.batch_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1328: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, self.batch_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1328: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, self.batch_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1328: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, self.batch_size)
/home/student1/miniconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py:1328: DeprecationWarning: This function is deprecated. Please call randint(0, 5999 + 1) instead
  0, n_samples - 1, self.batch_size)
2016-06-22 13:52:58,292 - htmd.metricdata - INFO - Mergesmall removed 42 clusters. Original ncluster 49, new ncluster 7.
2016-06-22 13:52:58,851 - htmd.model - INFO - 100.0% of the data was used
2016-06-22 13:52:58,855 - htmd.model - INFO - Number of trajectories that visited each macrostate:
2016-06-22 13:52:58,855 - htmd.model - INFO - [ 4  5  7  5 12]
22-06-16 13:52:59 pyemma.msm.estimators.maximum_likelihood_msm.MaximumLikelihoodMSM[1] WARNING  Ignored error during estimation: Active set is empty. Cannot estimate MSM.
2016-06-22 13:52:59,078 - pyemma.msm.estimators.maximum_likelihood_msm.MaximumLikelihoodMSM[1] - WARNING - Ignored error during estimation: Active set is empty. Cannot estimate MSM.
22-06-16 13:52:59 pyemma.msm.estimators.maximum_likelihood_msm.MaximumLikelihoodMSM[2] WARNING  Ignored error during estimation: Active set is empty. Cannot estimate MSM.
2016-06-22 13:52:59,096 - pyemma.msm.estimators.maximum_likelihood_msm.MaximumLikelihoodMSM[2] - WARNING - Ignored error during estimation: Active set is empty. Cannot estimate MSM.
estimating MaximumLikelihoodMSM: 100% (26/26) [##################################################################################################################################] eta --:-- /22-06-16 13:52:59 pyemma.msm.estimators.implied_timescales.ImpliedTimescales[1] WARNING  Estimation has failed at lagtimes: [458 499]. Run single-lag estimation at these lags to track down the error.
2016-06-22 13:52:59,204 - pyemma.msm.estimators.implied_timescales.ImpliedTimescales[1] - WARNING - Estimation has failed at lagtimes: [458 499]. Run single-lag estimation at these lags to track down the error.
22-06-16 13:52:59 pyemma.msm.estimators.implied_timescales.ImpliedTimescales[1] WARNING  Changed user setting nits to the number of available timescales nits=6
2016-06-22 13:52:59,239 - pyemma.msm.estimators.implied_timescales.ImpliedTimescales[1] - WARNING - Changed user setting nits to the number of available timescales nits=6
22-06-16 13:52:59 pyemma.msm.estimators.implied_timescales.ImpliedTimescales[1] WARNING  Some timescales could not be computed. Timescales array is smaller than expected or contains NaNs
2016-06-22 13:52:59,239 - pyemma.msm.estimators.implied_timescales.ImpliedTimescales[1] - WARNING - Some timescales could not be computed. Timescales array is smaller than expected or contains NaNs
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/home/student1/isentis/analysis2.py in <module>()
     35 
     36 # Visualizing the states
---> 37 model.viewStates(ligand='resname MOL and noh') # This gives problems
     38 
     39 # Calculating the kinetics

/home/student1/miniconda3/lib/python3.5/site-packages/htmd/model.py in viewStates(self, states, statetype, protein, ligand, viewer, mols, numsamples, wrapsel, alignsel)
    402             states = [states]
    403         if mols is None:
--> 404             mols = self.getStates(states, statetype, numsamples=numsamples, wrapsel=wrapsel, alignsel=alignsel)
    405         colors = [0, 1, 3, 4, 5, 6, 7, 9]
    406         for i, s in enumerate(states):

/home/student1/miniconda3/lib/python3.5/site-packages/htmd/model.py in getStates(self, states, statetype, wrapsel, alignsel, alignmol, samplemode, numsamples)
    329         refmol = None
    330         if not single:
--> 331             raise NameError('Visualizer does not support yet visualization of systems with different number of atoms')
    332         if alignmol is None:
    333             alignmol = molfile

NameError: Visualizer does not support yet visualization of systems with different number of atoms

(sorry for the long post)

stefdoerr commented 8 years ago

So beyond the plotting issue which I have no clue, you have two problems with the timescales and the visualizer error.

  1. Fixing the visualizer error. Your systems have different number of atoms, therefore the visualizer cannot load them into VMD. What you should do is filter the waters out of the systems with simfilter . That should give you fixed number of atoms for all of your systems and fix the error.
  2. Horrible timescales. Well, you have 8 trajectories :smile: Markov models need looooooots of data to work well for complex ligand binding systems (hundreds or thousands of trajectories of multiple nanoseconds). Don't expect to get good timescales with the resources you were given, otherwise we would have cured cancer by now :stuck_out_tongue: . Practically I doubt you can even build a Markov model with that data. As the PyEMMA logger says, there were no states left in the active set, meaning that your states at the given lag-times were disconnected.
giadefa commented 8 years ago

actually, for trypsin you should be able to produce decent results in two days of runs

On 22 June 2016 at 14:52, Stefan notifications@github.com wrote:

So beyond the plotting issue which I have no clue, you have two problems with the timescales and the visualizer error.

1.

Fixing the visualizer error. Your systems have different number of atoms, therefore the visualizer cannot load them into VMD. What you should do is filter the waters out of the systems with simfilter https://www.htmd.org/docs/htmd.simlist.html#htmd.simlist.simfilter . That should fix it. 2.

Horrible timescales. Well, you have 8 trajectories 😄 Markov models need looooooots of data to work well for complex ligand binding systems (hundreds or thousands of trajectories of multiple nanoseconds). Don't expect to get good timescales with the resources you were given, otherwise we would have cured cancer by now 😛 . Practically I doubt you can even build a Markov model with that data. As the PyEMMA logger says, there were no states left in the active set, meaning that your states at the given lag-times were disconnected.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Acellera/htmd/issues/67#issuecomment-227733268, or mute the thread https://github.com/notifications/unsubscribe/AHaqOs6qTf9ne_7JvgqjnNYHkzDoqnOwks5qOTAIgaJpZM4I4beX .

http://www.acellera.com

   <https://twitter.com/acellera>

https://www.youtube.com/user/acelleracom https://www.linkedin.com/company/2133167?trk=tyah&trkInfo=clickedVertical%3Acompany%2CclickedEntityId%3A2133167%2Cidx%3A2-1-2%2CtarId%3A1448018583204%2Ctas%3Aacellera https://www.acellera.com/md-simulation-blog-news/ http://is.gd/1eXkbS

alejandrovr commented 8 years ago

We are using:

md = AdaptiveRun()
md.nmin=6
md.nmax=8
md.nepochs = 12

with 5 different systems initialized with docking (5 folders of generators). Is this ok? Is there any better/faster way? Should we increase the number of epochs?

stefdoerr commented 8 years ago

These settings depend on how many GPUs you have dedicated to you. 5 generators is okay I guess. I doubt you have reached 12 epochs yet.

alejandrovr commented 8 years ago

We have 4 GPUs. And, by the way, to stop the run...should we do Control+C ? Because we have understood that it will run forever until it is stopped. Should we killed it when the 12 epoch folders are present?

Thank you all for your answers, we know some of them are really basic.

stefdoerr commented 8 years ago

At 12 epochs it should stop by itself. To kill it you can Ctrl+C but this will only kill the adaptive, not the currently running simulations which you might have to kill from the process manager or alternatively by killing the python session (if you are using AcemdLocal).

marionatf commented 8 years ago

About the visualization error saying we have different number of atoms, we think we are already filtering the waters out, but still we get the error. Maybe we are doing it wrong, we are using:

sims = simlist(glob('input/*/'), glob('input/*/structure.pdb'), glob('input/*/'))
fsims = simfilter(sims, './filtered/', filtersel='not water')
sims = simmerge(sims, fsims)
stefdoerr commented 8 years ago

Yes, you should not merge the filtered with the unfiltered simulations. That beats the point of using the filtered simulations. So just use the fsims.

stefdoerr commented 8 years ago

Actually the plotting issue was caused by some code in HTMD. For the moment because we won't make a new release soon find please go to the folder of your conda installation and find the file lib/python3.5/site-packages/htmd/parameterize/__init__.py And remove the lines

import matplotlib as mpl
mpl.use('Agg')