Running adaptive sampling using AMBER on a cluster using SLURM or PBS

eric-jm-lang commented 7 years ago

Hello, I am very interested in using htmd to run some adaptive sampling simulations. However, the examples I have seen on adaptive sampling seem to deal only with ACEMD on a local GPU cluster. I would like to know if it is possible to run adaptive sampling using Amber on GPUs (i.e. pmemd.cuda) on a cluster that either relies on PBS or SLURM to manage the queue. If yes could you please let me know what I should specify in my scripts to be able to run such kind of adaptive sampling. Many thanks in advance Eric

giadefa commented 7 years ago

it should work in the sense that some work was done there, but it is largely untested by us at least.

On 13 February 2017 at 18:20, eric-jm-lang notifications@github.com wrote:

Hello, I am very interested in using htmd to run some adaptive sampling simulations. However, the examples I have seen on adaptive sampling seem to deal only with ACEMD on a local GPU cluster. I would like to know if it is possible to run adaptive sampling using Amber on GPUs (i.e. pmemd.cuda) on a cluster that either relies on PBS or SLURM to manage the queue. If yes could you please let me know what I should specify in my scripts to be able to run such kind of adaptive sampling. Many thanks in advance Eric

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Acellera/htmd/issues/255, or mute the thread https://github.com/notifications/unsubscribe-auth/AHaqOqfXCqO5YMGIR9J46x6kKcy2hNBIks5rcJDdgaJpZM4L_eyy .

-- http://www.acellera.com

   <https://twitter.com/acellera>

https://www.youtube.com/user/acelleracom https://www.linkedin.com/company/2133167?trk=tyah&trkInfo=clickedVertical%3Acompany%2CclickedEntityId%3A2133167%2Cidx%3A2-1-2%2CtarId%3A1448018583204%2Ctas%3Aacellera https://www.acellera.com/md-simulation-blog-news/ http://is.gd/1eXkbS

j3mdamas commented 7 years ago

Hi Eric,

About Amber (pmemd.cuda): like Gianni said, work was done on that, but it's untested by us and we cannot support it.

About the queue/resources: we have been refactoring it recently, so if you see any problems in the documentation, let us know. in short, you can use SLURM, but not PBS yet, has it is untested.

md = SlurmQueue()
md.queue = <the name of the queue>
md.submit('./outdir_of_protocol')

or for adaptive:

md = AdaptiveMD()
md.app = SlurmQueue()

stefdoerr commented 7 years ago

Hey Eric. The queues will work fine (i.e. SLURM/PBS). The only practical issue is that you will need to write a run.sh script inside your simulation folders which takes an input.coor or some other coordinate format that AMBER supports and will run the simulation using that. It should be trivial if you have some knowledge of AMBER and we can help you if you can show us how a simulation is typically run from command line.

eric-jm-lang commented 7 years ago

Hello everyone,

Thanks a lot for your very quick replies. So I understand this is untested by you and that you cannot support it but I would nonetheless be very interested in testing/using it.

I have a good knowledge of AMBER but I am an absolute beginner with the htmd package. Here is an example of input file and slurm script I am using to run the MD simulations I am interested in running adaptive sampling on: AMBER input file (md1.i):

&cntrl
 imin=0,                       ! Not a minimisation run
 irest=1,                      ! Restart simulation
 ntx=5,                          ! Read coordinates and velocities from coordinate file
 nscm=1000,                    ! Reset COM every 1000 steps
 nstlim=800000000, dt=0.004,     ! Run MD for 3.2 us with a timestep of 4 fs
 ntpr=2500, ntwx=25000,          ! Write the trajectory every 100 ps and the energies every 10 ps
 ioutfm=1,                     ! Use Binary NetCDF trajectory format (better)
 iwrap=0,                      ! No wrapping will be performed
 ntxo=2,                         ! NetCDF rst file
 ntwr=25000000,                ! Write a restrt file every 100 ns steps, if negative value, the files are not overwritten
 cut=1000.0,               ! Cut-off electrostatics
 ntb=0,                    ! no periodicity
 ntc=2, ntf=2,                 ! SHAKE on all H
 ntp=0,                          ! No pressure regulation
 ntt=3,                        ! Temperature regulation using langevin dynamics
 tempi=278.15,
 temp0=278.15,
 igb=8,                    ! Implicit solvent
 saltcon=0.137,         ! Ionic concentration
 gamma_ln=1.0,                 ! Langevin thermostat collision frequency
 ig=-1,                          ! Randomize the seed for the pseudo-random number generator
 ntr=0,                          ! Flag for restraining specified atoms
 nmropt=0,                     ! NMR restraints and weight changes read
/

Slurm script:

#!/bin/bash -login
#
#SBATCH -p gpu
#SBATCH -J name
#SBATCH --time=24:00:00     # Walltime
#SBATCH -A XXX        # Project Account
#SBATCH --ntasks-per-node=1 # number of tasks per node
#SBATCH --gres=gpu:1
#
module add apps/amber-16
#
pmemd.cuda -O -i md1.i -o prot1_md1.mdout -p prot1.parm7 -c prot1_eq.rst7   -ref prot1.rst7 -x prot1_md1.nc -r prot1_md1.rst7 -inf prot1_md1.mdinfo

Stefan, is the run.sh file you mentioned equivalent to the above slurm script? Could you please let me know how to implement those two scripts within the htmd framework in order to run my simulations?

Is the python script that starts the adaptive sampling supposed to run on the front node has a hidden process? or is it supposed to be a job submitted to the queue to run on a single core?

Many thanks in advance for your help!

Eric

stefdoerr commented 7 years ago

Nice :)

So a simulation folder for me looks like this:

[sdoerr@loro Tue11:43 equil_eukar6] tree 3jyc/
3jyc/
├── input
├── job.sh
├── parameters
├── run.sh
├── structure.pdb
└── structure.psf

The job.sh slurm bash file is written automatically by HTMD but instead of calling your last two commands it just calls run.sh. They look like this and are automatically generated by the SlurmQueue class.

#!/bin/bash
#
#SBATCH --job-name=MYNAME
#SBATCH --partition=multiscale,multiscaleCPU,playmolecule
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1000
#SBATCH --priority=gpu_priority
#SBATCH --workdir=/pathtosim/
#SBATCH --output=slurm.%N.%j.out
#SBATCH --error=slurm.%N.%j.err
#SBATCH --export=ACEMD_HOME,HTMD_LICENSE_FILE

trap "touch /pathtosim/htmd.queues.done" EXIT SIGTERM

cd /pathtosim/
/pathtosim/run.sh

In our case since we run ACEMD our run.sh file looks like this:

#!/bin/bash
acemd >log.txt 2>&1

In your case it would probably look like this

#!/bin/bash
module add apps/amber-16
pmemd.cuda -O -i md1.i -o prot1_md1.mdout -p prot1.parm7 -c prot1_eq.rst7   -ref prot1.rst7 -x input.nc -r prot1_md1.rst7 -inf prot1_md1.mdinfo

So if I remember correctly the .nc file are the coordinates, right? Then you need to tell the adaptive to write an nc file with the new coordinates for each simulation. So to understand how the adaptive starts new simulations. It picks a frame from an old simulation, copies the whole simulation folder and replaces the input.nc file. This creates a "new" simulation. So you need to make sure that everything would work fine if you replaced the coordinate file in your simulation folders.

ad = AdaptiveMD()
[...]
ad.coorname = 'input.nc'
[...]

The adaptive will run on your local machine (or whichever machine you want which has access to sbatch, squeue etc). It will then distribute simulations to SLURM

stefdoerr commented 7 years ago

Oh sorry, no nc are trajectory files right? What are the input coordinate file extensions for AMBER?

j3mdamas commented 7 years ago

@stefdoerr, isn't adaptive going to fail without using the SlurmQueue() in AdaptiveMD.app? Even though job.sh is copied, AdaptiveMD would not know how to submit, no? And if it does, it will overwrite the previous job.sh.

eric-jm-lang commented 7 years ago

Hi Stefan,

Thanks a lot! Yes indeed .nc are trajectory files .rst7 are the coordinate files. so in this example prot1_md1.rst7 is the file with the coordinates to restart the simulations for a second md run for example. However the rst7 file only contains the coordinates from the last step of the trajectory. Is that sufficient? I thought the trajectory files where read and a frame extracted from there? is it only in the case of ACEMD? Would it be possible if the trajectory is written in ASCII format?

Many thanks,

Eric

On 14 February 2017 at 11:12, Stefan Doerr notifications@github.com wrote:

Oh sorry, no nc are trajectory files right? What are the coordinate file extensions for AMBER?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Acellera/htmd/issues/255#issuecomment-279679142, or mute the thread https://github.com/notifications/unsubscribe-auth/AOjRBFUSSwNr4AjriYm8_AZdpqE-GeImks5rcYwOgaJpZM4L_eyy .

--

Eric Lang

BrisSynBio Postdoctoral Research Associate Modelling Centre for Computational Chemistry School of Chemistry - University of Bristol Bristol BS8 1TS - United Kingdom

stefdoerr commented 7 years ago

@j3mdamas I don't get the problem. He will pass the SlurmQueue object to ad.app.

@eric-jm-lang Ok so scratch what I wrote earlier. It should be input.rst. The problem is I don't know if we can write rst7 files. I know we can write rst files. Was there a version change? So, the adaptive will read your trajectory nc files, select from them a frame and write that frame as an rst file in the copied simulation folder to start a new simulation from that coordinate

eric-jm-lang commented 7 years ago

Ok, thanks. No rst and rst7 should be the same. Ok so indeed the adaptive will read the trajectory file and select a frame and write it to a rst file. got it. I will try to do this and see how it goes. Thanks a lot Eric

On 14 February 2017 at 11:48, Stefan Doerr notifications@github.com wrote:

@j3mdamas https://github.com/j3mdamas I don't get the problem. He will pass the SlurmQueue object to ad.app.

@eric-jm-lang https://github.com/eric-jm-lang Ok so scratch what I wrote earlier. It should be input.rst7. The problem is I don't know if we can write rst7 files. I know we can write rst files. Was there a version change? So, the adaptive will read your trajectory nc files, select from them a frame and write that frame as an rst file in the copied simulation folder to start a new simulation from that coordinate

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Acellera/htmd/issues/255#issuecomment-279686946, or mute the thread https://github.com/notifications/unsubscribe-auth/AOjRBA1PuKyLSE4kcFDq7XPsusbCKG2_ks5rcZR1gaJpZM4L_eyy .

--

Eric Lang

BrisSynBio Postdoctoral Research Associate Modelling Centre for Computational Chemistry School of Chemistry - University of Bristol Bristol BS8 1TS - United Kingdom

stefdoerr commented 7 years ago

Show me the run.sh file once you have it. Remember to also set adaptive to write the correct fileformat as I showed you before. The rest should then work

j3mdamas commented 7 years ago

@stefdoerr then I think the job.sh will be overwritten by _createJobScript, no?

stefdoerr commented 7 years ago

Yes? As I said he should only write a run.sh script.

j3mdamas commented 7 years ago

I see, I misinterpreted then.

Well, @eric-jm-lang, check https://www.htmd.org/docs/latest/htmd.queues.slurmqueue.html#module-htmd.queues.slurmqueue to see if the sbatch options you want are available when setting up your AdaptiveMD.app. For example, we don't use the account or the ntasks per node.

piia600 commented 7 years ago

I guess the PBS queue is not working in version 1.5.17?

Traceback (most recent call last):
  File "equil_gpu.py", line 8, in <module>
    app = PBSQueue()
NameError: name 'PBSQueue' is not defined

My script:

from htmd import *
from htmd.protocols.equilibration_v2 import Equilibration
md = Equilibration()
md.runtime = 5
md.timeunits = 'ns'
md.temperature = 310
md.write('./build/','./equil')
app = PBSQueue()
app.queue = 'gpu'
app.ngpu = 1
app.walltime = 14000
app.submit('./equil/')
app.wait()

j3mdamas commented 7 years ago

We haven't exposed it yet, due to being heavily untested.

You can import it using:

from htmd.queues.pbsqueue import PBSQueue

I'll add it now, and see if someone is willing to test it for us.

piia600 commented 7 years ago

I wrote my own job.sh script, which seems to work in our PBS queuing system. Is there a way to make HTMD just to copy this file to the folders & run it? Or is it always implied in the queues that you need HTMD to write the job.sh files?

BTW, as expected the PBSQueue does not work.

#!/bin/bash
#PBS -N AceMD
#PBS -q gpu@arien-pro.ics.muni.cz
#PBS -l select=1:ncpus=1:ngpus=1:mem=5gb:scratch_local=5gb:cl_doom=True
#PBS -l walltime=24:0:0

cd $SCRATCHDIR || exit 1
cp -r $PBS_O_WORKDIR/* $SCRATCHDIR/

./run.sh

cp -r $SCRATCHDIR/* $PBS_O_WORKDIR/. && rm -rf $SCRATCHDIR

cd $PBS_O_WORKDIR
sim=`basename $PBS_O_WORKDIR`
simpath=`dirname $(dirname $PBS_O_WORKDIR)`
mv *.xtc $simpath/data/$sim

exit 0

stefdoerr commented 7 years ago

The queues always write their own job.sh files but we could just take your options and add them to the class. The rest of the logic which I see in the script can probably be moved to the run.sh file which is called by our job.sh file and is not modified by HTMD

stefdoerr commented 7 years ago

The only two options which you have and are not supported by our PBSqueue are scratch_local and cl_doom. I will add these now. The rest of the script (everything under #PBS commands) you can move to a run.sh script.

j3mdamas commented 7 years ago

md = PBSQueue
md.jobname = 'AceMD'
md.queue = 'gpu@arien-pro.ics.muni.cz'
md.ncpus = 1
md.ngpus = 1
md.memory = '5000'
md.walltime = '86400'

Plus the options Stefan is adding.

stefdoerr commented 7 years ago

https://github.com/Acellera/htmd/commit/40ae0550b4dc76cd18e4a9451fd4e052d229bd28 Done. You can pull the latest HTMD from github or wait for the next release. So as Joao said, this should work (added my own parameters):

md = PBSQueue
md.jobname = 'AceMD'
md.queue = 'gpu@arien-pro.ics.muni.cz'
md.ncpus = 1
md.ngpus = 1
md.memory = 5000
md.walltime = 86400
md.cluster = 'doom'
md.scratch_local = 5000

and move the rest of the logic into the run.sh

piia600 commented 7 years ago

Do you have any idea when the next release is happening?

j3mdamas commented 7 years ago

With the cluster and scratch_local variables? probably just in one week.

j3mdamas commented 7 years ago

Just a note, a warning exists on the documentation of PBSQueue: https://www.htmd.org/docs/latest/htmd.queues.pbsqueue.html

jeiros commented 7 years ago

Hi,

I'm going to use this thread as I'm also using AMBER. I'm not using any queuing system, just trying to run the simulations straight from the machine which has the GPUs. If you want I can open this on a different issue but I though it could be useful for other AMBER users.

I haven't used htmd for awhile and I guess the API has changed. When we wrote the PmemdLocal class app I was able to run adaptive simulations on a directory with the following structure:

.
├── adaptive.ipynb
├── ready
│   ├── MD.sh
│   ├── Production.in
│   ├── structure.prmtop
│   └── structure.rst
├── structure.prmtop
└── structure.rst

By running the following commands:

adapt = htmd.AdaptiveRun()
adapt.nmin = 3
adapt.nmax = 4
adapt.nepochs = 10
adapt.metricsel1 = 'name CA'
adapt.metrictype = 'distances'
adapt.ticadim = 5
adapt.updateperiod = 7200
adapt.filtersel = 'protein'
adapt.app = htmd.apps.pmemdlocal.PmemdLocal(
    pmemd='/usr/local/amber/bin/pmemd.cuda_SPFP',
    datadir='./data')
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.run()

I've seen now that the AdaptiveRun class is not available anymore so I've changed this a bit, to give:

adapt = htmd.AdaptiveMD()
adapt.nmin = 3
adapt.nmax = 4
adapt.nepochs = 10
adapt.updateperiod = 100
adapt.projection = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG', sel2='name CA')
adapt.app = htmd.apps.pmemdlocal.PmemdLocal(
    pmemd='/usr/local/amber/bin/pmemd.cuda_SPFP',
    datadir='./data',
    devices=[0, 1, 2, 3])
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.run()

But this crashes out after the first epoch is finished with the following traceback:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-15-fd22e54018ed> in <module>()
     13 adapt.datapath = './data'
     14 adapt.filteredpath = './filtered'
---> 15 adapt.run()

/home/je714/htmd/htmd/adaptive/adaptive.py in run(self)
     97                 # If currently running simulations are lower than nmin start new ones to reach nmax number of sims
     98                 if self._running <= self.nmin and epoch < self.nepochs:
---> 99                     flag = self._algorithm()
    100                     if flag is False:
    101                         self._unsetLock()

/home/je714/htmd/htmd/adaptive/adaptiverun.py in _algorithm(self)
    123 
    124     def _algorithm(self):
--> 125         data = self._getData(self._getSimlist())
    126         if not self._checkNFrames(data): return False
    127         self._createMSM(data)

/home/je714/htmd/htmd/adaptive/adaptiverun.py in _getSimlist(self)
    140         logger.info('Postprocessing new data')
    141         sims = simlist(glob(path.join(self.datapath, '*', '')), glob(path.join(self.inputpath, '*', 'structure.pdb')),
--> 142                        glob(path.join(self.inputpath, '*', '')))
    143         if self.filter:
    144             sims = simfilter(sims, self.filteredpath, filtersel=self.filtersel)

/home/je714/htmd/htmd/simlist.py in simlist(datafolders, molfiles, inputfolders)
    134         raise FileNotFoundError('No data folders were given, check your arguments.')
    135     if not molfiles:
--> 136         raise FileNotFoundError('No molecule files were given, check your arguments.')
    137     if isinstance(molfiles, str):
    138         molfiles = [molfiles]

FileNotFoundError: No molecule files were given, check your arguments.

And the following directory structure:

.
├── adaptive.ipynb
├── data
│   ├── e1s1_ready
│   │   └── Production.nc
│   ├── e1s2_ready
│   │   └── Production.nc
│   ├── e1s3_ready
│   │   └── Production.nc
│   └── e1s4_ready
├── input
│   ├── e1s1_ready
│   │   ├── log.txt
│   │   ├── mdinfo
│   │   ├── MD.sh
│   │   ├── Production.in
│   │   ├── Production_new.rst
│   │   ├── Production.out
│   │   ├── structure.prmtop
│   │   └── structure.rst
│   ├── e1s2_ready
│   │   ├── log.txt
│   │   ├── mdinfo
│   │   ├── MD.sh
│   │   ├── Production.in
│   │   ├── Production_new.rst
│   │   ├── Production.out
│   │   ├── structure.prmtop
│   │   └── structure.rst
│   ├── e1s3_ready
│   │   ├── log.txt
│   │   ├── mdinfo
│   │   ├── MD.sh
│   │   ├── Production.in
│   │   ├── Production_new.rst
│   │   ├── Production.out
│   │   ├── structure.prmtop
│   │   └── structure.rst
│   └── e1s4_ready
│       ├── log.txt
│       ├── MD.sh
│       ├── Production.in
│       ├── Production.out
│       ├── structure.prmtop
│       └── structure.rst
├── ready
│   ├── MD.sh
│   ├── Production.in
│   ├── structure.prmtop
│   └── structure.rst
├── structure.prmtop
└── structure.rst

What's the recommended way to be running simple adaptive runs?

Also, side note: when the script is running, I don't see any logging information that should be coming out of logger.info.

stefdoerr commented 7 years ago

Thanks for the complete report!

The reason it crashes is because it can't find pdb files in the input folders. Admittedly I could probably change it to accept prmtop files or psf files as well since we have the reader functionallity but for the moment simlist in adaptive expects to find pdb files as you can see in the glob(path.join(self.inputpath, '*', 'structure.pdb')) line

jeiros commented 7 years ago

Thanks! So do I have to manually copy the PDB files in the input folders?

Edit: Looks like they're copied automatically if the PDB exists in the root folder. I'll report back when/if it crashes 😄

stefdoerr commented 7 years ago

For now yes. Or symlink them like I usually do to not waste disk space

On March 10, 2017 11:06:06 AM GMT+01:00, Juan Eiros notifications@github.com wrote:

Thanks! So do I have to manually copy the PDB files in the input folders?

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/Acellera/htmd/issues/255#issuecomment-285628934

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

jeiros commented 7 years ago

Hi, sorry to keep bothering you with this :(

Having the structure.pdb allowed for the filtering and projection steps to happen, but I think there's a problem when using PDB files for systems that have more than 99999 atoms. My system has ~300k atoms, and I am getting the following log and error after issuing the adaptive run like this:

adapt = htmd.AdaptiveMD()
adapt.nmin = 1
adapt.nmax = 3
adapt.nepochs = 10
adapt.updateperiod = 100
adapt.projection = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG', sel2='name CA')
adapt.app = htmd.apps.pmemdlocal.PmemdLocal(
    pmemd='/usr/local/amber/bin/pmemd.cuda_SPFP',
    datadir='./data',
    devices=[0, 1, 2, 3])
adapt.filtersel = 'not water'
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.run()

Creating simlist: 100% (3/3) [#####################################] eta 00:01 |
2017-03-10 11:14:50,598 - htmd.molecule.readers - WARNING - Reading PDB file with more than 99999 atoms. Bond information can be wrong.
2017-03-10 11:14:57,215 - htmd.molecule.writers - WARNING - Field "segid" of PDB overflows. Your data will be truncated to 4 characters.
Filtering trajectories: 100% (3/3) [###############################] eta 00:01 -
2017-03-10 11:15:08,539 - htmd.molecule.readers - WARNING - Reading PDB file with more than 99999 atoms. Bond information can be wrong.
2017-03-10 11:15:08,655 - htmd.molecule.readers - WARNING - Reading PDB file with more than 99999 atoms. Bond information can be wrong.
2017-03-10 11:15:08,657 - htmd.molecule.readers - WARNING - Reading PDB file with more than 99999 atoms. Bond information can be wrong.
2017-03-10 11:15:13,857 - htmd.molecule.writers - WARNING - Field "segid" of PDB overflows. Your data will be truncated to 4 characters.
2017-03-10 11:15:14,007 - htmd.molecule.writers - WARNING - Field "segid" of PDB overflows. Your data will be truncated to 4 characters.
2017-03-10 11:15:14,008 - htmd.molecule.writers - WARNING - Field "segid" of PDB overflows. Your data will be truncated to 4 characters.
2017-03-10 11:15:39,500 - htmd.molecule.writers - WARNING - Field "segid" of PDB overflows. Your data will be truncated to 4 characters.
2017-03-10 11:15:39,826 - htmd.molecule.writers - WARNING - Field "segid" of PDB overflows. Your data will be truncated to 4 characters.
2017-03-10 11:15:42,620 - htmd.molecule.writers - WARNING - Field "segid" of PDB overflows. Your data will be truncated to 4 characters.
2017-03-10 11:16:16,264 - htmd.molecule.readers - WARNING - Non-integer values were read from the PDB "serial" field. Dropping PDB values and assigning new ones.
2017-03-10 11:16:20,186 - htmd.molecule.readers - WARNING - Reading PDB file with more than 99999 atoms. Bond information can be wrong.
Projecting trajectories: 100% (3/3) [##############################] eta 00:01 |
2017-03-10 11:16:41,076 - htmd.projections.metric - WARNING - Error in simulation with id: 0. "xyz must be shape (Any, 205575, 3). You supplied  (20, 302263, 3)"
2017-03-10 11:16:41,323 - htmd.projections.metric - WARNING - Error in simulation with id: 2. "xyz must be shape (Any, 205575, 3). You supplied  (20, 302263, 3)"
2017-03-10 11:16:41,341 - htmd.projections.metric - WARNING - Error in simulation with id: 1. "xyz must be shape (Any, 205575, 3). You supplied  (20, 302263, 3)"
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-37-c99ec0c045c4> in <module>()
     14 adapt.datapath = './data'
     15 adapt.filteredpath = './filtered'
---> 16 adapt.run()

/home/je714/htmd/htmd/adaptive/adaptive.py in run(self)
     97                 # If currently running simulations are lower than nmin start new ones to reach nmax number of sims
     98                 if self._running <= self.nmin and epoch < self.nepochs:
---> 99                     flag = self._algorithm()
    100                     if flag is False:
    101                         self._unsetLock()

/home/je714/htmd/htmd/adaptive/adaptiverun.py in _algorithm(self)
    123 
    124     def _algorithm(self):
--> 125         data = self._getData(self._getSimlist())
    126         if not self._checkNFrames(data): return False
    127         self._createMSM(data)

/home/je714/htmd/htmd/adaptive/adaptiverun.py in _getData(self, sims)
    154         if self.ticadim > 0:
    155             # tica = TICA(metr, int(max(2, np.ceil(self.ticalag))))  # gianni: without project it was tooooo slow
--> 156             data = metr.project()
    157             data.dropTraj()  # Drop before TICA to avoid broken trajectories
    158             ticalag = int(

/home/je714/htmd/htmd/projections/metric.py in project(self)
    162 
    163         # Removing empty trajectories
--> 164         metrics, ref, updlist, fstep = self._removeEmpty(metrics, ref, deletesims, fstep)
    165 
    166         # Constructing a MetricData object

/home/je714/htmd/htmd/projections/metric.py in _removeEmpty(self, metrics, ref, deletesims, fstep)
    195 
    196         if len(metrics) == 0:
--> 197             raise NameError('No trajectories were projected. Check if the simlist is empty or for projection errors.')
    198 
    199         return metrics, ref, updlist, fstep

NameError: No trajectories were projected. Check if the simlist is empty or for projection errors.

This gives a very weird looking filtered.pdb file in the filtered/ folder, with not all the water molecules having been removed.

The NameError suggests to check for projection errors. The metric I am using is distances between the alpha carbons and my ligand atoms. I've got 419 CA atoms and 57 atoms in my ligand, so 23883 distances in total. I've checked this with:

mol = htmd.Molecule('structure.prmtop')
proj = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG', sel2='name CA')
len(proj.getMapping(mol))
23883

So the projection seems to be working fine. I am a bit confused by these messages in the log: htmd.projections.metric - WARNING - Error in simulation with id: 1. "xyz must be shape (Any, 205575, 3). You supplied (20, 302263, 3)". None of the expected or supplied shapes match the metric shape (?)

Any idea on what I might be looking into to fix this?

Edit: I can also create a simlist fine:

htmd.simlist(datafolders=glob('./data/*'), molfiles='structure.prmtop',
                     inputfolders=glob('./input/*'))

Creating simlist: 100% (3/3) [#####################################] eta 00:01 -
Out[59]:
array([ simid = 0
parent = None
input = ./input/e1s1_ready
trajectory = ['./data/e1s1_ready/Production.nc']
molfile = structure.prmtop
numframes = [20],
       simid = 1
parent = None
input = ./input/e1s2_ready
trajectory = ['./data/e1s2_ready/Production.nc']
molfile = structure.prmtop
numframes = [20],
       simid = 2
parent = None
input = ./input/e1s3_ready
trajectory = ['./data/e1s3_ready/Production.nc']
molfile = structure.prmtop
numframes = [20]], dtype=object)

stefdoerr commented 7 years ago

Don't worry about posting issues. Helps us improve. In this case I am very sure the error comes from MDtraj when trying to read the nc files. I would bet that your pdb files contain less/more atoms than the nc files. Try:

mol = Molecule('./input/e1s2_ready/structure.pdb')
mol.read('./data/e1s2_ready/Production.nc')

That should give you the same error.

stefdoerr commented 7 years ago

Most of the 99999 atom messages you can ignore. They are warnings. I am a bit worried about overflowing segids. That gives me the feeling that it might not be an overflowing segid but instead your atomnumbers pushing all fields to the side which is a big NO-NO in the PDB format. Be very careful with column alignment in PDB files. Who wrote that PDB file? Can I have one to look at maybe?

jeiros commented 7 years ago

Hi, your suggestion

mol = Molecule('./input/e1s2_ready/structure.pdb')
mol.read('./data/e1s2_ready/Production.nc')

gave the same warnings:

2017-03-10 15:04:07,490 - htmd.molecule.readers - WARNING - Reading PDB file with more than 99999 atoms. Bond information can be wrong.
2017-03-10 15:04:09,722 - htmd.molecule.writers - WARNING - Field "segid" of PDB overflows. Your data will be truncated to 4 characters.

The pdb comes from AmberTool's cpptraj, I just did: cpptraj -p structure.prmtop -y structure.rst -x structure.pdb. Side note: the prmtop file was done with tleap and then ran through parmed to apply HMR to it (so the masses of H atoms and atoms bound to them are changed).

structure.pdb.zip

stefdoerr commented 7 years ago

Ok, the structure is fine. You can ignore these warnings.

2017-03-10 11:16:41,076 - htmd.projections.metric - WARNING - Error in simulation with id: 0. "xyz must be shape (Any, 205575, 3). You supplied (20, 302263, 3)"

This is the only real error here.

My feeling is that something went wrong during filtering. Could you send me the filtered pdb and one filtered nc file?

stefdoerr commented 7 years ago

Or if you want try reading them manually:

mol = Molecule('./filtered/filtered.pdb')
mol.read('./filtered/e1s2_ready/Production.filtered.nc')

stefdoerr commented 7 years ago

By the way which HTMD version are you on? I remember an old bug that gave that error.

stefdoerr commented 7 years ago

Sorry, my bad. Found the bug. It was in the filtering/writing with MDtraj. I was super-convinced I had fixed it before. Maybe a regression. I'll add a test this time. https://github.com/Acellera/htmd/commit/f5a5d2cabfb04919fc2046d6d10e663905ad5285

A new release is in the making. Once this is green it's out: https://travis-ci.org/Acellera/htmd/builds/209798626

jeiros commented 7 years ago

I think I was one version behind. I'll wait for the new build 👍 Thanks for taking a look into it! I'll report back after the weekend to see if it works 😄

jeiros commented 7 years ago

Also I tried manually fixing it by cloning the repo and setting the PYTHONPATH to it. Changed the following in adaptiveMD class:

sims = simlist(glob(path.join(self.datapath, '*', '')), glob(path.join(self.inputpath, '*', 'structure.prmtop')),
                       glob(path.join(self.inputpath, '*', '')))

And now the error is still hapenning but the warnings are a bit different:

Creating simlist: 100% (3/3) [#####################################] eta 00:01 |
Filtering trajectories: 100% (3/3) [###############################] eta 00:01 -
Projecting trajectories: 100% (3/3) [##############################] eta 00:01 |
2017-03-10 16:14:50,115 - htmd.projections.metric - WARNING - Error in simulation with id: 2. "xyz must be shape (Any, 7414, 3). You supplied  (20, 302263, 3)"
2017-03-10 16:14:50,123 - htmd.projections.metric - WARNING - Error in simulation with id: 0. "xyz must be shape (Any, 7414, 3). You supplied  (20, 302263, 3)"
2017-03-10 16:14:50,133 - htmd.projections.metric - WARNING - Error in simulation with id: 1. "xyz must be shape (Any, 7414, 3). You supplied  (20, 302263, 3)"
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-c99ec0c045c4> in <module>()
     14 adapt.datapath = './data'
     15 adapt.filteredpath = './filtered'
---> 16 adapt.run()

/home/je714/htmd/htmd/adaptive/adaptive.py in run(self)
     97                 # If currently running simulations are lower than nmin start new ones to reach nmax number of sims
     98                 if self._running <= self.nmin and epoch < self.nepochs:
---> 99                     flag = self._algorithm()
    100                     if flag is False:
    101                         self._unsetLock()

/home/je714/htmd/htmd/adaptive/adaptiverun.py in _algorithm(self)
    123 
    124     def _algorithm(self):
--> 125         data = self._getData(self._getSimlist())
    126         if not self._checkNFrames(data): return False
    127         self._createMSM(data)

/home/je714/htmd/htmd/adaptive/adaptiverun.py in _getData(self, sims)
    154 
    155         if self.ticadim > 0:
--> 156             # tica = TICA(metr, int(max(2, np.ceil(self.ticalag))))  # gianni: without project it was tooooo slow
    157             data = metr.project()
    158             data.dropTraj()  # Drop before TICA to avoid broken trajectories

/home/je714/htmd/htmd/projections/metric.py in project(self)
    162 
    163         # Removing empty trajectories
--> 164         metrics, ref, updlist, fstep = self._removeEmpty(metrics, ref, deletesims, fstep)
    165 
    166         # Constructing a MetricData object

/home/je714/htmd/htmd/projections/metric.py in _removeEmpty(self, metrics, ref, deletesims, fstep)
    195 
    196         if len(metrics) == 0:
--> 197             raise NameError('No trajectories were projected. Check if the simlist is empty or for projection errors.')
    198 
    199         return metrics, ref, updlist, fstep

NameError: No trajectories were projected. Check if the simlist is empty or for projection errors.

I think the stripping of water is now working fine, but the filtered.pdb file has all coordinates set to 0 ( I think because it's reading from a prmtop which has no coordinate information).

filtered.pdb.zip

Also I'm not very sure that the changes that I write on the local repo are actually picked when I run htmd, but I guess they have since the behaviour changed.

stefdoerr commented 7 years ago

You need to delete the filtered folder. Also I introduced another bug by fixing this one but you don';t seem to have hit it so keep trying :D

stefdoerr commented 7 years ago

Ok 1.7.11 should go through now: https://travis-ci.org/Acellera/htmd/builds/209820965 Crossing fingers

stefdoerr commented 7 years ago

@jeiros can you check if it's fixed in 1.7.11?

jeiros commented 7 years ago

Hi @stefdoerr,

I upgraded htmd to 1.7.11 but still the error prevails:

Creating simlist: 100% (3/3) [#####################################] eta 00:01 -
Filtering trajectories: 100% (3/3) [###############################] eta 00:01 |
Field "resid" of PDB overflows. Your data will be truncated to 4 characters.
Field "serial" of PDB overflows. Your data will be truncated to 5 characters.
Field "resid" of PDB overflows. Your data will be truncated to 4 characters.
Field "serial" of PDB overflows. Your data will be truncated to 5 characters.
Field "resid" of PDB overflows. Your data will be truncated to 4 characters.
Field "serial" of PDB overflows. Your data will be truncated to 5 characters.
Field "resid" of PDB overflows. Your data will be truncated to 4 characters.
Field "serial" of PDB overflows. Your data will be truncated to 5 characters.
Field "resid" of PDB overflows. Your data will be truncated to 4 characters.
Field "serial" of PDB overflows. Your data will be truncated to 5 characters.
Field "resid" of PDB overflows. Your data will be truncated to 4 characters.
Field "serial" of PDB overflows. Your data will be truncated to 5 characters.
Projecting trajectories: 100% (3/3) [##############################] eta 00:01 -
Error in simulation with id: 2. "xyz must be shape (Any, 7414, 3). You supplied  (20, 302263, 3)"
Error in simulation with id: 0. "xyz must be shape (Any, 7414, 3). You supplied  (20, 302263, 3)"
Error in simulation with id: 1. "xyz must be shape (Any, 7414, 3). You supplied  (20, 302263, 3)"
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-c99ec0c045c4> in <module>()
     14 adapt.datapath = './data'
     15 adapt.filteredpath = './filtered'
---> 16 adapt.run()

/home/je714/.local/lib/python3.5/site-packages/htmd-0.15.11-py3.5.egg/htmd/adaptive/adaptive.py in run(self)
     97                 # If currently running simulations are lower than nmin start new ones to reach nmax number of sims
     98                 if self._running <= self.nmin and epoch < self.nepochs:
---> 99                     flag = self._algorithm()
    100                     if flag is False:
    101                         self._unsetLock()

/home/je714/.local/lib/python3.5/site-packages/htmd-0.15.11-py3.5.egg/htmd/adaptive/adaptiverun.py in _algorithm(self)
    123 
    124     def _algorithm(self):
--> 125         data = self._getData(self._getSimlist())
    126         if not self._checkNFrames(data): return False
    127         self._createMSM(data)

/home/je714/.local/lib/python3.5/site-packages/htmd-0.15.11-py3.5.egg/htmd/adaptive/adaptiverun.py in _getData(self, sims)
    154         if self.ticadim > 0:
    155             # tica = TICA(metr, int(max(2, np.ceil(self.ticalag))))  # gianni: without project it was tooooo slow
--> 156             data = metr.project()
    157             data.dropTraj()  # Drop before TICA to avoid broken trajectories
    158             ticalag = int(

/home/je714/.local/lib/python3.5/site-packages/htmd-0.15.11-py3.5.egg/htmd/projections/metric.py in project(self)
    162 
    163         # Removing empty trajectories
--> 164         metrics, ref, updlist, fstep = self._removeEmpty(metrics, ref, deletesims, fstep)
    165 
    166         # Constructing a MetricData object

/home/je714/.local/lib/python3.5/site-packages/htmd-0.15.11-py3.5.egg/htmd/projections/metric.py in _removeEmpty(self, metrics, ref, deletesims, fstep)
    195 
    196         if len(metrics) == 0:
--> 197             raise NameError('No trajectories were projected. Check if the simlist is empty or for projection errors.')
    198 
    199         return metrics, ref, updlist, fstep

NameError: No trajectories were projected. Check if the simlist is empty or for projection errors.

Also I think with the new version the logging isn't working:

import htmd
HTMD: Logging setup failed

stefdoerr commented 7 years ago

@jeiros Sorry for making you do so much testing. Could you please instead pass me a single data/X and input/X folder so that I can test it locally? Only one trajectory/input should be enough. Thanks!

jeiros commented 7 years ago

@stefdoerr Don't worry, I understand supporting AMBER engine is extra work. I actually set up the adaptive.filter = False and it went on to do the projections. But it failed after awhile with a really long traceback coming from joblib, I think. I'm retrying this with longer simulations, since I thought maybe it isn't possible to build a markov model with the test runs that I am doing (which are only a few ps long).

I'll send you the requested files for testing to your email via file exchange of Imperial, since I can't upload them here.

stefdoerr commented 7 years ago

Yes it's definitely filtering related but it's probably a very simple fix once I have the files. Thanks very much!

If it's ok with you I might add those files (or parts of them) as HTMD tests to avoid future regressions as well.

stefdoerr commented 7 years ago

@jeiros The bug is fixed actually. You just need to delete your filtered folder. I made a new filtered folder (filteredS) for example and compared it with yours (filtered):

In [7]: mol = Molecule('./filteredS/filtered.pdb')

In [9]: mol.read('./filteredS/e1s1_ready/Production.filtered.nc')

In [10]: mol = Molecule('./filtered/filtered.pdb')

In [11]: mol.read('./filtered/e1s1_ready/Production.filtered.nc')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-9f28e3f3679c> in <module>()
----> 1 mol.read('./filtered/e1s1_ready/Production.filtered.nc')

I will check if the rest runs fine though

stefdoerr commented 7 years ago

Ok, there is a bug though when working with your PDB file https://github.com/Acellera/htmd/issues/276 I will need to see what's wrong there. If you modify my adaptive code to read prmtop files though, as you did from what I see, it will work, albeit extremely slowly which I don't know exactly why.

jeiros commented 7 years ago

Ok no problem thanks for taking a look into it :)

stefdoerr commented 7 years ago

@jeiros ok this works perfectly on my computer after the bug fixes

app = htmd.LocalGPUQueue()
app.devices = [0,]

adapt = htmd.AdaptiveMD()
adapt.nmin = 1
adapt.nmax = 3
adapt.nepochs = 10
adapt.updateperiod = 100
adapt.projection = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG and name C6 C10 C19', sel2='name CA')
adapt.app = app
adapt.filtersel = 'not water and not resname "Na\+" "Cl\-"'
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.coorname = 'structure.ncrst'
adapt.run()

Bug fixes:

Had to read the topology because MDtraj doesn't like reading pure trajectory files. Fixed here: https://github.com/Acellera/htmd/commit/656d0a86352f4b42114053918a1bfa11d8ea8949 and here: https://github.com/Acellera/htmd/commit/24ee39fe3c8ccff9c8852e7f91c492b5a230e7bb. New release 1.7.13 being built right now https://travis-ci.org/Acellera/htmd/builds/210582795

Minor changes for improvements:

I removed the ions during filtering since we don't need them and they might differ in number between systems.
I selected only 2-3 atoms of the ligand to drastically reduce computation time and memory usage by using a selection like: 'resname LIG and name C6 C10 C19'. This should be a good enough proxy of the conformations of the ligand and reduces the dimensionality from 30k dimensions to 1257 dimensions (1/30th memory usage and much faster calculation).
I changed the default input.coor of adaptive to structure.ncrst since that is what your simulation program probably needs for input coordinates. Take care, you need to change your input file to read ncrst instead of rst. MDtraj cannot write rst, I just assumed here that ncrst is the correct equivalent extension but maybe it's rst7? I have no clue about this so you should enlighten me.

This is what it looks like after execution

[sdoerr@loro Mon15:32 test-htmd] ll input/e2s1_e1s1p0f2/
total 82544
-rw-rw---- 1 sdoerr lab        0 Mar 13 12:43 log.txt
-rw-rw---- 1 sdoerr lab      561 Mar 13 12:43 mdinfo
-rw-rw---- 1 sdoerr lab      150 Mar 13 12:43 MD.sh
-rw-rw---- 1 sdoerr lab      230 Mar 13 12:43 Production.in
-rw-rw---- 1 sdoerr lab  7255212 Mar 13 15:32 structure.ncrst
-rw-r----- 1 sdoerr lab 27250369 Mar 10 15:06 structure.pdb
-rw-r----- 1 sdoerr lab 50002440 Mar 10 10:22 structure.prmtop

Remaining issues:

I need to create the new simlist class which autodetects topology files such as *.prmtop in adaptive folders instead of expecting a structure.pdb file as defined in this issue https://github.com/Acellera/htmd/issues/275. For the moment you seem to have worked around it by changing the line in adaptiverun.py if I understand correctly.

# Original line in adaptiverun.py
sims = simlist(glob(path.join(self.datapath, '*', '')), glob(path.join(self.inputpath, '*', 'structure.pdb')),
                       glob(path.join(self.inputpath, '*', '')))
# Needs to be currently replaced by
sims = simlist(glob(path.join(self.datapath, '*', '')), glob(path.join(self.inputpath, '*', 'structure.prmtop')),
                       glob(path.join(self.inputpath, '*', '')))

Once 1.7.13 is out test it and tell me if you encounter any other problems. Other problems would probably be from PMEMD App since I can't test that. But the rest runs fine locally.

Edit: Remember to delete the filtered folder after you install the new HTMD version as I mentioned before because if the files already exist it won't overwrite them.

jeiros commented 7 years ago

Thank you so much! There are actually different versions of 'restart' files in AMBER, in ASCII and binary version. Mine is binary. It can be read with mdtraj:

md.load_ncrestrt('structure.rst', top='structure.prmtop')
<mdtraj.Trajectory with 1 frames, 302263 atoms, 99238 residues, and unitcells at 0x7fe587d10588>

I thought the convention was to name them .rst but I do have seen people saving them as .rst7, but the file termination shouldn't affect what's inside the file?

Thank you so much for this! Also, I was going crazy with the counterions stripping, didn't know that I had to scape the + and - characters 😅

Acellera / htmd