Acellera / htmd

HTMD: Programming Environment for Molecular Discovery
https://software.acellera.com/docs/latest/htmd/index.html

Problems running adaptive MD script #1041

Open smar966 opened 1 year ago

smar966 commented 1 year ago

Dear support team,

I am trying to run adaptive MD again after a long break, and it is failing. I am processing epoch 1, which should be fast. However, the adaptive.py script repeatedly takes a long time to finish (> 1 h 20 min), and when it finishes it has not succeeded.

It reports many warnings, repeatedly creates the simlist and filters trajectories, and prints many lines like the one below:

2022-11-15 09:58:24,075 - numexpr.utils - INFO - Note: NumExpr detected 112 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

Can you please help me solve the problem? I am attaching the complete log file and posting the adaptive.py script below. It may also be relevant that I am running HTMD3 through a Singularity container.

Thank you in advance!
Sergio

adaptive.log


adaptive.py:

from htmd.ui import *
htmd.config.config(njobs=2)
md = AdaptiveMD()
md.nmin=2
md.nmax=10
md.nepochs = 20

m = Molecule('generators/0/structure.pdb') 
m.filter('not water') 
m.write('reference.pdb')
md.projection = MetricRmsd(trajrmsdstr='protein and name CA', refmol=Molecule('reference.pdb'))

md.ticadim = 1
md.app = LocalGPUQueue()
md.dryrun = True
md.run()

stefdoerr commented 1 year ago

Hi, how are you running this script? In a crontab maybe?

In any case, set htmd.config.config(njobs=2, ncpus=1) so that it doesn't parallelize anything, just to be sure.

smar966 commented 1 year ago

Hi Stefan. Your suggestion did not change much. Now, in the log file I'm getting the warning: "The ncpus config option has been renamed to njobs. Please use njobs instead."

To run the script, I specify the Singularity image and then execute it like this:

HTMD3=/storage/brno1-cerit/home/sergio/HTMD3_home/htmd_2022_10_19.sif
singularity exec -H pwd $HTMD3 python adaptive.py > adaptive.log 2>&1

stefdoerr commented 1 year ago

ok could you set njobs then to 1? I have the feeling singularity is not allowing it to create new processes so let's just run it all single threaded

smar966 commented 1 year ago

Unfortunately, the results are the same whether I specify 1 or 2 CPUs (njobs). :( The adaptive script works if I use HTMD2, but then the MD jobs from the new epoch all crash, I believe due to some incompatibility...

stefdoerr commented 1 year ago

What versions are HTMD2 / HTMD3?

smar966 commented 1 year ago

HTMD2 is a very old version from 4+ years ago (available to me via "module"). HTMD3 is a recent version from October 2022, operating via singularity

stefdoerr commented 1 year ago

Can you upload here the latest log file from when we specified single njobs/ncpu?

smar966 commented 1 year ago

Sure. Here is one of my many attempts: adaptive.3.log

stefdoerr commented 1 year ago

The errors which draw my attention are the following two:

htmd.projections.metric - WARNING - Error while projecting simulation id: 2. "Cannot align molecules. The two selections produced different number of atoms"

and

  File "/mnt/storage-brno1-cerit/nfs4/home/sergio/apoE/apoE4-dimer_6NCO+lig_adaptive/adaptive_meta.py", line 16, in <module>
    md.projection = MetricRmsd(trajrmsdstr='protein and name CA', refmol=Molecule('reference.pdb'))
  File "/opt/conda/lib/python3.9/site-packages/moleculekit/molecule.py", line 343, in __init__
    self.read(filename, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/moleculekit/molecule.py", line 1189, in read
    mol = rr(fname, frame=frame, topoloc=tmppdb, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/moleculekit/readers.py", line 1185, in PDBread
    sequenceID(parsedtopo.serial) + 1
  File "/opt/conda/lib/python3.9/site-packages/moleculekit/util.py", line 238, in sequenceID
    raise RuntimeError("An empty array was passed to sequenceID")
RuntimeError: An empty array was passed to sequenceID

I think your reference.pdb file is broken. Can you try just loading it with:

from moleculekit.molecule import Molecule
mol = Molecule("reference.pdb")

It will probably throw the same error.

smar966 commented 1 year ago

reference.pdb seems fine, and that code did not generate any error. But the molecules in all trajectories from epoch 1 are "broken" in the PBC sense. I thought this was due to some change in the default settings in HTMD3, which would no longer wrap the molecules into one periodic box. Do you think this could be the source of the problem?

stefdoerr commented 1 year ago

I don't see how that could be the case, because the error is quite specific that it cannot read the reference.pdb. I am a bit suspicious because you use a relative path to read the file. If there are multiple reference.pdb files, the one giving the error might not be the one you tried.
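One way to rule this out is to print the absolute path that the relative name resolves to and count the atom records in that file. The following is only a rough sketch using the Python standard library, not moleculekit, and the helper name describe_pdb is made up for illustration:

```python
import os


def describe_pdb(path):
    """Return (absolute_path, n_atom_records) for a PDB file.

    Hypothetical debugging helper: it only counts ATOM/HETATM lines
    and does not validate anything else in the file.
    """
    # A relative path is resolved against the current working directory,
    # which may differ depending on how and where the script is launched.
    abs_path = os.path.abspath(path)
    n_atoms = 0
    with open(abs_path) as fh:
        for line in fh:
            if line.startswith(("ATOM", "HETATM")):
                n_atoms += 1
    return abs_path, n_atoms
```

Calling describe_pdb('reference.pdb') at the top of the adaptive script and logging the result would show which file is actually being read and whether it contains any atoms at that moment.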

What you consider broken in the PBC sense is probably just unwrapped trajectories. That's not an issue and should not affect adaptive sampling, since it calculates wrapped distances internally. Can you load one in VMD and send me a screenshot if it's not confidential?
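For context, "wrapped distances" usually refers to the minimum-image convention: each component of the displacement between two atoms is shifted by a whole number of box lengths so that the nearest periodic image is used. The sketch below illustrates the idea for an orthorhombic box in plain Python. It is a generic illustration, not HTMD's actual implementation, and minimum_image_distance is a hypothetical name.

```python
import math


def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box.

    a, b: (x, y, z) coordinates; box: (Lx, Ly, Lz) edge lengths, all > 0.
    Each displacement component is shifted by a whole number of box
    lengths so that it lies within half a box length of zero.
    """
    total = 0.0
    for ai, bi, length in zip(a, b, box):
        d = bi - ai
        d -= length * round(d / length)  # pick the nearest periodic image
        total += d * d
    return math.sqrt(total)
```

With a box of 10 in every direction, the points (1, 0, 0) and (9, 0, 0) come out 2 apart, and so do (1, 0, 0) and the unwrapped position (19, 0, 0), which is why wrapping alone does not change a distance computed this way.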

smar966 commented 1 year ago

Hi. No, the reference.pdb seems fine. The simulations are definitely unwrapped. I don't know why, because they were always wrapped by default before, unless something changed in the defaults of the new HTMD. But should this be a problem? Anyway, at the link below you can find one of the filtered MDs, the reference.pdb, and the adaptive script that I'm using.

https://mega.nz/folder/d1owAT5J#nu6zd0K_rVZodnmlNzAaTA

stefdoerr commented 1 year ago

Hi again, and sorry for the delay. Could you please try to run the script outside of a Singularity container? Just create a new conda env with the latest HTMD version. I am quite sure that what happened is that multiple jobs started the same script: the first one started writing the reference.pdb file, the second one tried to read reference.pdb while the first was still writing it, and everything crashed and burned. If you manually execute the adaptive_meta.py script from a console, I have a feeling it will work fine.
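If concurrent starts do turn out to be the cause, one common way to keep a reader from seeing a half-written file is to write to a temporary file in the same directory and then rename it into place. A minimal sketch with the standard library follows. write_atomically is a hypothetical helper, and how the contents of reference.pdb are produced in the real script is left out here.

```python
import os
import tempfile


def write_atomically(path, text):
    """Write text to path so readers see either the old or the new file.

    The data goes to a temporary file in the same directory first, and
    os.replace then renames it over the target in a single step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # push the data to disk before renaming
        # Atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

This does not stop two scripts from running at once; it only ensures that neither of them can read a partially written file.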

smar966 commented 1 year ago

Hi Stefan. The problem here is that I am using a grid environment on a supercomputer and we don't have permissions to install any software there. So I need to convince someone who is really busy to install HTMD with conda, and I'm not sure I will succeed. But I'll try.

stefdoerr commented 1 year ago

The alternative is finding out why your Singularity container got started multiple times.
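While investigating, a simple guard can at least make a second concurrent start exit early instead of racing with the first. The sketch below uses an exclusively created lock file. The single_instance helper and the lock file name are assumptions for illustration, and exclusive creation may not be reliable on every network filesystem.

```python
import contextlib
import os


@contextlib.contextmanager
def single_instance(lock_path):
    """Yield True if this process obtained the lock, False otherwise."""
    try:
        # O_EXCL makes the call fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False
        return
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))  # record the owner for debugging
        yield True
    finally:
        os.remove(lock_path)  # release the lock when the block exits
```

The body of the adaptive script could then be wrapped in "with single_instance('adaptive.lock') as acquired:" and skipped when acquired is False. Note that a process killed abruptly leaves a stale lock file behind, which would have to be removed by hand.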

smar966 commented 1 year ago

ok, I'll discuss with my colleague and I'll let you know. thank you!