Acellera / htmd

HTMD: Programming Environment for Molecular Discovery
https://software.acellera.com/docs/latest/htmd/index.html

inconsistent results for ACEMD 2 and ACEMD 3 #1088

Open smar966 opened 2 months ago

smar966 commented 2 months ago

Dear support team,

A few months ago I moved from HTMD2 to HTMD3 (finally). However, I was getting surprising and inconsistent results, which made me suspect that something is running differently. To confirm this, I ran the adaptive sampling on the very same system (I used the exact same topology and adaptive script), only changing the input and the implementation. The behavior was radically different in the two calculations.

This makes me think that something in the default settings for running the MDs must have changed.

For the sake of comparing different systems in my project, I need to run MDs under the exact same conditions, but I would like to use the more recent HTMD3! Could you help me understand what changed, and how I can make HTMD3 run the same way HTMD2 did?

NOTE: I prepared the adaptive MD using the same python script (with production_v6). The respective input files for HTMD2 and HTMD3 look very different, and you can find them at the link: https://drive.google.com/drive/folders/1gxjq9OjqbCE86EI4PvhqCZfXUXzbQC49?usp=drive_link

Thank you in advance. Sergio

stefdoerr commented 2 months ago

You mean ACEMD, not HTMD. I looked at the input files and they seem consistent with each other: same settings in both. ACEMD3 uses OpenMM as the backend for the calculations, while ACEMD2 used a different codebase. I don't believe there should be any difference between them in production runs, though, other than luck in sampling. The two implementations were tested against each other and were equivalent.

smar966 commented 2 months ago

Hi Stefan. I was also very surprised. But trust me, the results I'm getting are suspiciously different, at least for my systems. I am trying to observe the dissociation of a dimer. With HTMD2 (or ACEMD2), the dimer dissociated after ~1 us, while with HTMD3 it remained stable for 10 us (the full length of the run). The exact same system. I got suspicious after my first trial with HTMD3 on a different (but rather similar) system, which seemed overly stable, but for which I have no comparison in HTMD2. Is there any possibility that not everything is running exactly the same way?...
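How much a single dissociation event can say about the two codes depends on how stochastic the event is. As a rough illustration only: if dissociation is assumed to be a memoryless process with an exponentially distributed waiting time, the chance of seeing no event within an observation window follows directly from the mean waiting time. The exponential model, the function names, and the trial mean times below are assumptions made for this sketch; only the ~1 us and 10 us figures come from the comment above. With adaptive sampling, where simulation time is aggregated over many short trajectories, this is at best a crude guide.

```python
import math


def prob_no_event(t_obs_us: float, mean_time_us: float) -> float:
    """Probability of no event within t_obs_us, assuming an exponential
    waiting time with the given mean (an assumed model, not a measured one)."""
    return math.exp(-t_obs_us / mean_time_us)


def prob_event_by(t_obs_us: float, mean_time_us: float) -> float:
    """Probability that the event has already happened by t_obs_us."""
    return 1.0 - prob_no_event(t_obs_us, mean_time_us)


# Hypothetical mean dissociation times (in us) to try out.
for mean_time_us in (1.0, 5.0, 10.0):
    early = prob_event_by(1.0, mean_time_us)    # dissociated by ~1 us
    stable = prob_no_event(10.0, mean_time_us)  # still bound after 10 us
    print(f"mean={mean_time_us:5.1f} us  "
          f"P(event by 1 us)={early:.3f}  P(no event in 10 us)={stable:.2e}")
```

Under this model, a true mean near 1 us would make a 10 us stable run very unlikely (exp(-10) is about 4.5e-5), whereas a true mean near 10 us would make neither observation very surprising. One event per code cannot tell these cases apart, which is why the number of independent dissociation events matters.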

stefdoerr commented 2 months ago

The ACEMD version should not be the cause of this. Are we talking about a single dissociation event or multiple in different simulations?

Did you use the exact same parameters? Or did you re-build your system? Do a diff of the structure.prmtop files to be sure. Because maybe the AMBER version changed.

Also are you using the same Adaptive sampling method? If you updated HTMD together with ACEMD there might have been changes in the algorithm compared to the old attempt.
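The topology check suggested above can be sketched in Python for cases where a plain `diff` is not convenient (for example, inside a container). This is a minimal sketch, not part of HTMD: the function names are hypothetical, and it treats the files as ordinary text.

```python
import difflib
import hashlib
import itertools
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """True if both files have exactly the same bytes."""
    return sha256_of(path_a) == sha256_of(path_b)


def first_differences(path_a, path_b, limit=20):
    """Return up to `limit` lines of a unified diff (no context lines)."""
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    diff = difflib.unified_diff(
        lines_a, lines_b,
        fromfile=str(path_a), tofile=str(path_b),
        lineterm="", n=0,
    )
    return list(itertools.islice(diff, limit))
```

Calling `files_identical("old/structure.prmtop", "new/structure.prmtop")` (paths are placeholders) gives a quick yes/no answer, and `first_differences` shows where the files diverge. A prmtop rebuilt at a different time may differ in its header date line even when the parameters are unchanged, so a mismatch is worth inspecting rather than taking at face value.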

smar966 commented 2 months ago

Dear Stefan,

We are talking about the single dissociation of a dimer (which is a bit stochastic, true).

However, I compared the behavior with AceMD2 and AceMD3 for 2 systems already (different protein variants). The results were very similar: the dimers DID dissociate quite early with AceMD2 but NOT at all with AceMD3 in much longer simulation times.

The facts:

  • I used the exact same topologies (as suggested, I double-checked using diff - no difference).
  • The adaptive.py scripts are slightly different but the adaptive settings should be exactly the same.
  • Only the 'input' files for the production MDs look different for AceMD2 and AceMD3. But they were generated using the same settings in protocol 6 ('from htmd.protocols.production_v6 import Production') under the respective AceMD version.

I have prepared a folder online for the two systems and for acemd2 and acemd3, containing: the inputs ('generators' folder), the adaptive.py script, and plots with the respective RMSDs vs. time. You can compare them directly here: https://drive.google.com/drive/folders/1gxjq9OjqbCE86EI4PvhqCZfXUXzbQC49?usp=sharing

I really don't know what else to compare and how to explain the different behaviors. The approximate dissociation time is crucial information in this study, and I cannot proceed using AceMD3 without making sure that the differences found are due only to the systems and not the methods.

Thank you so much in advance for your help.

giadefa commented 2 months ago

Are you using the latest acemd version? We stopped calling it acemd3 quite a while ago. https://software.acellera.com/acemd/install.html

It might be hard to know what is different, but you should run all systems with the same code just for the sake of reproducibility.

g

smar966 commented 2 months ago

Hi Gianni. We are using a quite recent implementation, installed last October. I understand what you mean. The only problem is that this is a long project, and I have most data generated with the old version. However, it has been deprecated and it can no longer run on the newer GPUs - so we are becoming more and more limited. This is why I'm trying to move to the newer version, but I'm struggling with these issues...

giadefa commented 2 months ago

If you are running the command acemd3, then you are actually running a very old version, even if you installed it recently.

smar966 commented 2 months ago

Ohh really?!! Would it be different enough to produce incorrect MD simulations? This is important because most of my colleagues are currently running their adaptive simulations with 'acemd3'.

giadefa commented 2 months ago

No. It should be fine but it's definitely old.

G

stefdoerr commented 2 months ago

I will insist though that the difference is not in ACEMD. The forcefield energies and forces have been compared between ACEMD2/3 and they were consistent and as far as I know no major bugs were fixed other than adding new features.

What is more probable to have changed is the adaptive sampling algorithm. If you can send us the code you used and also the HTMD versions from the conda environment it would be helpful.

To get the HTMD versions in the two conda environments do

conda activate env1
conda list htmd
conda list acemd
conda list acemd3

And the same for env2. Then I can take a look if anything significant has changed in the Adaptive classes.

smar966 commented 1 month ago

Hi Stefan, Thanks for your insights. If the adaptive protocol has changed, it could explain the differences I'm observing.

Unfortunately, because our group uses a grid environment, I am not using the regular conda installation but a singularity installation instead. Therefore, the commands 'conda activate env1' or 'conda activate env2' do not work for me. The other 'conda list xx' commands you requested produce:

conda list htmd
# Name       Version   Build     Channel
htmd         2.3.2     py310_0   acellera
htmd-deps    2.3.2     py310_0   acellera

conda list acemd
# Name       Version   Build             Channel
acemd        3.7.2     cuda1122py310_0   acellera
acemd3       3.7.2     0                 acellera

conda list acemd3
# Name       Version   Build   Channel
acemd3       3.7.2     0       acellera

If you want to test my singularity installation, I have uploaded it to the following link: https://filesender.cesnet.cz/?s=download&token=5a346f1e-837b-4ec9-a3fd-ffc8151b87c6

To run the adaptive script I'm using the following commands:

HTMD3=$LOCATION/htmd_2023_09.sif
singularity exec -H `pwd` $HTMD3 python adaptive_meta.py > adaptive.log

And to run each MD, the command:

singularity exec --nv -H `pwd` $HTMD3 acemd --ncpus 1 input >log.txt

Meanwhile, I am rerunning the adaptive sampling of one of the dimers I mentioned above, using the 'acemd' command to run the MDs instead of 'acemd3'. But for the moment, I do not see any difference (no dissociation has happened yet).

stefdoerr commented 1 month ago

Can you run the same commands in the old container? So that I can compare the versions?

smar966 commented 1 month ago

Sure. For the old container, the results are:

conda list htmd
# Name         Version           Build          Channel
htmd           1.13.10           py36_0         acellera
htmd-data      0.1.hash23fb208   py_0           acellera
htmd-deps      1.13.10           py36_0         acellera
htmd-pdb2pqr   2.1.1+htmd.3      pyh13f2e89_0   acellera

conda list acemd
# Name           Version      Build   Channel
acemd            2019.01.24   0       acellera
acemd-examples   2016.5.12    1       acellera

conda list acemd3

Empty output
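For a side-by-side view of the two containers, the `conda list` output can be parsed and tabulated. The sketch below is only an illustration: the helper names are hypothetical, and the two text blocks simply repeat the versions reported in this thread.

```python
NEW_CONTAINER = """
# Name Version Build Channel
htmd 2.3.2 py310_0 acellera
htmd-deps 2.3.2 py310_0 acellera
acemd 3.7.2 cuda1122py310_0 acellera
acemd3 3.7.2 0 acellera
"""

OLD_CONTAINER = """
# Name Version Build Channel
htmd 1.13.10 py36_0 acellera
htmd-data 0.1.hash23fb208 py_0 acellera
htmd-deps 1.13.10 py36_0 acellera
htmd-pdb2pqr 2.1.1+htmd.3 pyh13f2e89_0 acellera
acemd 2019.01.24 0 acellera
acemd-examples 2016.5.12 1 acellera
"""


def parse_conda_list(text):
    """Map package name -> version, skipping blank and '#' header lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def compare_versions(old, new):
    """Return (name, old_version, new_version) rows; '-' marks a missing package."""
    names = sorted(set(old) | set(new))
    return [(name, old.get(name, "-"), new.get(name, "-")) for name in names]


for name, old_v, new_v in compare_versions(parse_conda_list(OLD_CONTAINER),
                                           parse_conda_list(NEW_CONTAINER)):
    print(f"{name:16s} {old_v:18s} {new_v}")
```

The table makes the jump from htmd 1.13.10 to 2.3.2 and from acemd 2019.01.24 to 3.7.2 visible at a glance, along with packages that exist in only one container.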

stefdoerr commented 1 month ago

Can you also send me the adaptive_meta.py so I can see what settings and algorithm you are using? Thanks

smar966 commented 1 month ago

Sure. Please look into the two respective folders for acemd2 and acemd3 here: https://drive.google.com/drive/folders/1nsYxrwVTDWJVSWmaD0H1Cx8FkyMUaMiq

stefdoerr commented 1 month ago

I did a diff of the two codes. The only things that changed in the Adaptive code were a fix for NPT simulations to maintain their box size (but I assume your production runs are NVT) and the MSM code switching from pyemma to deeptime.

There was no dramatic change in the algorithm as far as I can tell, unless the deeptime switch somehow produced Markov models different enough to affect the results. I haven't had such an experience when moving between the two libraries, at least for MSM analysis.

> Meanwhile, I am rerunning the adaptive sampling of one of the dimers I mentioned above using the 'acemd' to run MDs instead of 'acemd3'. But for the moment, I do not see any difference (no dissociation has happened yet)

You mean you are trying the latest acemd version or that you are rerunning the experiment with the old container? I would be interested if you are able to replicate the old results with the old container. Or send us the old inputs and old container and we can run it and see if we get the dissociation.

smar966 commented 1 month ago

> I did a diff of the two codes. The only things that changed in the Adaptive code were a fix for NPT simulations to maintain their box size (but I assume your production runs are NVT) and the MSM code switching from pyemma to deeptime.

Actually, I am (or should be) running NPT simulations. Could the changes have an impact on the ensemble I obtain? Was it buggy before? Among the many differences in the 'input' file, I can see that 'exclude scaled1-4' disappeared in the newer version. The same goes for the several "langevin" parameters that specify and regulate the Langevin thermostat! Could this be the issue? You can compare the input files from the online folders that I mentioned in my previous comments.

> > Meanwhile, I am rerunning the adaptive sampling of one of the dimers I mentioned above using the 'acemd' to run MDs instead of 'acemd3'. But for the moment, I do not see any difference (no dissociation has happened yet)

> You mean you are trying the latest acemd version or that you are rerunning the experiment with the old container? I would be interested if you are able to replicate the old results with the old container. Or send us the old inputs and old container and we can run it and see if we get the dissociation.

No, I meant that I tried to run the MDs with the new code using the 'acemd' command instead of the 'acemd3'. But I am also trying to replicate the old results with the old container, as you suggest. Unfortunately, I cannot share the old implementation with you since we have it installed as a "module," not as a singularity like the new one.

stefdoerr commented 1 month ago

No, both simulations are NVT since there is no barostat in either input file. The default barostat setting is off. The missing 'exclude scaled1-4' line is fine; it is not needed in the new version. The langevin parameters are just renamed in the new input file as thermostatxxx instead of langevinxxx, and they have the same values as in the old file.

# old
langevin                        on
langevindamping                 0.1
langevintemp                    310
# new
thermostat              on
thermostatdamping       0.1
thermostattemperature   310
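
The renaming shown above can be taken into account when comparing two input files automatically. The sketch below is a hedged illustration, not an ACEMD tool: it assumes the inputs are simple "key value" lines with `#` comments, the helper names are hypothetical, and the rename table contains only the three keys listed in this comment.

```python
# Old (ACEMD2) key -> new (ACEMD3) key, as listed in the comment above.
RENAMED_KEYS = {
    "langevin": "thermostat",
    "langevindamping": "thermostatdamping",
    "langevintemp": "thermostattemperature",
}


def parse_input(text):
    """Parse 'key value' lines into a dict, ignoring blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()
        settings[key] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def compare_inputs(old_text, new_text):
    """Return {key: (old_value, new_value)} for settings that differ
    after translating old key names to their new equivalents."""
    old = {RENAMED_KEYS.get(k, k): v for k, v in parse_input(old_text).items()}
    new = parse_input(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


OLD_INPUT = """
langevin          on
langevindamping   0.1
langevintemp      310
exclude           scaled1-4
"""

NEW_INPUT = """
thermostat              on
thermostatdamping       0.1
thermostattemperature   310
"""

print(compare_inputs(OLD_INPUT, NEW_INPUT))
```

For these fragments the only reported difference is the `exclude` line, which exists only in the old input. Values are compared as plain strings, so `310` and `310.0` would be flagged as different; any key missing from one file shows up with `None` on that side.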

smar966 commented 1 month ago

Ohhhh, I thought the barostat was turned on. My bad. Then the MD settings are not likely the problem. If the only thing that changed in the code was the MSM method in the adaptive sampling, I don't see how it could affect my results so much.