Question: Can multiple repeats of simulation be aggregated to get better free energy estimates?

Gallicchio-Lab / AToM-OpenMM

OpenMM-based framework for absolute and relative binding free energy calculations with the Alchemical Transfer Method

Issue #94 (open), opened by hjuinj 3 weeks ago

hjuinj commented 3 weeks ago

Hi, if I launch multiple repeats of a simulation from the same starting point, does the AToM analysis procedure already support aggregating these runs to improve the free energy estimate?

Thank you.

egallicc commented 3 weeks ago

AToM-OpenMM does not perform free energy analysis itself; that is done with external tools. In the examples, we provide an R script that uses UWHAM, along with a driver script called analyze.sh. The R script loads the r*/*.out output files, and I think it can easily be modified to load data files from multiple simulations. Alternatively, you could concatenate the files with a shell script. Either way, we recommend discarding the earlier portion of each replica trajectory to allow for equilibration.
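
A minimal Python sketch of the concatenation route, as an alternative to shell scripting. It assumes a hypothetical layout in which each repeat lives in its own directory containing the usual r*/*.out files, that every repeat has the same set of files, and that each line of a .out file is one sample. `pool_replica_files` and `n_discard` are illustrative names, not part of AToM-OpenMM, and whether the R/UWHAM script accepts the merged files unchanged is an assumption to check.

```python
from pathlib import Path


def pool_replica_files(repeat_dirs: list[str], merged_dir: str,
                       n_discard: int) -> list[Path]:
    """Concatenate matching r*/*.out files across repeats.

    Assumptions (hypothetical layout, not confirmed AToM-OpenMM behavior):
    every repeat directory holds the same r*/*.out files, and each line of a
    .out file is one sample. The first n_discard lines of every file are
    dropped as equilibration before pooling.
    """
    first = Path(repeat_dirs[0])
    merged = Path(merged_dir)
    written: list[Path] = []
    # Enumerate relative paths such as r0/<name>.out from the first repeat.
    for rel in sorted(p.relative_to(first) for p in first.glob("r*/*.out")):
        pooled: list[str] = []
        for rep in repeat_dirs:
            lines = (Path(rep) / rel).read_text().splitlines(keepends=True)
            for line in lines[n_discard:]:
                # Guarantee a trailing newline so files do not run together.
                pooled.append(line if line.endswith("\n") else line + "\n")
        target = merged / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("".join(pooled))
        written.append(target)
    return written
```

Pointing the R script at the merged directory would then, presumably, analyze all repeats together, with the early samples of each repeat (not just of the pooled file) discarded.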

hjuinj commented 3 weeks ago

Thanks @egallicc. Sorry, I did mean the R UWHAM analysis. It is good to know that this is possible.

So it seems this is not an approach your lab uses. Is there another approach you would recommend for boosting sampling given the same time budget but more compute?

egallicc commented 3 weeks ago

I think running replicates is an excellent strategy. It should improve convergence and yield good error estimates. We rarely find ourselves in your happy situation of having a lot of hardware to deploy. Automating it should take only a bit of scripting.

Adding more replicas and running on multiple GPUs are other obvious ways to take advantage of the hardware.
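
One common way to turn replicates into an error estimate, shown as a sketch rather than as the lab's procedure: analyze each repeat separately, then report the mean of the per-repeat free energies together with the standard error of the mean. `combine_replicate_estimates` is a hypothetical helper, and the repeats are assumed to be statistically independent; repeats launched from the same starting point may be correlated, which would make this error bar too small.

```python
import math
import statistics


def combine_replicate_estimates(dg_values: list[float]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) of per-repeat ΔG estimates.

    Assumes each value comes from an independent repeat analyzed on its own.
    """
    n = len(dg_values)
    if n < 2:
        raise ValueError("need at least two independent replicates")
    mean = statistics.fmean(dg_values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(dg_values) / math.sqrt(n)
    return mean, sem
```

For example, three repeats giving -10.0, -11.0, and -12.0 kcal/mol yield a mean of -11.0 kcal/mol with a standard error of about 0.58 kcal/mol.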

hjuinj commented 3 weeks ago

Thanks for the swift response @egallicc. A follow-up question: you mentioned above that we ought to discard the earlier portion of each replica to allow for equilibration. If we run multiple repeats, can we run a single simulation through the equilibration stage and then spin off multiple repeats from the same equilibrated point to save some compute?

If so, is there an easy setting in the cntl file that would enable this? I am thinking of setting a short WALL_TIME for a single run, then resuming from its checkpoint file in multiple repeats, each with a long, production-length WALL_TIME. Would this work?

egallicc commented 3 weeks ago

I am not sure. Yours is a research question that would take some effort to test. Would running replicates from a single starting point (however well equilibrated) and collecting all their data improve the convergence rate per unit of compute, or would it be better to let the replicates wander away from each other before collecting their data? The answer is probably system-dependent, and the optimal settings likely lie somewhere between these extremes. The AToM-OpenMM settings in the control file are general enough to craft all kinds of replicate-spawning mechanisms.
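
A sketch of the scheme hjuinj describes (equilibrate once, then spawn several production repeats from the same checkpoint), offered only as a starting point for the kind of testing discussed above. It assumes that the control file uses `KEY = value` lines, that AToM-OpenMM resumes from the checkpoint files it finds in a copied job directory, and that each restarted copy draws its own random numbers so the repeats actually diverge; none of this is confirmed in this thread. `spawn_repeats` and its parameters are hypothetical.

```python
import re
import shutil
from pathlib import Path


def spawn_repeats(equil_dir: str, cntl_name: str, n_repeats: int,
                  wall_time: int, dest_root: str) -> list[Path]:
    """Copy an equilibrated job directory n_repeats times and set a new
    WALL_TIME in each copy's control file.

    Assumption: the control file holds 'KEY = value' lines; this is a
    hypothetical helper, not an AToM-OpenMM utility.
    """
    src = Path(equil_dir)
    # Match an uncommented WALL_TIME line, leading spaces/tabs allowed.
    pattern = re.compile(r"^[ \t]*WALL_TIME[ \t]*=.*$", re.MULTILINE)
    repeats: list[Path] = []
    for k in range(n_repeats):
        dest = Path(dest_root) / f"repeat{k}"
        # Copies the control file, inputs, and any checkpoint files as-is.
        shutil.copytree(src, dest)
        cntl = dest / cntl_name
        text = cntl.read_text()
        if pattern.search(text):
            text = pattern.sub(f"WALL_TIME = {wall_time}", text)
        else:
            text = text.rstrip("\n") + f"\nWALL_TIME = {wall_time}\n"
        cntl.write_text(text)
        repeats.append(dest)
    return repeats
```

Each repeat directory would then be launched as a separate job. Comparing this against fully independent repeats at equal compute is exactly the open question raised above.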