choderalab / yank

An open, extensible Python framework for GPU-accelerated alchemical free energy calculations.
http://getyank.org
MIT License

How to restart Yank #1031

Closed nividic closed 6 years ago

nividic commented 6 years ago

Hi there, I'm trying to use the latest Yank version, 0.22.3. Previously I was able to restart Yank (version 0.20.1) by using the resume_simulation and resume_setup flags. I have attached a test system. I ran the system for just 5 iterations (5 is also the selected checkpoint_interval), then I stopped the simulation, and now I want to restart it for another 5 iterations (I increased the number of iterations to 10 in the .yaml file). Unfortunately, when I rerun, Yank restarts the simulation from iteration 1 and not from iteration 6, as it did with the previous Yank release (0.20.1). Can you please help me? Let me know if you need further info.

yank_0.22.3.zip

Lnaden commented 6 years ago

@nividic that's very odd. Iteration 5 is clearly written to the checkpoint, so the simulation should be able to resume from that point. Out of curiosity, what happens if you increase the iteration count to 6 (with checkpoint 5)? Does it resume correctly?
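As a rough illustration of the expected behavior (not YANK's actual resume code), the sketch below computes where a resumed run should pick up, assuming the state is restored from the last iteration that is a multiple of the checkpoint interval. The function name `resume_plan` and its parameters are hypothetical.

```python
def resume_plan(last_iteration, checkpoint_interval, target_iterations):
    """Return (first_iteration_to_run, iterations_left) for a resumed run.

    Hypothetical helper: assumes the simulation restarts from the most
    recent iteration that was written to the checkpoint file.
    """
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    # Last iteration whose full state is assumed to be available on disk.
    last_checkpointed = (last_iteration // checkpoint_interval) * checkpoint_interval
    first_to_run = last_checkpointed + 1
    iterations_left = max(target_iterations - last_checkpointed, 0)
    return first_to_run, iterations_left


# Scenario from this issue: 5 iterations done, checkpoint every 5, target raised to 10.
print(resume_plan(5, 5, 10))  # (6, 5)
# Suggested test: a target of 6 with the same checkpoint.
print(resume_plan(5, 5, 6))   # (6, 1)
```

Under these assumptions, both scenarios should continue at iteration 6 rather than start over from iteration 1.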

Also, can you run the short 5-iteration simulation, delete the log file, then start it up again, and provide the new log file that is produced afterwards? I have an idea as to what might be happening and I'll dig into it tonight, but it might take a bit. If you can get that log file in the meantime, it will help.

nividic commented 6 years ago

So, after running Yank for just 5 iterations:

I edited the .yaml file to set the new number of Yank iterations to 10, disabled the minimization, and reset the resume flags to "yes". Finally, I deleted the old .yaml file and tried to restart. Unfortunately, I get the following error:

2018-07-11 17:21:23,920 - DEBUG - yank.experiment - Cannot find ligand specification. Alchemically modifying the whole solute.
2018-07-11 17:21:23,920 - DEBUG - yank.experiment - DSL string for the solvent: "resname HOH"
2018-07-11 17:21:23,921 - DEBUG - yank.mpi - Single node: executing <function ExperimentBuilder._save_analysis_script at 0x10fd84488>
2018-07-11 17:21:23,923 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent1_checkpoint.nc
2018-07-11 17:21:23,961 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:21:23,962 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
2018-07-11 17:21:23,966 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent1_checkpoint.nc
2018-07-11 17:21:23,991 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:21:23,992 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
2018-07-11 17:21:23,996 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent2_checkpoint.nc
2018-07-11 17:21:24,014 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:21:24,014 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
2018-07-11 17:21:24,019 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent2_checkpoint.nc
2018-07-11 17:21:24,037 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:21:24,038 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
2018-07-11 17:22:25,603 - DEBUG - yank.experiment - Cannot find ligand specification. Alchemically modifying the whole solute.
2018-07-11 17:22:25,603 - DEBUG - yank.experiment - DSL string for the solvent: "resname HOH"
2018-07-11 17:22:25,604 - DEBUG - yank.mpi - Single node: executing <function ExperimentBuilder._save_analysis_script at 0x1a1a0c9488>
2018-07-11 17:22:25,605 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent1_checkpoint.nc
2018-07-11 17:22:25,635 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:22:25,636 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
2018-07-11 17:22:25,639 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent1_checkpoint.nc
2018-07-11 17:22:25,664 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:22:25,664 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
2018-07-11 17:22:25,669 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent2_checkpoint.nc
2018-07-11 17:22:25,686 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:22:25,687 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
2018-07-11 17:22:25,689 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent2_checkpoint.nc
2018-07-11 17:22:25,706 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:22:25,707 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
2018-07-11 17:22:43,458 - DEBUG - yank.experiment - Cannot find ligand specification. Alchemically modifying the whole solute.
2018-07-11 17:22:43,458 - DEBUG - yank.experiment - DSL string for the solvent: "resname HOH"
2018-07-11 17:22:43,459 - DEBUG - yank.mpi - Single node: executing <function ExperimentBuilder._save_analysis_script at 0x110c6f488>
2018-07-11 17:22:43,460 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent1_checkpoint.nc
2018-07-11 17:22:43,490 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:22:43,491 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
2018-07-11 17:22:43,494 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent1_checkpoint.nc
2018-07-11 17:22:43,519 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:22:43,520 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
2018-07-11 17:22:43,524 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent2_checkpoint.nc
2018-07-11 17:22:43,542 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:22:43,543 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
2018-07-11 17:22:43,545 - DEBUG - yank.multistate.multistatereporter - Initial checkpoint file automatically chosen as ./test/experiments/solvent2_checkpoint.nc
2018-07-11 17:22:43,563 - DEBUG - yank.multistate.multistatereporter - checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.
2018-07-11 17:22:43,564 - DEBUG - yank.multistate.multistatereporter - analysis_particle_indices != on-file analysis_particle_indices!Using on file analysis indices of [ 0 1 2 3 4 5 6 7 8 9 10 11 12]

I have attached the new simulation file:

yank_not_working.zip
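For anyone reading a log like the one above, a small script can condense it into which checkpoint files were opened and which on-file interval was reported. This is only a sketch built on the message wording seen in this log; the function name `summarize_log` and the regular expressions are my own and are not part of YANK.

```python
import re
from collections import Counter

# Patterns copied from the wording of the DEBUG messages in the log above.
CHECKPOINT_RE = re.compile(r"Initial checkpoint file automatically chosen as (\S+)")
INTERVAL_RE = re.compile(r"Using on file analysis interval of (\d+)")


def summarize_log(log_text):
    """Count checkpoint files opened and collect the on-file intervals reported."""
    checkpoints = Counter(CHECKPOINT_RE.findall(log_text))
    intervals = sorted({int(value) for value in INTERVAL_RE.findall(log_text)})
    return checkpoints, intervals


# Two entries taken verbatim from the log above.
sample = (
    "2018-07-11 17:21:23,923 - DEBUG - yank.multistate.multistatereporter - "
    "Initial checkpoint file automatically chosen as ./test/experiments/solvent1_checkpoint.nc\n"
    "2018-07-11 17:21:23,961 - DEBUG - yank.multistate.multistatereporter - "
    "checkpoint_interval != on-file checkpoint interval! Using on file analysis interval of 5.\n"
)
checkpoints, intervals = summarize_log(sample)
print(dict(checkpoints))  # {'./test/experiments/solvent1_checkpoint.nc': 1}
print(intervals)          # [5]
```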

Lnaden commented 6 years ago

@nividic I have some quick updates on this.

From the initial post with the yank_0.22.3.zip file: the YANK simulation is starting over from iteration 1 because output_dir: ./test in the YAML file points at a directory other than the root where the YAML file is, so it's not looking for the experiments dir in the same tier as the zip file. I originally thought there was an issue with changes I made to the resume code to make it more robust, but testing your file did not trip my debug lines, so I was very confused for about 2 hours.
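A quick way to check for this kind of mismatch is to list where the checkpoint files would be expected and see whether they exist. The sketch below is illustrative only: it assumes output_dir is resolved relative to the YAML file's directory and that checkpoints are stored as experiments/<phase>_checkpoint.nc under it, which is what the paths in the log suggest. The helper names and the yank.yaml file name are placeholders, not part of YANK.

```python
from pathlib import Path


def expected_checkpoints(yaml_path, output_dir, phases=("solvent1", "solvent2")):
    """List the checkpoint paths a resumed run would be expected to look for.

    Assumptions (not confirmed against YANK's source): output_dir is resolved
    relative to the YAML file's directory, and checkpoints are stored as
    <output_dir>/experiments/<phase>_checkpoint.nc, as the log suggests.
    """
    root = Path(yaml_path).resolve().parent
    experiments_dir = (root / output_dir / "experiments").resolve()
    return [experiments_dir / f"{phase}_checkpoint.nc" for phase in phases]


def report_missing(yaml_path, output_dir):
    """Return the expected checkpoint files that do not exist on disk."""
    return [p for p in expected_checkpoints(yaml_path, output_dir) if not p.exists()]


if __name__ == "__main__":
    # "yank.yaml" is a placeholder name for the script in the attached archive.
    for path in expected_checkpoints("yank.yaml", "./test"):
        status = "found" if path.exists() else "MISSING"
        print(f"{status}: {path}")
```

If every file is reported as missing, the output directory setting and the location of the existing experiments directory probably do not agree.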

From the second file, yank_not_working.zip: I did find a bug in the resume code when the online_analysis_interval target is set (which it is by default now) and the number of iterations has been reached. I'm fixing that and should have a resolution today, along with a couple of other issues which cropped up in the last couple of versions.