choderalab / yank

An open, extensible Python framework for GPU-accelerated alchemical free energy calculations.
http://getyank.org
MIT License
181 stars 71 forks source link

YANK Analyze duration #1144

Open Teohoho opened 5 years ago

Teohoho commented 5 years ago

I've been using YANK (0.23.4) to simulate a protein-ligand system (6069 atoms and water box and around 100 replicas for both the complex and solvent phase), on a system with two GTX 1080Ti GPUs, and so far it's going great! However, when I want to analyze the simulation, I notice it takes much much more time than it takes to simulate the system itself! For example, for 1200 iterations, the YANK analyze log is as follows: " 2019-02-06 10:37:43,040: Reading energies... 2019-02-06 10:37:43,179: Done. 2019-02-06 10:37:43,183: Assembling effective timeseries... 2019-02-06 10:37:43,207: Done. 2019-02-06 10:37:43,211: Reading energies... 2019-02-06 10:37:43,352: Done. 2019-02-06 10:37:43,356: Assembling effective timeseries... 2019-02-06 10:37:43,380: Done. 2019-02-06 10:37:43,396: Reading energies... 2019-02-06 10:37:43,467: Done. 2019-02-06 10:37:43,470: Assembling effective timeseries... 2019-02-06 10:37:43,490: Done. 2019-02-06 10:37:43,491: Reading energies... 2019-02-06 10:37:43,559: Done. 2019-02-06 10:37:43,561: Assembling effective timeseries... 2019-02-06 10:37:43,581: Done. 2019-02-06 10:37:46,057: Reading energies... 2019-02-06 10:37:46,214: Done. 2019-02-06 10:37:46,218: Assembling effective timeseries... 2019-02-06 10:37:46,260: Done. 2019-02-06 10:37:46,288: Checking if we need to unbias the restraint... 2019-02-06 10:37:46,288: Trying to get radially symmetric restraint data... 2019-02-06 10:37:46,288: Retrieving end thermodynamic states... 2019-02-06 10:37:54,979: Isolating restraint force... 2019-02-06 10:37:56,414: Deep copying restraint force... 2019-02-06 10:37:56,416: Retrieving particle masses... 2019-02-06 10:37:56,416: Done. 2019-02-06 10:37:56,420: Found HarmonicRestraintForce restraint. The restraint will be unbiased. 2019-02-06 10:37:56,420: Computing restraint energies... 2019-02-10 12:55:54,293: Restraint energy mean: 1.0667663520787305 kT; std: 0.7953572883898831 kT 2019-02-10 12:55:54,294: Reading energies... 2019-02-10 12:55:54,563: Done. 2019-02-10 12:55:54,569: Assembling effective timeseries... 2019-02-10 12:55:54,623: Done. 2019-02-10 12:55:54,652: Assembling uncorrelated energies... 2019-02-10 12:55:54,955: Found expanded cutoff states in the energies! 2019-02-10 12:55:54,955: Free energies will be reported relative to them instead! 2019-02-10 12:55:54,956: Done. 2019-02-10 12:55:57,074: Done. 2019-02-10 12:55:57,080: Computing free energy differences... 2019-02-10 12:56:13,299: Done. 2019-02-10 12:56:13,300: Computing covariance matrix... (Followed by the Deltaf_ij matrices for the solvent and complex phases) " As you can see, computing the restraint energies took around 4 days. I noticed that the analysis is done on one CPU. Can I somehow run this analysis on GPU? Thanks!

jchodera commented 5 years ago

We've noticed this as well---there seems to be a problem with the unbiasing of restraints in the Computing restraint energies... step. @andrrizzi and I have been working on a fix, but for now, we suggest you start with the restraint off (lambda_restraints starts with 0 in the fully-interacting states when other lambda values are 1) and turn the restraint on (to 1) as the other lambdas are turned to 0. This will mean that unbiasing will not be used.

You can also use the --skipunbiasing command-like argument during analysis, but this may not correctly account for the effects of the restraint.

andrrizzi commented 5 years ago

Hi @Teohoho . That step is done on the CPU. You can switch to GPU it by changing this line to use "CUDA" or "OpenCL" instead, but you may end up with an even slower calculation in this case. The reason is that we recompute the restraint energies/distances using a "slice" of the system consisting of only the restrained atoms and the restraint force. The GPU shouldn't add much advantage in computing the energy (which is just a single harmonic term), but it should add some overhead when sending and receiving data. It's still worth the shot if you want to benchmark it. If you do, it would be very helpful if you let us know how it went!

As a workaround, as suggested above, you can start your simulation using a protocol with lambda_restraints starting at 0. Or you can use a flat-bottom restraint and use --skipunbiasing as the bias introduced by an active flat-bottom restraint in the bound state is usually negligible if the radius is large enough.

Teohoho commented 5 years ago

Thank you @jchodera. I was thinking of using "--skipunbiasing", but I'm afraid that that will give an incorrect value for the Absolute Binding Energy. Also, when I run a simulation with "Online Analysis" turned on, I noticed YANK calculates the Absolute Binding energy using MBAR. Is there a way I could get it to print out this value, so as to skip "analysis" entirely? Also also, since I've already started this issue/thread, I'd like to tell you about some of the issues I've had with running Yank for a few months now. None of these are imperative, but I think they are important for the 1.0 release:

@andrrizzi I will run an analysis on a 100 iteration simulation of my system and will communicate the results. Do you think maybe lowering the number of restrained atoms (in the receptor part) will speed up the analysis?

EDIT: I have done the comparison benchmark. The results are:

It is a 15% increase in speed when using CUDA. For this particular analysis it might not be that noticeable, but I wonder for an analysis that took 4 days using CPU, how large a difference will the CUDA Analysis make.

HershGupta404 commented 5 years ago

I think it might be an issue with the speed of MDTraj's implementation of image.

andrrizzi commented 5 years ago

Thanks for the feedback! Those are known problems and we're planning to fix them eventually.

Is there a way I could get it to print out this value, so as to skip "analysis" entirely?

You can use python to instantiate a Reporter and read the online analysis using this method: https://github.com/choderalab/yank/blob/master/Yank/multistate/multistatereporter.py#L1194 . Keep in mind that those values do not use the unbiasing so it'll be equivalent to using --skipunbiasing. You'll also have to sum complex and solvent free energy yourself and add the appropriate standard state correction to recover the value given by the analysis.

Do you think maybe lowering the number of restrained atoms (in the receptor part) will speed up the analysis?

As @404random said, this might help with the imaging of the trajectory, which is the slowest part of the analysis. It may also boost a little the computation of the centroids used to determine the harmonic restraint radius.

Teohoho commented 5 years ago

@andrrizzi I see. I will run a simulation with the same system but will reduce the number of restrained atoms to a minimum (3-4 atoms, as opposed to the 30-something I have now) and will get back to you on the difference in analysis time.

andrrizzi commented 5 years ago

Sounds good. Consider also following the suggestion from @jchodera and start your simulations with lambda_restraints: [0.0, ...]. That removes the need for unbiasing completely and the analysis should be much faster. If your ligand is not a weak binder, and you're not worried that it may dissociate during the simulated timescales, the calculation will be correct.

Teohoho commented 5 years ago

Ok, so should the "lambda_restraints" parameter start at 0, then go up to 1 before the other "lambda" become less than 1, or should I start to increase the restraint parameter only when one of the other "lambda" values is equal to 0?

Also, it is important for my experiment that the ligand stay between to alpha-helixes, so it drifting away might be an issue.

andrrizzi commented 5 years ago

We don't have a clear answer to that question yet. Somewhat speculating, as long as the lambda_restraints is 1.0 when you start turning off lambda_sterics from 1.0 to 0.0, it should be fine.

Also, it is important for my experiment that the ligand stay between to alpha-helixes, so it drifting away might be an issue.

I see. you could run a short calculation, check the trajectory of the first state to see there's too much movement and extend them if it is acceptable.

Teohoho commented 5 years ago

Thank you @andrrizzi, you have been most helpful!

Teohoho commented 5 years ago

Hello again! I am having an issue with running the analysis on one of my simulations. It is tangentially related to this issue, but if you consider it not to be, I will gladly open another issue.

When I try to analyze the output of one of my simulations, this is my output:

"2019-02-15 10:54:44,677: Reading energies... 2019-02-15 10:54:44,760: Done. 2019-02-15 10:54:44,764: Assembling effective timeseries... 2019-02-15 10:54:44,795: Done. 2019-02-15 10:54:44,804: Reading energies... 2019-02-15 10:54:44,888: Done. 2019-02-15 10:54:44,895: Assembling effective timeseries... 2019-02-15 10:54:44,927: Done. 2019-02-15 10:54:44,958: Reading energies... 2019-02-15 10:54:45,009: Done. 2019-02-15 10:54:45,012: Assembling effective timeseries... 2019-02-15 10:54:45,046: Done. 2019-02-15 10:54:45,050: Reading energies... 2019-02-15 10:54:45,108: Done. 2019-02-15 10:54:45,110: Assembling effective timeseries... 2019-02-15 10:54:45,133: Done. 2019-02-15 10:54:47,355: Reading energies... 2019-02-15 10:54:47,511: Done. 2019-02-15 10:54:47,515: Assembling effective timeseries... 2019-02-15 10:54:47,540: Done. 2019-02-15 10:54:47,566: Checking if we need to unbias the restraint... 2019-02-15 10:54:47,566: Trying to get radially symmetric restraint data... 2019-02-15 10:54:47,566: Retrieving end thermodynamic states... 2019-02-15 10:54:52,185: Isolating restraint force... 2019-02-15 10:54:53,357: Deep copying restraint force... 2019-02-15 10:54:53,358: Retrieving particle masses... 2019-02-15 10:54:53,358: Done. 2019-02-15 10:54:53,362: Found HarmonicRestraintForce restraint. The restraint will be unbiased. 2019-02-15 10:54:53,362: Computing restraint energies... HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 140098480371456:

000: H5Dio.c line 171 in H5Dread(): can't read data

major: Dataset
minor: Read failed

001: H5Dio.c line 544 in H5D__read(): can't read data

major: Dataset
minor: Read failed

002: H5Dchunk.c line 2050 in H5D__chunk_read(): unable to read raw data chunk

major: Low-level I/O
minor: Read failed

003: H5Dchunk.c line 3405 in H5D__chunk_lock(): data pipeline read failed

major: Data filters
minor: Filter operation failed

004: H5Z.c line 1374 in H5Z_pipeline(): filter returned failure during read

major: Data filters
minor: Read failed

005: H5Zdeflate.c line 123 in H5Z_filter_deflate(): inflate() failed

major: Data filters
minor: Unable to initialize object

Traceback (most recent call last): File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistateanalyzer.py", line 404, in get value = instance._cache[self.name] KeyError: 'mbar'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistateanalyzer.py", line 404, in get value = instance._cache[self.name] KeyError: 'unbiased_decorrelated_u_ln'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/teo/miniconda3/envs/env_python3.6/bin/yank", line 11, in load_entry_point('yank==0.23.2', 'console_scripts', 'yank')() File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/cli.py", line 73, in main dispatched = getattr(commands, command).dispatch(command_args) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/commands/analyze.py", line 152, in dispatch single_run() File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/mpi.py", line 263, in _wrapper return run_single_node(rank, task, *args, kwargs) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/mpi.py", line 212, in run_single_node result = task(args, kwargs) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/commands/analyze.py", line 147, in single_run output = analyze.analyze_directory(args['--store'], analyzer_kwargs) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/analyze.py", line 768, in analyze_directory analysis_data = auto_experiment_analyzer.auto_analyze() File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/analyze.py", line 172, in make_copy return copy.deepcopy(wrap(args, kwargs)) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/analyze.py", line 681, in autoanalyze = self.get_experiment_free_energy_data() File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/analyze.py", line 172, in make_copy return copy.deepcopy(wrap(*args, **kwargs)) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/analyze.py", line 541, in get_experiment_free_energy_data data[phase_name] = analyzer.analyze_phase() File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/analyze.py", line 150, in analyze_phase Deltaf_ij, dDeltaf_ij = self.get_free_energy() File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistateanalyzer.py", line 1894, in get_free_energy self._compute_free_energy() File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistateanalyzer.py", line 1847, in _compute_free_energy nstates = self.mbar.N_k.size File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistateanalyzer.py", line 406, in get value = self._get_default(instance) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistateanalyzer.py", line 427, in _get_default value = self._default(self, instance) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site- packages/yank/multistate/multistateanalyzer.py", line 2069, in mbar return instance._create_mbar(instance._unbiased_decorrelated_u_ln, File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistateanalyzer.py", line 406, in get value = self._get_default(instance) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistateanalyzer.py", line 427, in _get_default value = self._default(self, instance) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistateanalyzer.py", line 2049, in _unbiased_decorrelated_u_ln return instance._compute_mbar_unbiased_energies()[0] File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistateanalyzer.py", line 1532, in _compute_mbar_unbiased_energies weights_group2) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistateanalyzer.py", line 1715, in _compute_restraint_energies analysis_particles_only=True) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistatereporter.py", line 623, in read_sampler_states obey_checkpoint_interval=False) File "/home/teo/miniconda3/envs/env_python3.6/lib/python3.6/site-packages/yank/multistate/multistatereporter.py", line 1648, in _read_sampler_states_from_given_file x = storage.variables['positions'][read_iteration, replica_index, :, :].astype(np.float64) File "netCDF4/_netCDF4.pyx", line 3961, in netCDF4._netCDF4.Variable.getitem File "netCDF4/_netCDF4.pyx", line 4798, in netCDF4._netCDF4.Variable._get File "netCDF4/_netCDF4.pyx", line 1638, in netCDF4._netCDF4._ensure_nc_success RuntimeError: NetCDF: HDF error"

I must mention that this simulation has been stopped and restarted around 4 times, and the yaml script modified ( I wanted fewer iterations than initially planned). What can I do to solve this issue and run my analysis on the simulation? Thanks!

UPDATE: If I run the same command with the "--skipunbiasing" flag, it runs successfully!