choderalab / perses

Experiments with expanded ensembles to explore chemical space
http://perses.readthedocs.io
MIT License
179 stars 50 forks source link

IndexError with `_vacdg_history` if there is no vacuum trajectory #1247

Open slochower opened 2 months ago

slochower commented 2 months ago

I think there is a bug with Simulation.sample_history() here: https://github.com/choderalab/perses/blob/f3aacde3588e5b90e67a21052d914a033d3c087b/perses/analysis/load_simulations.py#L235

If you've only run a binding free energy calculation with complex and solvent phases, there is no vacuum trajectory, and therefore self._vacdg_history is an empty list. When calling self.historic_fes() and then self.sample_history(), the latter function tries to index self._vagdg_history[0] which fails.

Given a simulation directory with out-complex.nc and out-solvent.nc, this is a MRE:


from pathlib import Path

from perses.analysis import utils
from perses.analysis.load_simulations import Simulation

simulation = Simulation(Path.cwd())

simulation.historic_fes()
simulation.sample_history()

I can submit a PR... however, I can't find any usages of this function in the repository. Is there another method that's used for interrogating what's happening with an edge?

The context is that I've been looking into some edges with huge (9+ kcal/mol) discrepancies. If I follow the approach in simulation.historic_fes(), both complex and solvent dG history looks relatively stable:

Figure_2

However, if I look at the matrix overlap (using analyzer.mbar.compute_overlap()), the middle has fallen out.

Figure_1

I'm not sure if I should believe this or if this is just a failure of MBAR. It's probably worth mentioning that I've run into the same pymbar >= 4 issue that @ijpulidos noted over there and I've downgraded to 3.1.1. But I still see Failed to reach a solution to within tolerance with hybr: trying next method on the solvent leg, which surprised me, because I assume that is easier to converge than the complex. Replica mixing on the solvent leg looks okay to me. So I'm stuck trying to debug what could be going wrong here.

solvent_replicas

ijpulidos commented 2 months ago

@slochower Thank you for such a detailed and readable report. I can reproduce your issue. We haven't been actively using this analysis part of the code, that is true. We probably should improve it and take a look. Thanks for reporting!

On the other and more important question. I agree with the mixing of the replicas looking okay, but that matrix overlap does look suspicious. I don't really have an answer here. But I'll try to raise it in our dev meetings to see if we can spot something.

Other than that, I don't know how useful it would be for you to extract the trajectories. We have been planning on having an API for this on the openmmtools side of things, but we are still working on that. For now, if you are willing to try, you could use this script to extract trajectories https://github.com/choderalab/perses-barnase-barstar-paper/blob/main/scripts/05_analyze/run_make_traj_per_replica.py you may need to adapt a few things to your needs. I hope that helps.

slochower commented 2 months ago

Thanks for the feedback! I haven't yet extracted the trajectories but I can report back on a little additional debugging with pymbar.

If I look at the MBAR solvent matrix overlap with pymbar 3.1.1 then I see this (which I think looks alright):

MBAR-matrix-solvent-3 1 1

If I use pymbar 4.0.3 I get the matrix shown in the first post.

If I force the robust solver, then I can recover the 3.1.1 behavior, which I think is correct. Since the packaging has changed, I'm doing it like this:

PYMBAR_4 = True if "__version__" in dir(pymbar) else False
# They changed packaging structure, so >= 4 uses `pymbar.__version__` and
# version 3.x uses `pymbar.version.version`

[...]
        if PYMBAR_4:
            analyzer = MultiStateSamplerAnalyzer(
                reporter,
                analysis_kwargs={"solver_protocol": "robust"},
            )
        else:
            analyzer = MultiStateSamplerAnalyzer(reporter)
[...]
        if PYMBAR_4:
            overlap_matrix = _mbar.compute_overlap()["matrix"]
        else:
            overlap_matrix = _mbar.computeOverlap()["matrix"]

I also tried these steps when looking at the convergence (forcing the robust solver) and I don't see any significant difference in the dG values, so it's possible this MBAR quirk was a red herring. I'll keep investigating though and please let me know if you think there could be other places to debug these spuriously super stabilized edges.

zhang-ivy commented 1 month ago

This sounds like a pymbar issue, would it be worth opening an issue there and linking to this one to see if Michael Shirts might have any idea what's going on here? I know there were significant changes going from pymbar 3 to 4

Is there another method that's used for interrogating what's happening with an edge?

I would recommend using the yank analysis notebook to analyze edges: https://github.com/choderalab/yank/blob/master/Yank/reports/YANK_Health_Report_Example.ipynb

There are many ways to run this analysis notebook, see here: http://getyank.org/latest/analysis.html

Let me know if you need help getting this notebook to run / interpreting it!

I think its on the roadmap to eventually move this notebook to openmmtools but not sure if this is being prioritized at the moment