Open slochower opened 7 months ago
@slochower Thank you for such a detailed and readable report. I can reproduce your issue. We haven't been actively using this analysis part of the code, that is true. We probably should improve it and take a look. Thanks for reporting!
On the other and more important question. I agree with the mixing of the replicas looking okay, but that matrix overlap does look suspicious. I don't really have an answer here. But I'll try to raise it in our dev meetings to see if we can spot something.
Other than that, I don't know how useful it would be for you to extract the trajectories. We have been planning on having an API for this on the openmmtools
side of things, but we are still working on that. For now, if you are willing to try, you could use this script to extract trajectories https://github.com/choderalab/perses-barnase-barstar-paper/blob/main/scripts/05_analyze/run_make_traj_per_replica.py you may need to adapt a few things to your needs. I hope that helps.
Thanks for the feedback! I haven't yet extracted the trajectories but I can report back on a little additional debugging with pymbar
.
If I look at the MBAR solvent matrix overlap with pymbar 3.1.1
then I see this (which I think looks alright):
If I use pymbar 4.0.3
I get the matrix shown in the first post.
If I force the robust solver, then I can recover the 3.1.1 behavior, which I think is correct. Since the packaging has changed, I'm doing it like this:
PYMBAR_4 = True if "__version__" in dir(pymbar) else False
# They changed packaging structure, so >= 4 uses `pymbar.__version__` and
# version 3.x uses `pymbar.version.version`
[...]
if PYMBAR_4:
analyzer = MultiStateSamplerAnalyzer(
reporter,
analysis_kwargs={"solver_protocol": "robust"},
)
else:
analyzer = MultiStateSamplerAnalyzer(reporter)
[...]
if PYMBAR_4:
overlap_matrix = _mbar.compute_overlap()["matrix"]
else:
overlap_matrix = _mbar.computeOverlap()["matrix"]
I also tried these steps when looking at the convergence (forcing the robust solver) and I don't see any significant difference in the dG values, so it's possible this MBAR quirk was a red herring. I'll keep investigating though and please let me know if you think there could be other places to debug these spuriously super stabilized edges.
This sounds like a pymbar issue, would it be worth opening an issue there and linking to this one to see if Michael Shirts might have any idea what's going on here? I know there were significant changes going from pymbar 3 to 4
Is there another method that's used for interrogating what's happening with an edge?
I would recommend using the yank analysis notebook to analyze edges: https://github.com/choderalab/yank/blob/master/Yank/reports/YANK_Health_Report_Example.ipynb
There are many ways to run this analysis notebook, see here: http://getyank.org/latest/analysis.html
Let me know if you need help getting this notebook to run / interpreting it!
I think its on the roadmap to eventually move this notebook to openmmtools but not sure if this is being prioritized at the moment
I think there is a bug with
Simulation.sample_history()
here: https://github.com/choderalab/perses/blob/f3aacde3588e5b90e67a21052d914a033d3c087b/perses/analysis/load_simulations.py#L235If you've only run a binding free energy calculation with complex and solvent phases, there is no vacuum trajectory, and therefore
self._vacdg_history
is an empty list. When callingself.historic_fes()
and thenself.sample_history()
, the latter function tries to indexself._vagdg_history[0]
which fails.Given a simulation directory with
out-complex.nc
andout-solvent.nc
, this is a MRE:I can submit a PR... however, I can't find any usages of this function in the repository. Is there another method that's used for interrogating what's happening with an edge?
The context is that I've been looking into some edges with huge (9+ kcal/mol) discrepancies. If I follow the approach in
simulation.historic_fes()
, both complex and solvent dG history looks relatively stable:However, if I look at the matrix overlap (using
analyzer.mbar.compute_overlap()
), the middle has fallen out.I'm not sure if I should believe this or if this is just a failure of MBAR. It's probably worth mentioning that I've run into the same
pymbar
>= 4 issue that @ijpulidos noted over there and I've downgraded to 3.1.1. But I still seeFailed to reach a solution to within tolerance with hybr: trying next method
on the solvent leg, which surprised me, because I assume that is easier to converge than the complex. Replica mixing on the solvent leg looks okay to me. So I'm stuck trying to debug what could be going wrong here.