Becksteinlab / MDPOW

Calculation of water/solvent partition coefficients with Gromacs.
https://mdpow.readthedocs.io
GNU General Public License v3.0
25 stars 10 forks source link

Use OOP forcefields #267

Closed a-ws-m closed 1 year ago

a-ws-m commented 1 year ago

This implements several changes to make it easier for end-users to implement their own forcefields. Most of these changes revolve around the Forcefield class, which stores the path dictionaries that were previously separate entities. To use a non-default forcefield, make a new Forcefield instance and then pass it to Simulation using the ff_class argument. This argument is optional; passing a string value for forcefield will still work, it's simply converted behind the scenes. I also tidied up a few things, which may be worth discussing:

I'll write some example code that demonstrates how to use a Forcefield next week. In the meantime, I'm sharing this PR so that the changes can be discussed.

a-ws-m commented 1 year ago

@orbeckst here's the PR you asked me to ping you about!

codecov[bot] commented 1 year ago

Codecov Report

Merging #267 (3645776) into develop (f98424b) will increase coverage by 0.42%. The diff coverage is 93.54%.

@@             Coverage Diff             @@
##           develop     #267      +/-   ##
===========================================
+ Coverage    81.26%   81.68%   +0.42%     
===========================================
  Files           15       15              
  Lines         1975     2026      +51     
  Branches       297      310      +13     
===========================================
+ Hits          1605     1655      +50     
+ Misses         278      276       -2     
- Partials        92       95       +3     
Files Changed Coverage Δ
mdpow/equil.py 80.79% <88.23%> (+0.11%) :arrow_up:
mdpow/forcefields.py 92.56% <94.73%> (+4.75%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

a-ws-m commented 1 year ago

Done! I always forget to disable black when I'm working on new code, sorry about that.

orbeckst commented 1 year ago

You gave me the motivation to try out black on a project so I opened #271. Let's do PR #269 first and then come back here.

orbeckst commented 1 year ago

@a-ws-m now that PR #269 is merged, could you please rebase against current develop so that we hopefully end up with a nice, clean diff? Thanks!

a-ws-m commented 1 year ago

I've made a few important changes:

a-ws-m commented 1 year ago

Regarding the error in the FEP tests, I'm unable to reproduce it on my machine and I think it's an issue with the MD workflow rather than any Python code that I've changed here.

a-ws-m commented 1 year ago

Sorry for the continued changes, I'm realising as I try implementing the Martini example that there's more scope for improving usability of this.

I've now also implemented a change that removes the global DEFAULT_WATER_MODEL in favour of a Forcefield-specific approach. This solves #112 to boot.

orbeckst commented 1 year ago

Regarding the error in the FEP tests, I'm unable to reproduce it on my machine and I think it's an issue with the MD workflow rather than any Python code that I've changed here.

There's nothing you can do. It's a stochastic failure: if the simulation ends randomly reaches a point where the solute extends more than the nb cutoff, it triggers a known issue with how GROMACS calculates exclusions and modern versions of GROMACS stop the simulation (old versions just kept going with potentially questionable results).

Perhaps we can try an set a random seed that leads to a known working testrun... or we eventually replace it with a less iffy system.

a-ws-m commented 1 year ago

I've made those changes. The GitHub tests seem to be throwing some weird errors to do with cloning the repo.

orbeckst commented 1 year ago

I restarted CI and all jobs are running. It was a temporary problem.

a-ws-m commented 1 year ago

No problem! I'll work on those. I have actually been working on an example, it's in the martini-example branch of my fork. I have some questions before it's ready that I'll raise on the Q&A.

orbeckst commented 1 year ago

awesome!

a-ws-m commented 1 year ago

I've merged the example using Martini, you can find it in doc/examples/martini-example. I imagine there's scope for improving the run parameters to get a more accurate value, and perhaps fixing the issue whereby the nan error for the coulomb contribution propagates and gives a nan value for the overall error. I'm not sure how to improve these, though -- I'd be happy to hear suggestions!

I also need to go back and clear some of the old, erroneous output from that notebook. The plots, for example, are completely wrong and don't reflect the latest results.

orbeckst commented 1 year ago

Your MARTINI/MDPOW example notebook is fantastic!

(I also noted that some of the GROMACS runs actually didn't run but just restarted from the last frame. However, it's nice to know that it's not an issue re-running the workflow as it won't destroy data.)

For nan value in the Coulomb calculation I am not sure where it comes from. I assume that there is a valid Coulomb leg in the FEP calculation because MARTINI benzene has partial charges, doesn't it?

One thing that's a bit odd is that alchemlyb and mdpow report different values for statistical inefficiency, for example alchemlyb states 15.84 and mdpow states 7.9840.

mdpow.fep   : INFO     Performing statistical inefficiency analysis for window coulomb 0000
2023-09-04 22:41:16.809 | DEBUG    | alchemlyb.preprocessing.subsampling:statistical_inefficiency:520 - Running statistical inefficiency analysis.
2023-09-04 22:41:16.812 | DEBUG    | alchemlyb.preprocessing.subsampling:statistical_inefficiency:522 - Statistical inefficiency: 15.84.
2023-09-04 22:41:16.813 | DEBUG    | alchemlyb.preprocessing.subsampling:statistical_inefficiency:528 - Number of uncorrelated samples: 312.
mdpow.fep   : INFO     The statistical inefficiency value is 7.9840.
mdpow.fep   : INFO     The data are subsampled every 8 frames.

I think that's because we have to infer the actual value from what alchemlyb subsample in https://github.com/Becksteinlab/MDPOW/blob/6449879a9bc5fbc80210290eea84fd40ad164dff/mdpow/fep.py#L1192C20-L1192C20 and there's also a suspicious looking factor of 2 involved. (The whole question about getting the raw SI from alchemlyb was the topic of https://github.com/alchemistry/alchemlyb/issues/295 ). @VOD555 can you please check that the MDPOW output is consistent with the one from alchemlyb? I assume that other users that read the log output will have the same question as I

VOD555 commented 1 year ago

@orbeckst I've checked the code. The logger in the mdpow outputs the correlation time (g/2) but not the statistical inefficiency (g). And as the one calculated in mdpow is based on the length output of the subsampled dataframe, there is small difference between the mdpow one and alchemlyb one.

As currently alchemlyb has a logger output for the SI, I think we can remove the mdpow logger information.

VOD555 commented 1 year ago

Open an issue #275

a-ws-m commented 1 year ago

Thank you! The Gromacs output is for a restart because it was the quickest way to re-generate most of the useful information after clearing the cell outputs. I'll work on those other points today.

orbeckst commented 1 year ago

@a-ws-m I let you look at my one comment about "sedstate" in the MDP file but if you tell me that this is what it ought to be, I'll squash-merge. (If you want rewrite the history on the PR into less than 5 commits then you're also welcome to do so, e.g. code changes+tests, example, but if not then that's fine too.)

orbeckst commented 1 year ago

Thank you very much @a-ws-m , great contribution!!