OpenFreeEnergy / openfe

The Open Free Energy toolkit
https://docs.openfree.energy
MIT License
135 stars 18 forks source link

Solvent padding description #827

Open mattwthompson opened 5 months ago

mattwthompson commented 5 months ago

Sorry for the long explanation - the actual issue here is incredibly pointed and I want to make sure I'm highlighting only it.

OpenFE's docstring of the padding option in solvation settings is:

https://github.com/OpenFreeEnergy/openfe/blob/695d03208bc429fc282e4332ce668b6911111889/openfe/protocols/openmm_utils/omm_settings.py#L53

This setting appears to be passed 1:1 to OpenMM without modification

OpenMM's documentation of a similar argument is a little more verbose

You can give a padding distance. A bounding sphere containing the solute is determined, and the box size is set to (sphere diameter)+(padding). This guarantees no atom in the solute will come closer than the padding distance to any atom of another periodic copy. If the sphere diameter is less than the padding distance, the box size is set to 2*(padding) to ensure no atom is closer than the padding distance to two periodic copies of any other atom.

For some small molecules, however, this produces results as if the bounding sphere representing the solute is size zero, which doesn't intuitively hold up to me. OpenFF has a similar implementation (and documented argument) that goes through Packmol after determining the expected system size, and for similar systems it frequently produces boxes a few Angstroms larger. I'm running small tests with a benzene-in-water system, and OpenFE's code path produces box vectors of 2.4 nm (simply twice the padding value of 1.2) whereas the OpenFF code path produces numbers closer to 2.8 nm (twice the padding value plus a little bit of fuzziness representing the size of the molecule). Maybe for a single atom it makes sense, but for a 3D molecule with some size to it, I don't think the box vectors should be twice the padding value in all three dimensions. Unfortunately this seems to be what OpenMM does (see below).

Both OpenFF and OpenFE code paths produce sensible results (no crashes) as far as I've observed, so I don't think it makes sense to ask for a behavior change. The documentation doesn't communicate that there should be different behavior, so I think that could be updated in some way. I have no reason to believe results would be different, but accidentally packing into a larger-than-needed-box is an easy way to slow down simulations with no benefit ((2.8 / 2.4) ** 3 ~= 1.6, or 60% slower compute!).

Here's a rough demonstration of how the OpenMM code path produces different system sizes with molecules of different size

import openmm
import openmm.app
import openmmforcefields
import openmm.unit

from openff.toolkit import Molecule

from openmmforcefields.generators import GAFFTemplateGenerator

def smiles_to_box_vectors(smiles: str):
    molecule = Molecule.from_smiles(smiles)

    molecule.generate_conformers(n_conformers=1)

    gaff = GAFFTemplateGenerator(molecules=molecule)

    forcefield = openmm.app.ForceField("tip3p.xml")
    forcefield.registerTemplateGenerator(gaff.generator)

    modeller = openmm.app.Modeller(
        molecule.to_topology().to_openmm(),
        molecule.conformers[0].to_openmm(),
    )

    modeller.addSolvent(
        forcefield,
        padding=1.0 * openmm.unit.nanometers,
    )

    a, b, c = modeller.getTopology().getPeriodicBoxVectors()

    return {round(value._value, 3) for value in [a[0], b[1], c[2]]}

for smiles in [
    "N",
    "CCO",
    "c1ccccc1",
    "C12C3C4C1C5C2C3C45",
    "CCCC[N+](CCCC)(CCCC)CCCC",
    20 * "C",
]:
    print(smiles, smiles_to_box_vectors(smiles))

# N {2.0}
# CCO {2.0}
# c1ccccc1 {2.0}
# C12C3C4C1C5C2C3C45 {2.0}
# CCCC[N+](CCCC)(CCCC)CCCC {2.226}
# CCCCCCCCCCCCCCCCCCCC {2.767}

If it would be useful I can

mikemhenry commented 5 months ago

Thanks for this!

If I understand this correctly, the first thing you are asking is for us to update the documentation. Do you have a suggested doc string? At first I was going to summarize the openmm doc and then include a link to it, but it sounds like even the openmm doc isn't great, so do you have a suggestion? Or, what would you like to see mentioned?

mattwthompson commented 5 months ago

Yeah sorry this is all over the place; it'd be much easier if it was an OpenFE or OpenMM documentation issue and not a mixture of both

The best suggestion I've thought of would be a dressed-up version of

"""
Minimum distance from any solute atoms to the solvent box edge.

This argument is passed directly to OpenMM's addSolvent (link), which
typically produces boxes of size twice the padding distance plus the approximate
solute size, but for some small solutes may treat the solute as having zero size.
"""

This would communicate the quirk I'm observing in a way that squares the responsibility with OpenMM but doesn't assign blame. I think paraphrasing the OpenMM docs risks this docstrings falling out of sync each other, whereas just a link wouldn't be as useful to the OpenFE user. (If the behavior changes in OpenMM, this is a different issue altogether). That covers everything I have been thinking of, the most important thing being that it describes what happens

IAlibay commented 5 months ago

Thanks @mattwthompson. We could/should add it to this PR: https://github.com/OpenFreeEnergy/openfe/pull/673, unfortunately it didn't make it into 1.0, but the ability to use the wider set of inputs from addSolvent alongside better docstring would be great.