choderalab / espaloma

Extensible Surrogate Potential of Ab initio Learned and Optimized by Message-passing Algorithm 🍹https://arxiv.org/abs/2010.01196
https://docs.espaloma.org/en/latest/
MIT License
212 stars 23 forks source link

Parametrization of complex based on PDB and MOL file #211

Open entropybit opened 7 months ago

entropybit commented 7 months ago

Hi,

this might be a naive question and also somewhat of a function of my inexperience, but: I cant figure out how to load a ligand from a MOL file and than combine it with a PDB for the protein so that I get a complex which is parametrized using espaloma. As far as I understood the paper correct, the model can giver parameters for small molecules as well as proteins so this should be possible right ? Sorry for my possibly very uninformed question and thanks in advance for your answer.

To make this a bit more specific what I have tried is something like this:

from openmm.app import *
from openmm import *
from openmm.unit import *
from sys import stdout

#from openff.toolkit import Molecule
from openff.toolkit.topology import Molecule
#ligand = Molecule.from_smiles(''[H][c]1[n][c]([N]([H])[C](=[O])[C]2([H])[C]([H])([H])[C]2([H])[H])[c]([H])[c]([N]([H])[C](=[O])[c]2[c]([Cl])[c]([H])[c]([H])[c]([H])[c]2[Cl])[c]1[H])
ligand = Molecule.from_file("mol_0X5.mol")
print(ligand)
ligand_topology = ligand.to_topology().to_openmm()
ligand_positions = ligand.to_topology().get_positions().to_openmm()

## Create the SMIRNOFF template generator with the released espaloma-0.3.2 force field
from openmmforcefields.generators import EspalomaTemplateGenerator
espaloma = EspalomaTemplateGenerator(molecules=ligand, forcefield='espaloma-0.3.2')
## Create an OpenMM ForceField object with AMBER ff14SB and TIP3P with compatible ions

#from openmmforcefields.generators import GAFFTemplateGenerator
#gaff = GAFFTemplateGenerator(molecules=ligand)
#gaff.add_molecules(ligand)

from openmm.app import ForceField
forcefield = ForceField('amber/protein.ff14SB.xml', 'amber/tip3p_standard.xml', 'amber/tip3p_HFE_multivalent.xml')
# Register the SMIRNOFF template generator
forcefield.registerTemplateGenerator(espaloma.generator)
#forcefield.registerTemplateGenerator(gaff.generator)

pdb = PDBFile('4GIH_nowater_fixed.pdb')

modeller = Modeller(pdb.topology, pdb.positions)
#modeller.add(ligand_topology, ligand_positions)
#modeller.addSolvent(forcefield, padding=1.0 * nanometers, ionicStrength=0.15 * molar)

#forcefield = ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')
system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME,
        nonbondedCutoff=1*nanometer, constraints=HBonds)
integrator = LangevinMiddleIntegrator(300*kelvin, 1/picosecond, 0.004*picoseconds)
simulation = Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)
simulation.minimizeEnergy()
simulation.reporters.append(PDBReporter('simulation_output.pdb', 1000))
simulation.reporters.append(StateDataReporter(stdout, 1000, step=True,
        potentialEnergy=True, temperature=True))
simulation.step(10000)

Based on the PDB 4GIH which I preprocessed with pdbfixer. With this code I get the following error

Molecule with name 'Clc1c(c(Cl)ccc1)C(=O)Nc1ccnc(c1)NC(=O)C1CC1' and SMILES '[H][c]1[n][c]([N]([H])[C](=[O])[C]2([H])[C]([H])([H])[C]2([H])[H])[c]([H])[c]([N]([H])[C](=[O])[c]2[c]([Cl])[c]([H])[c]([H])[c]([H])[c]2[Cl])[c]1[H]'
Did not recognize residue 0X5; did you forget to call .add_molecules() to add it?
Traceback (most recent call last):
  File "/lustre/projects2/csa/users/mayer21/openmm_tries/espaloma_protein_complex.py", line 38, in <module>
    system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME,
  File "/lustre/projects2/csa/users/mayer21/miniconda3/envs/openmm/lib/python3.9/site-packages/openmm/app/forcefield.py", line 1247, in createSystem
    templateForResidue = self._matchAllResiduesToTemplates(data, topology, residueTemplates, ignoreExternalBonds)
  File "/lustre/projects2/csa/users/mayer21/miniconda3/envs/openmm/lib/python3.9/site-packages/openmm/app/forcefield.py", line 1462, in _matchAllResiduesToTemplates
    raise ValueError('No template found for residue %d (%s).  %s  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#template' % (res.index+1, res.name, _findMatchErrors(self, res)))
ValueError: No template found for residue 289 (0X5).  This might mean your input topology is missing some atoms or bonds, or possibly that you are using the wrong force field.  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#template

So far what ever I tried I get an No template found for residue error with either the small molecule or one of the amino acids. Help would be appreciated ^^

ijpulidos commented 6 months ago

Hello!

I believe this is very related to the issue raised in https://github.com/choderalab/espaloma/issues/209, basically you would need to reparameterize the protein with something similar to what the following script is using, https://github.com/choderalab/vanilla-espaloma-experiment/blob/main/experiment/script/create_system.py (please note that it requires OpenEye).

Something that uses RDkit instead of openeye should be possible to do, but we don't have any script or tool for that.

There have been a series of questions regarding this, we definitely need to make this easier for users, but so far this is what we can offer. Thanks for your interest and I hope this helps!