Open laurenwinkler opened 9 months ago
Also adding the system XML files here. Amber-gaff.xml is with GAFF describing the ligand and amber/ff14SB.xml describing the protein Espaloma-lig-only.xml is with Espaloma describing the ligand and ambre/ff14SB.xml describing the protein Espaloma-full-system.xml is with both ligand and protein described using the EspalomaTemplateGenerator system-xml-files.zip
@laurenwinkler Hi, thank you for your interest in using Espaloma. I can reproduce your results using both RDkit and Openeye as cheminformatics backends.
As far as I can tell, Espaloma is behaving as expected, since the protein sdf has 2632 bonds, and that's exactly what we get in the system XML file for the full system (complex with Espaloma). Considering the extra 21 bonds from the ligand file, of course.
That makes me suspicious about the protein.sdf
file, how was this file created? If this was created from a PDB file, maybe it can be helpful if you share the PDB file from which the SDF was created Thanks!
@laurenwinkler Hi! Thank you for bringing this issue to our attention. It's something I haven't encountered before. Building on @ijpulidos's suggestion, perhaps creating a system using Amber/GAFF force fields with a PDB file for the protein may help debug the problem.
Sorry, never mind. I got confused.
Thank you for you response. I have attached the protein pdb file as well as the script used to convert the pdb to an SDF.
Would you mind sharing the script you used to produce the results as well as the XML file? [Uploading protein.zip…]()
@laurenwinkler I think the zip file wasn't successfully uploaded, can you retry uploading it? Thanks!
My apologies!
Just following up, could you share the script that you guys used to produce an XML file that had the correct bond terms for a complex with both protein and ligand described by Espaloma and no OE backend? @ijpulidos @kntkb
If you look at the XML files in Lauren's first comment (system-xml-files) the complex system (espaloma-full-system.xml) has 2653 bonds where as amber-gaff.xml and espaloma-lig-only.xml have 1323 bonds. If you have a way to start from the PDB file and have this work, our issue may be solved.
The extra bonds in espaloma-full-system.xml
seems to come from bonds that involve hydrogens. For example, there are bonds between particles 2 (heavy atom) and 3 (hydrogen atom) in espaloma-full-system.xml
but not in espaloma-lig-only.xml
and amber-gaff.xml
.
I also realized that espaloma-full-system.xml
is missing the <Constraints>
information for bonds that involve hydrogens. The bonds involved with hydrogens from the <Bonds>
section should be in <Constraints>
. I guess the EspalomaTemplateGenerator
doesn't have a way to work cooperatively with constraint arguments (e.g., app.HBonds
) passed to openmm.app.ForceField
.
Although this is not elegant, a workaround could involve writing a parser to transfer the <Bonds>
information to <Constraints>
after the XML file is created. Alternatively, one could update the solvated topology, generated by modeller.addSolvent
, by deleting the bonds involving hydrogens and then adding that information back as constraints to the system.
There is a sample script here that starts from a PDB file but requires OE in the backend.
@kntkb Great catch! Yes, I agree, this explains the issue. Maybe what we want is for the EspalomaTemplateGenerator
to have a way to respect the constraints, but we need a way to make it general for both proteins and small (non-protein) molecules. Thanks!
Thank you for your help. We redid the XML file generation for espaloma and amber and now they have the same bond count. I've attached the system XML files as well as our scripts and inputs. Would you be able to verify if the XML files look correct to you? Espaloma_system_test.zip
Not sure this is related, but in the hope it helps, it may be worth testing openmmforcefields 0.11.2
and 0.12.0
. We noticed differences in espaloma charges of symmetric atoms and @nbruciaferri observed that 0.12.0
fixed such asymmetries.
@diogomart Ah, yes. The asymmetric charges are expected when using openmmforcefields 0.11.2
because it uses the old espaloma model (0.2.2
). This issue was resolved in openmmforcefields 0.12.0
which now uses the improved espaloma model (0.3.x
) as default.
Thanks for the pointers @diogomart @kntkb
Thank you for your help. We redid the XML file generation for espaloma and amber and now they have the same bond count. I've attached the system XML files as well as our scripts and inputs. Would you be able to verify if the XML files look correct to you? Espaloma_system_test.zip
We would love to get to the bottom of if Espaloma is able to run with rdkit under the hood so we can benchmark. If you get the chance to check out these XML files, we would like to know if you still see an issue with them?
If you don't see any problems, this issue is resolved on our end.
Thank you for your help. We redid the XML file generation for espaloma and amber and now they have the same bond count. I've attached the system XML files as well as our scripts and inputs. Would you be able to verify if the XML files look correct to you? Espaloma_system_test.zip
If the intention was to parameterize the system using amber14SB
for the protein and espaloma
for the ligand, then the XML file appears to be correct. However, if the goal was to parameterize both the protein and ligand with espaloma
, then the XML file has not been created as intended.
The proteins are parameterized with amber
, not espaloma
, because you are specifying amber/protein.ff14SB.xml
in output_espaloma_complex.py
. If you want to parameterize with espaloma
, you will need to remove amber/protein.ff14SB.xml
. However, I think this might cause an error if the OpenEye Toolkits are not installed in the backend.
As mentioned earlier, I believe you could use RDKit as an alternative, but you would need to use SDF format for the protein and transfer the
Hi Espaloma Devs,
I am exploring the use of Espaloma (with RdKit under the hood) to describe a protein + ligand complex.
When comparing system XML files from an (unsolvated) Espaloma described system vs one described by amber/protein.ff14SB.xml and GAFF, I noticed that ligands described with Espaloma contain more than (ligand count) choose 2 additional number of bonded terms in the section of the output XML file. For example, if the ligand has let's say 12 atoms, the XML of a system where the ligand is parameterized with Espaloma has:
additionalnumber_terms > 3 x (12 choose 2)
If the ligand described with Espaloma is in a complex, is it intentional to add nonphysical harmonic bonds between the ligand and the protein?
I've attached the script used to build the Espaloma systems as well as the input files. I should also note that I am using: openff-interchange=0.3.18 openff-toolkit=0.14.3 openmm=8.0.0 espaloma=0.3.2 rdkit=2023.03.3 python=3.10
Any insight would be helpful. Thanks in advanced. Espaloma-test.zip