Open viktorbelay opened 1 year ago
Thanks for the very detailed bug report @viktorbelay It looks like the code path for using the mapper directly is slightly different from the path we take when using the CLI. I will investigate this.
@viktorbelay Thank you again for the report!
When I use AtomMapper directly on the SDF I'm using, I get similar warnings. However, it seems to be able to actually produce the mappings:
Can you post the code you used to do that? I can reproduce your error, but I can't reproduce your success.
from perses.rjmc.atom_mapping import AtomMapper
from openeye import oechem
atom_mapper = AtomMapper()
ifs = oechem.oemolistream()
ifs.open("ligand_aligned.sdf")
list_of_oemols = []
for mol in ifs.GetOEGraphMols():
list_of_oemols.append(oechem.OEGraphMol(mol))
current_molecule, proposed_molecule = list_of_oemols[0], list_of_oemols[1]
# Create an atom mapping for the default criteria
atom_mapping = atom_mapper.get_best_mapping(current_molecule, proposed_molecule)
Gives me the same error as running the whole pipeline LOGLEVEL=DEBUG perses-cli --yaml protein-ligand.yaml --platform CUDA
@mikemhenry Thank you for helping with this!
Strange, here is my code:
from openff.toolkit.topology import Molecule
from pkg_resources import resource_filename
from perses.rjmc.atom_mapping import AtomMapper
# Define path to SDF and load molecules
sdf_file='/data/chodera/viktor/IP3R_MD/perses_test2/ligand_aligned.sdf'
mol=Molecule.from_file(sdf_file, allow_undefined_stereo=True)
# Display molecules to verify they are loaded correctly
display(mol[0].visualize(backend='rdkit'))
display(mol[1].visualize(backend='rdkit'))
# Create a default AtomMapper
atom_mapper = AtomMapper()
# Do mapping from AMP to ATP
atom_mapping = atom_mapper.get_best_mapping(mol[1], mol[0])
display(atom_mapping)
# Do mapping from ATP to AMP
atom_mapping = atom_mapper.get_best_mapping(mol[1], mol[0])
display(atom_mapping)
I used the same environment to run this as the one I attached in my original post. I guess the difference is that you're using oechem
to load the molecules whereas I used OpenFF
. Could the issue be in oechem
?
I'm also attaching a Jupyter notebook where I displayed the mappings: atom_mapping copy.ipynb.zip
@viktorbelay I can reproduce your success as well now. What is weird is when I change this line mol=Molecule.from_file(sdf_file, allow_undefined_stereo=True)
to mol=Molecule.from_file(sdf_file, allow_undefined_stereo=False)
it still "works" but throws warnings about the stero chemistry being undefined.
I'm going to try and figure out why we are getting different results.
The issue boils down to this function: https://github.com/choderalab/perses/blob/main/perses/utils/openeye.py#L323 Where setting allow_undefined_stereo=True
actually causes us to not perceive the stereochemistry from the SDF, which is very different behavior than allowing undefined stereochemistry
What we want to do is have this code block always run but if allow_undefined_stereo=True
, we don't raise an error on these functions but instead continue. Currently this block doesn't run at all which is an issue since when reading in a SDF file, we always want to try oechem.OE3DToInternalStereo
# perceive chirality
if not allow_undefined_stereo:
assert oechem.OE3DToInternalStereo(molecule), f"the stereochemistry perception from 3D coordinates failed"
assert not has_undefined_stereocenters(molecule), f"there is an atom with an undefined stereochemistry"
@mikemhenry We discussed this yesterday on the devs meeting and I think I finally understand what you are trying to say, I think you are correct in suggesting that we should be executing the OE3DToInternalStereo
from openeye regardless if the stereochemistry is defined or not.
There are some subtleties that will probably deserve some more discussion, because as far as the discussion went, there are molecules that could have undefined stereochemistry even if the bond orders and 3D coordinates are known. We would need to think how to deal with these, or rather, how the tools we are using deal with these cases.
@ijpulidos I think during our refactor, we should try and see if we can use openff-toolkit
as the entry point for loading in small molecules, then call to_rdkit
or to_openeye
as needed, since then we can leverage all the work they have put in with stereochemistry validation and perception without us having to re-invent the wheel.
@ijpulidos I think during our refactor, we should try and see if we can use
openff-toolkit
as the entry point for loading in small molecules, then callto_rdkit
orto_openeye
as needed, since then we can leverage all the work they have put in with stereochemistry validation and perception without us having to re-invent the wheel.
Yeah, I still think we would like to have a workaround or semi-fix for this release, I'm just still not sure what's the best approach but maybe we just want to call OE3DToInternalStereo
when we read from SDF, avoiding changing more things elsewhere that may have unexpected consequences. For example, check the changes in https://github.com/choderalab/perses/commit/bdbc8b3fccc6f7a0e49c2641abbab1d0e4ff8f8f (this is a branch I was experimenting with for these changes, they currently work for @viktorbelay simulation).
Does that sound okay? We will just have to check that the unit tests and benchmarks are fine with that.
@ijpulidos Sure, that looks cleaner than what I did here https://github.com/choderalab/perses/pull/1167/files
I am trying to use the Perses CLI to calculate free energies for nucleotide binding to the IP3 receptor and am using ATP <-> AMP as a test run. I've been able to successfully run Perses for the Tyk2 example (https://github.com/choderalab/perses/tree/main/examples/protein-ligand/cli), but run into an error when trying it for my system:
When I use AtomMapper directly on the SDF I'm using, I get similar warnings. However, it seems to be able to actually produce the mappings:
I am wondering then if this could be a problem with parameterization. I have attached the full run log, my input and output files, and the Conda environment I am using. The command I'm running is:
LOGLEVEL=DEBUG perses-cli --yaml protein-ligand.yaml --platform CUDA
Attached files: inputs.zip outputs.zip log.zip environment.yml.zip