choderalab / perses

Experiments with expanded ensembles to explore chemical space
http://perses.readthedocs.io
MIT License
180 stars 51 forks source link

Incorrect symmetry in mapping of TNSK2 ligand pair? #1078

Closed IAlibay closed 2 years ago

IAlibay commented 2 years ago

Hi,

I'm running through some of the new ligands from https://github.com/openforcefield/protein-ligand-benchmark/pull/52 and I noticed a rather odd mapping for at least one of the edges.

For the transformation of lig_5d to lig_5p of the TNSK2 set (edge.zip), it appears that the default mapping generated by perses.rjmc.atom_mapping.AtomMapperfor has the wrong symmetry for the central ring, and the tert-butanol?

Here's the 2D mapping: atom_mapping

And a 3D render of the mapping (7 A x direction translation in positions, matching sphere colors indicate a set of mapped atoms):

image

Here's a gist notebook with the code details: https://gist.github.com/IAlibay/089f614665ab3a10d076990ba8cd8024

Relevant library versions:

jchodera commented 2 years ago

@ialibay: Thanks for catching this! A few things could be going on here:

  1. We may not be enumerating all symmetry-equivalent valid atom mappings, so only the flipped middle ring mapping get enumerated
  2. We may not be using a small enough threshold for atom correspondence such that the flipped middle ring mapping is permitted as a geometrically valid mapping
  3. I can't recall if we are scoring by geometry or simply filtering by it. I seem to recall we are only filtering by it and then scoring by number of matched atoms. If so, we may need to score by RMS deviation of mapped atoms as well.

I think the key to debugging this will be to have the atom mapper print out all valid mappings and the scores for these mappings, and then see which of the above cases might be going on here. That will help us identify how to fix it.

If it can't easily be fixed, we may need to rethink the strategy of enumerating all valid mappings via graph matchings and then filtering/scoring them by geometry.

@IAlibay : This 3D visualization is beautiful! It would be fantastic if this ended up in a module that we could use to automatically generate 3D atom mapping visualizations one could explore interactively or non-interactively.

IAlibay commented 2 years ago

@jchodera

Apologies I don't have much time to hunt for bugs this week, but I'll try and see if I can have a deeper dive through the rjmc code next week.

Running

all_mappings = atom_mapper.get_all_mappings(molecule_1, molecule_2)

Returns a list with a single mapping, which as you mention would probably mean that all the symmetry-equivalent options are not being enumerated.

Turning off use_positions does lead to two mappings being identified, but for some reason they both have the same new_to_old mapping (and by extension the same score)?

re: visualization

We're hoping we'll move a version of this into openfe soon to do something similar. There might be an avenue to synch up to tooling in the future if we do move the atom mapping objects into gufe. That being said I think this might end up being a longer term thing (especially since I don't think we can move the visualisation tools into gufe).

In the short term, once I've added this into openfe I'm happy to add a version that's tailored to rjmc's AtomMapping if you're interested.

we could use to automatically generate 3D atom mapping visualizations one could explore interactively or non-interactively.

On that point though, 3Dmol.js isn't really geared towards saving images (unless there's been some new updates on that front), so you'd be mostly limited to interactive visualisation.

ijpulidos commented 2 years ago

As for the central ring I think the symmetry is okay. I cannot reproduce the atom mapping you posted in https://github.com/choderalab/perses/issues/1078#issue-1320869760 , I get the following image Which seems ok to me.

One of the issues that I can see is that the ligands are actually different from what the atom mapping shows. I'm very bad with chemistry naming so I'll just point the difference I see as follows image But as you can see in the mapping above, this is not shown as a difference. I tested with a simple cyclohexane to cyclohexene to confirm that this is not detected as a difference when doing the mapping. That is, bond orders are not taken into account, which may not be that important, but not sure. Just wanted to share this.

IAlibay commented 2 years ago

@ijpulidos can you post your starting SDF files and the code you used? (if different from what's in the gist)

Which seems ok to me.

Have you checked in 3d?

One of the issues that I can see is that the ligands are actually different from what the atom mapping shows.

Is that actually a difference? Isn't it a bond in an aromatic ring either way?

ijpulidos commented 2 years ago

@ijpulidos can you post your starting SDF files and the code you used? (if different from what's in the gist)

It is the same as in the gist. I just realized that you actually have the same atom mapping in the gist as mine, it's just different in the description of this issue.

jchodera commented 2 years ago

@IAlibay : Upon looking at this more closely, this doesn't look like a problem case: Even though perses maps atoms to the other side of the ring, the atoms are entirely chemically equivalent, meaning the parameters will not change, and this will not impact the free energy calculation.

We're worried this might be a problem if this happens where, say, the orange atom is a fluorine, and then perses decides to turn an F -> H and an H -> F on the other side of the molecule. Could you give this a try? We're happy to reopen this if you can find a failure case that would actually impact the calculation.

IAlibay commented 2 years ago

@jchodera whilst this doesn't impact the resultant hybrid force field, doesn't this cause an issue with the positions? If I remember correctly the perses code averages out the coordinates from both end states, so wouldn't you get a strangely squashed ring as your initial conformer? (granted it'll probably minimise out)

I'll investigate this further on my end, there's something else odd with this edge that causes simulation failures on our end irrespective of the mapping.