choderalab / perses

Experiments with expanded ensembles to explore chemical space
http://perses.readthedocs.io
MIT License
179 stars 51 forks source link

Constrained C-H bond mapped onto unconstrained C-F bond #1112

Closed jchodera closed 1 year ago

jchodera commented 2 years ago

I just came across this mapping from ligand 0 to 4. The bond between atoms 10 and 22 (constrained C-H) is mapped onto an unconstrained C-F, which should not happen. image

Receptor PDB, ligands SDF, and template are attached. bug.zip

ijpulidos commented 1 year ago

@jchodera What should happen in this case?

jchodera commented 1 year ago

@ijpulidos : We should never map atoms in a constrained bond to atoms in an unconstrained bond. PWe used to explicitly de-map atoms in bonds that went from constrained -> unconstrained (or vice versa), or where the bond was constrained in start/end states but changed in constrained bond length. It's possible that somehow this final de-mapping step is being skipped when we are using maps derived from geometry.

Perhaps @dominicrufa could help us investigate this?

Marking this as high priority since it impacts all our current work on free energy calculations.

dominicrufa commented 1 year ago

@ijpulidos , if we are using this class for the topology proposal, then regardless of whether we are using a pre-determined geometry-based atom map or a openeye-enabled mcss method, the _constraint_repairs method here should be called afterwards to de-map any atoms that have changing constraints. there is also a debugging logger that prints out the adjusted atom map before/after the constraint repairs, so if the constraint_repairs method is buggy, one should be able to spot it with that logger.

ijpulidos commented 1 year ago

@dominicrufa I checked and the SmallMoleculeSetProposalEngine._constraint_repairs method is being called, but the length of the adjusted_atom_map is the same after running it. From what I can see, that method is only marking things to demap if both atoms, in the old and new systems, are hydrogens here. But this is not the case here, since we have a fluorine in the new system. Does this mean that the method is buggy?

dominicrufa commented 1 year ago

yep, that's the problem. it should be agnostic to the annotations on the new/old atom. and as you can see here https://github.com/choderalab/perses/blob/2ff4a7bb631a93716b64f2bd11dc68a2864402cd/perses/rjmc/topology_proposal.py#L2348, it was a TODO to treat it more generally.

On Mon, Oct 17, 2022 at 12:18 PM Iván Pulido @.***> wrote:

@dominicrufa https://github.com/dominicrufa I checked and the SmallMoleculeSetProposalEngine._constraint_repairs method is being called, but the length of the adjusted_atom_map is the same after running it. From what I can see, that method is only marking things to demap if both atoms, in the old and new systems, are hydrogens here https://github.com/choderalab/perses/blob/2ff4a7bb631a93716b64f2bd11dc68a2864402cd/perses/rjmc/topology_proposal.py#L2372. But this is not the case here, since we have a fluorine in the new system. Does this mean that the method is buggy?

— Reply to this email directly, view it on GitHub https://github.com/choderalab/perses/issues/1112#issuecomment-1281126646, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALGE3XXL3G3NINSWZ6AVPC3WDV35NANCNFSM6AAAAAAQ3Z4ZU4 . You are receiving this because you were mentioned.Message ID: @.***>

jchodera commented 1 year ago

Here's an example where there is an H -> F mapping (atom 38): atom_mapping For the solvent phase, for the same mapping, we have:

As a control, here is a related transformation where we are adding an F, but not changing X-H to X-F: image

The constrained -> unconstrained change shouldn't really change the free energy of individual legs, so the fact that we see a change from 2 -> 50 kT for a single leg is worrying. Practically, the error may be on the ~1 kT level, but this still appears to be urgent to address.

jchodera commented 1 year ago

@dominicrufa @ijpulidos : What do you think about a general solution like this?

Is this overkill for this problem? Do we just need to add more comparison cases to the current code instead?

dominicrufa commented 1 year ago

if there is a constraint that is mapped, then i am pretty certain the hybrid system will retain the old constraint.

Create a network.Graph from the mapping, creating edges between atoms that are bonded in both old and new topologies/systems. Annotate the edges with the old and new bond lengths, and whether they are constrained or not Remove any edges that are constrained in one state but unconstrained in the other state Remove any edges that are constrained in both states but differ in bond lengths Rebuild the mapping from the largest connected component

I am also pretty sure that ^this procedure was basically the idea I had in mind when i first wrote the constraint de-mapper; of course, it is probably the most robust way to handle this constraint mapping problem.

Is this overkill for this problem? Do we just need to add more comparison cases to the current code instead?

if this is a recurring problem associated with deterministic geometry mapping, then it might not be overkill and might be worth the allocation of effort. alternatively, one could reduce the deterministic atom overlap distance tolerance to less than the difference between the CH and FH distance so that the H is never mapped to F (perhaps the easiest way to fix this transformation). when the constraint fixes were initially implemented, we used the mcss procedure, which typically forbids this flavor of mapping in the first place, which is why I didn't bother to write a constraint test for this.

jchodera commented 1 year ago

if there is a constraint that is mapped, then i am pretty certain the hybrid system will retain the old constraint.

That may explain the behavior we're seeing: If there's a constraint and a harmonic bond term that interpolates, this would contribute a large amount to the free energy for each leg that would mostly (but not fully) cancel.

if this is a recurring problem associated with deterministic geometry mapping, then it might not be overkill and might be worth the allocation of effort. alternatively, one could reduce the deterministic atom overlap distance tolerance to less than the difference between the CH and FH distance so that the H is never mapped to F (perhaps the easiest way to fix this transformation).

This will likely be very brittle, since we would have to thread the needle between C-H and C-F bond differences (~0.25A) while being lax enough to tolerate small deviations due to how Omega or other constrained mapping schemes generate core atom positions.

From gaff.dat (GAFF 2.11):

c2-f   370.6    1.3385      SOURCE1_SOURCE5      35      0.0085
c2-ha  343.1    1.0879      SOURCE3_SOURCE5    5991      0.0019

Graphically, we don't want these atoms to be mapped: image but we do want these atoms to be mapped: image It might be a quick fix for now---a tradeoff between going back to 4 fs timesteps but transforming many more molecules that aren't mapped as desired.

when the constraint fixes were initially implemented, we used the mcss procedure, which typically forbids this flavor of mapping in the first place, which is why I didn't bother to write a constraint test for this.

That's a good point that we should first start by implementing some tests for some simple cases, like pyridine to 2-fluoropyridine.

dominicrufa commented 1 year ago

This will likely be very brittle, since we would have to thread the needle between C-H and C-F bond differences (~0.25A) while being lax enough to tolerate small deviations due to how Omega or other constrained mapping schemes generate core atom positions.

ah, i thought that the mapped atoms were perfectly overlaid; yes, that would definitely be threading a needle.

regarding the networkx-fix way, it turns out i already wrote networkx representations of the transforming residues: https://github.com/choderalab/perses/blob/2ff4a7bb631a93716b64f2bd11dc68a2864402cd/perses/rjmc/topology_proposal.py#L76, so there is already a good starting point for the more robust fix.

ijpulidos commented 1 year ago
* Remove any edges that are constrained in both states but differ in bond lengths

This is getting done in the current state of things.

* Remove any edges that are constrained in one state but unconstrained in the other state

This should solve the current transformation and it's probably the quickest fix. I already tried adding this to our SmallMoleculeSetProposalEngine._constraint_repairs() after line 2375, adding something like the following:

# demap atoms that were constrained and now are in an unconstrained edge
elif old_index in old_constraints.keys() and new_index not in new_constraints.keys():
    to_delete.append(new_index)

Which results in adjusted_map length differing by one after applying the constraint repairing method, as expected. I also checked that the atom getting demapped is the Fluorine atom.

The only remaining issue with this approach, is that the atom_mapping.png gets generated independently of this constraint treatment/repair. Which makes it unreliable for checking the actual mapping being used in the simulation. We would need to come with a better way of rendering the actual mapping that ends up being used.

Does this solution seem good enough? @jchodera can you share the code of what you did in https://github.com/choderalab/perses/issues/1112#issuecomment-1281385830 to check this, if that's a proper way to test this is working as expected.

jchodera commented 1 year ago

@jchodera can you share the code of what you did in https://github.com/choderalab/perses/issues/1112#issuecomment-1281385830 to check this, if that's a proper way to test this is working as expected.

I simply ran the solution phase with constraints in the YAML:

timestep: 4 # femtoseconds
#remove_constraints: "not water"
n_steps_per_move_application: 250

and without constraints

timestep: 2 # femtoseconds
remove_constraints: "not water"
n_steps_per_move_application: 500

You can probably do the same thing if it's possible to run just solvent and vacuum for something like CH3OH -> CH2FOH.

jchodera commented 1 year ago

The only remaining issue with this approach, is that the atom_mapping.png gets generated independently of this constraint treatment/repair. Which makes it unreliable for checking the actual mapping being used in the simulation. We would need to come with a better way of rendering the actual mapping that ends up being used.

I'm not sure I understand why this is. The image is rendered by calling the .render_image() method of the AtomMapping object. Repairing the atom mapping is done by this static method SmallMoleculeSetProposalEngine._constraint_repairs(), which (as the comment at the top indicates) should be moved to AtomMapping or AtomMapper. Presumably, the reason for this is that SmallMoleculeSetProposalEngine has access to the System object (which contains constraint and bond length information) while AtomMapper does not.

The bigger issue is just when we render the image of the atom mapping. I can't tell where we call it in production setup code. If we just call it at the end of the proposal, after the constraint repairs, we should be fine.

We should probably raise the question of where "constraint repairs" that require knowledge of the parameterized system should actually occur in the atom mapping flow, since I believe this breaks the GUFE API assumptions.