GeometryEngine proposes wrong stereochemistry for CA in RBD:ACE2 mutations

zhang-ivy commented 3 years ago

@jchodera and I found in issue with the geometry proposals in WT RBD:ACE2 N501Y and beta variant RBD:ACE2 Y501N -- the CA chirality is R, not S. This happens in both apo and complex. Note that for these mutations, I do not map CB atoms and beyond.

This does not seem to be a problem in barnase:barstar (I do map CBs here) or capped ASN->TYR (I do not map CBs here).

@dominicrufa mentioned that the stereochemistry code (which starts here) maintains stereochemistry for atoms in unique old and new, but not for core atoms. This is a problem for amino acid mutations when we don't map CBs, because then CA chirality can be proposed with wrong stereochemistry.

@jchodera : If you see a quick fix for this, please let me know! I've attached scripts you can use to generate geometry proposals for capped ASN->TYR as well as RBD:ACE2 N501Y.

For the former, I've included a PDB of ASN that you'll need to load into the script.
For the latter, I've also included an apo htf you can load in to the script so that you can access the topology proposal and old positions.

debug_geometry_engine_stereo.zip

jchodera commented 3 years ago

@zhang-ivy : I can't seem to run the RBD:ACE2 example:

(perses-dev) lski2838:debug_geometry_engine_stereo choderaj$ python generate_geometry_proposals_rbd_ace2.py 
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
Traceback (most recent call last):
  File "generate_geometry_proposals_rbd_ace2.py", line 14, in <module>
    htf = pickle.load(f)
ModuleNotFoundError: No module named 'openmm'

I'm guessing you might have a more recent development snapshot of OpenMM installed? Can you dump your conda env?

jchodera commented 3 years ago

@dominicrufa @zhang-ivy : I'm trying to understand the fundamental cause of this issue. Is it that the reference_topology.residue_oemol object only represents the sidechain, so that _determine_extra_torsions does not add the extra torsions needed to enforce chirality of the CA atom? If so, we should very much fix this rather than just add a hack around it. We have a number of options to consider:

Have residue_oemol include the backbone atoms. This would presumably require modifications to the residue mapping code, but I imagine this can also be simplified/streamlined at the same time.
Add CA chirality-preserving torsions as a special case in the _determine_extra_torsions method. It would be more desirable to use a general solution like (1), however.
Add a stereochemistry check and reject proposals that do not have the right stereochemistry, presuming that the correct stereochemistry will be obtained after a number of tries. This could add fragility to the process, however.
Is it possible that de-mapping the HA proton could help? Or would this just lead to random chirality?

(cc @ijpulidos, who is working on a better OpenFF way to handle biopolymer Topology objects so we can obtain "views" of residues as Molecule objects that can be converted into OEMol/RDMol.)

zhang-ivy commented 3 years ago

Is it that the reference_topology.residue_oemol object only represents the sidechain, so that _determine_extra_torsions does not add the extra torsions needed to enforce chirality of the CA atom?

No, the residue_oemol object represents the whole amino acid (backbone and sidechain).

I've just spent most of the day investigating this issue. This is the problem as I understand it: For N501Y, if we want to de-map CB atoms (and beyond), it seems that we need a chirality preserving torsion for the CA chiral center, otherwise CB will be placed with the wrong stereochemistry. I'm not quite sure why we need a chirality preserving torsion though, because if N, C, CA, and HA atoms are all core atoms, then there should only be one place where CB is allowed to be placed. But for some reason, CB is being placed in on top of the HA.

If we proceed with the assumption that we need a chirality preserving torsion for placing the CB, the problem is that a chirality preserving torsion is not being added because we do not add chirality preserving torsions when three of the chiral center's neighbors are core atoms (because there is only one place that the last atom can be placed) -- the neighbor check is here.

The potential solution here is to de-map HA, which would mean that the CA has only 2 neighbors that are core atoms, which would lead to the chirality preserving torsion being added. I tried this, but the CA stereochemistry still looks wrong. Here is the chirality preserving torsion that is being added:

# p1, p2, p3, p4, [periodicity, adjusted_phase, k, p4_target_growth_index]
[2611, 2607, 2609, 2608, (1.0, 1.064985300912619, 5020.8, 10.0)]

where [2611, 2607, 2609, 2608] corresponds to [CB, CA, C, HA].

@dominicrufa and I are not sure what's wrong with the above improper torsion. I originally thought that the problem was that we should be using N instead of HA to form the torsion, since N is a core atom and HA is no longer a core atom in this case, but I manually set the torsion to use particles [CB, CA, C, N] and this still results in CB being placed with the wrong stereochemistry at CA.

It seems like this problem is very specific to de-mapping CB in protein mutations. If we can't figure out an easy solution for this, maybe we should consider always mapping CB atoms? Especially since I haven't rigorously demonstrated that de-mapping CB is a helpful for reducing sampling challenges.

zhang-ivy commented 3 years ago

@jchodera @dominicrufa @ijpulidos : I am not able to reproduce the stereochemistry issue when I run the geometry engine on a capped N501 peptide (N501 + 3 residues before and after it, capped). It seems that the stereochemistry issue is occurring very specifically within the context of apo RBD and complex RBD:ACE2 systems... which is really puzzling since the geometry engine should only be using N501 valence terms in the proposal.

ijpulidos commented 3 years ago

I computed torsions for the different systems that are output of the scripts that @zhang-ivy shared (5 samples each):

capped aa: doesn't have the issue
small peptide: has the issue
mutated RBD: has the issue

And I obtained what's summarized in the following figures, which seem to suggest the very first torsion, which corresponds to indices (2611, 2607, 2609, 2610) is the key difference. Now, why is it calculating negative torsions instead of the expected positive? I don't know but I think this can pin point where the problem is occurring. I hope you can read the figure without too much problem.

torsion_problem_summary

zhang-ivy commented 3 years ago

@ijpulidos : Great work tracking down the problematic torsion! Can you figure out which atoms those indices correspond to? Knowing the atom names will help us brainstorm where the problem is coming from. If those are indices from the new topology, you should be able to print out the names of the atoms with something like this:

for atom in topology_proposal.new_topology.atoms():
    if atom.index in [2611, 2607, 2609, 2610]:
        print(atom.name)

The next step is to determine where the bug in the GeometryEngine is that is causing this torsion to be wrong. If you get stuck with this, let's find a time to chat and brainstorm together!

ijpulidos commented 3 years ago

@ijpulidos : Great work tracking down the problematic torsion! Can you figure out which atoms those indices correspond to?

@zhang-ivy Wouldn't that just be the same indices (plus 1, since it's 0-based in perses) in the output PDB file? So in that case they would be CB, CA, C and O. From the PDB:

...
ATOM   2608  CA  TYR 1 501     -33.803  14.943   3.463  1.00  0.00           C  
...
ATOM   2610  C   TYR 1 501     -32.394  14.409   3.821  1.00  0.00           C  
ATOM   2611  O   TYR 1 501     -32.263  13.393   4.504  1.00  0.00           O  
ATOM   2612  CB  TYR 1 501     -33.609  15.857   2.254  1.00  0.00           C  
...

Edit: Changed the order in the comment to reflect the same indices order.

zhang-ivy commented 3 years ago

Wouldn't that just be the same indices (plus 1, since it's 0-based in perses) in the output PDB file?

Yes its fine to do it this way (manually) too

zhang-ivy commented 3 years ago

@ijpulidos : Is this the error you get when you remove the neighbor check?

/scratch/lsftmp/2666993.tmpdir/ipykernel_34760/3935921749.py in _determine_extra_torsions(self, torsion_force, reference_topology, growth_indices)
    176 
    177                         #find p4:
--> 178                         p4_target_growth_index = min(tup[1] for tup in _nbr_to_growth_index_tuple if tup[1] > p1_target_growth_index)
    179                         p4 = [q[0] for q in _nbr_to_growth_index_tuple if q[1] == p4_target_growth_index][0] #take the first hit
    180                         _logger.debug(f"\t\t\t\t\tgrowth index carrying this improper: {p4_target_growth_index}")

ValueError: min() arg is an empty sequence

Note that when I print out _node_growth_index and _nbr_to_growth_index_tuple , I get:

_node_growth_index:  0
_nbr_to_growth_index_tuple [(53, 0), (55, 1), (58, 0), (51, 0)]

Therefore, I'm assuming what happens is p1_target_growth_index is set to 1, and then in line 178, it tries to call min([]) (because there are no more indices with _tup[1] > 1), which throws the above error.

zhang-ivy commented 3 years ago

If so, I think we'll need @dominicrufa 's help with understanding what's going on here. It's difficult for me to understand the rationale here

zhang-ivy commented 3 years ago

@ijpulidos and I spent some time debugging this together, and it seems like the reason why this geometry engine issue is happening in the N501 peptide but not in the capped amino acid is because of a difference in the starting geometries: in particular, the CA-N-C-O torsion.

Here I'm showing the capped amino acid (built manually using pymol) in pink vs the N501 peptide (taken directly from the RBD:ACE2 structure) in yellow. The CA-N-C-O angles in the capped amino acid vs the peptide are different:

When I rotate the N-C bond in the peptide to match the torsion angle in the capped amino acid, I no longer see a the stereochemistry issue.

The CA-N-C-O torsion angle must be strained in the peptide, which causes the potential energies here to differ from the potential energies in the capped amino acid (and peptide with the rotated N-C bond). Therefore, the probabilities of the torsion angles will also differ, which is why the stereochemistry arises.

Here's a pymol session with the relevant PDBs loaded: debug_stereochem.pse.zip

Moving forward, I will always map beta carbons in protein mutations, so this isn't blocking any science for me.

jchodera commented 3 years ago

Does this suggest we need to add the stereochemistry restraint then in order to correct these defective cases where the energetics of a slightly strained torsion do not lead to the correct stereochemistry?

We might also want to add an explicit stereochemistry check after the geometry build and reject improperly built geometries, or at least raise an Exception.

choderalab / perses

GeometryEngine proposes wrong stereochemistry for CA in RBD:ACE2 mutations #845