Open zhang-ivy opened 3 years ago
@zhang-ivy : I can't seem to run the RBD:ACE2 example:
(perses-dev) lski2838:debug_geometry_engine_stereo choderaj$ python generate_geometry_proposals_rbd_ace2.py
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
Traceback (most recent call last):
File "generate_geometry_proposals_rbd_ace2.py", line 14, in <module>
htf = pickle.load(f)
ModuleNotFoundError: No module named 'openmm'
I'm guessing you might have a more recent development snapshot of OpenMM installed? Can you dump your conda env?
@dominicrufa @zhang-ivy : I'm trying to understand the fundamental cause of this issue. Is it that the reference_topology.residue_oemol
object only represents the sidechain, so that _determine_extra_torsions
does not add the extra torsions needed to enforce chirality of the CA
atom? If so, we should very much fix this rather than just add a hack around it. We have a number of options to consider:
residue_oemol
include the backbone atoms. This would presumably require modifications to the residue mapping code, but I imagine this can also be simplified/streamlined at the same time.CA
chirality-preserving torsions as a special case in the _determine_extra_torsions
method. It would be more desirable to use a general solution like (1), however.HA
proton could help? Or would this just lead to random chirality?(cc @ijpulidos, who is working on a better OpenFF way to handle biopolymer Topology
objects so we can obtain "views" of residues as Molecule
objects that can be converted into OEMol/RDMol
.)
Is it that the reference_topology.residue_oemol object only represents the sidechain, so that _determine_extra_torsions does not add the extra torsions needed to enforce chirality of the CA atom?
No, the residue_oemol
object represents the whole amino acid (backbone and sidechain).
I've just spent most of the day investigating this issue. This is the problem as I understand it: For N501Y, if we want to de-map CB atoms (and beyond), it seems that we need a chirality preserving torsion for the CA chiral center, otherwise CB will be placed with the wrong stereochemistry. I'm not quite sure why we need a chirality preserving torsion though, because if N, C, CA, and HA atoms are all core atoms, then there should only be one place where CB is allowed to be placed. But for some reason, CB is being placed in on top of the HA.
If we proceed with the assumption that we need a chirality preserving torsion for placing the CB, the problem is that a chirality preserving torsion is not being added because we do not add chirality preserving torsions when three of the chiral center's neighbors are core atoms (because there is only one place that the last atom can be placed) -- the neighbor check is here.
The potential solution here is to de-map HA, which would mean that the CA has only 2 neighbors that are core atoms, which would lead to the chirality preserving torsion being added. I tried this, but the CA stereochemistry still looks wrong. Here is the chirality preserving torsion that is being added:
# p1, p2, p3, p4, [periodicity, adjusted_phase, k, p4_target_growth_index]
[2611, 2607, 2609, 2608, (1.0, 1.064985300912619, 5020.8, 10.0)]
where [2611, 2607, 2609, 2608] corresponds to [CB, CA, C, HA].
@dominicrufa and I are not sure what's wrong with the above improper torsion. I originally thought that the problem was that we should be using N instead of HA to form the torsion, since N is a core atom and HA is no longer a core atom in this case, but I manually set the torsion to use particles [CB, CA, C, N] and this still results in CB being placed with the wrong stereochemistry at CA.
It seems like this problem is very specific to de-mapping CB in protein mutations. If we can't figure out an easy solution for this, maybe we should consider always mapping CB atoms? Especially since I haven't rigorously demonstrated that de-mapping CB is a helpful for reducing sampling challenges.
@jchodera @dominicrufa @ijpulidos : I am not able to reproduce the stereochemistry issue when I run the geometry engine on a capped N501 peptide (N501 + 3 residues before and after it, capped). It seems that the stereochemistry issue is occurring very specifically within the context of apo RBD and complex RBD:ACE2 systems... which is really puzzling since the geometry engine should only be using N501 valence terms in the proposal.
I computed torsions for the different systems that are output of the scripts that @zhang-ivy shared (5 samples each):
And I obtained what's summarized in the following figures, which seem to suggest the very first torsion, which corresponds to indices (2611, 2607, 2609, 2610)
is the key difference. Now, why is it calculating negative torsions instead of the expected positive? I don't know but I think this can pin point where the problem is occurring. I hope you can read the figure without too much problem.
@ijpulidos : Great work tracking down the problematic torsion! Can you figure out which atoms those indices correspond to? Knowing the atom names will help us brainstorm where the problem is coming from. If those are indices from the new topology, you should be able to print out the names of the atoms with something like this:
for atom in topology_proposal.new_topology.atoms():
if atom.index in [2611, 2607, 2609, 2610]:
print(atom.name)
The next step is to determine where the bug in the GeometryEngine
is that is causing this torsion to be wrong. If you get stuck with this, let's find a time to chat and brainstorm together!
@ijpulidos : Great work tracking down the problematic torsion! Can you figure out which atoms those indices correspond to?
@zhang-ivy Wouldn't that just be the same indices (plus 1, since it's 0-based in perses) in the output PDB file? So in that case they would be CB, CA, C and O. From the PDB:
...
ATOM 2608 CA TYR 1 501 -33.803 14.943 3.463 1.00 0.00 C
...
ATOM 2610 C TYR 1 501 -32.394 14.409 3.821 1.00 0.00 C
ATOM 2611 O TYR 1 501 -32.263 13.393 4.504 1.00 0.00 O
ATOM 2612 CB TYR 1 501 -33.609 15.857 2.254 1.00 0.00 C
...
Edit: Changed the order in the comment to reflect the same indices order.
Wouldn't that just be the same indices (plus 1, since it's 0-based in perses) in the output PDB file?
Yes its fine to do it this way (manually) too
@ijpulidos : Is this the error you get when you remove the neighbor check?
/scratch/lsftmp/2666993.tmpdir/ipykernel_34760/3935921749.py in _determine_extra_torsions(self, torsion_force, reference_topology, growth_indices)
176
177 #find p4:
--> 178 p4_target_growth_index = min(tup[1] for tup in _nbr_to_growth_index_tuple if tup[1] > p1_target_growth_index)
179 p4 = [q[0] for q in _nbr_to_growth_index_tuple if q[1] == p4_target_growth_index][0] #take the first hit
180 _logger.debug(f"\t\t\t\t\tgrowth index carrying this improper: {p4_target_growth_index}")
ValueError: min() arg is an empty sequence
Note that when I print out _node_growth_index
and _nbr_to_growth_index_tuple
, I get:
_node_growth_index: 0
_nbr_to_growth_index_tuple [(53, 0), (55, 1), (58, 0), (51, 0)]
Therefore, I'm assuming what happens is p1_target_growth_index
is set to 1, and then in line 178, it tries to call min([])
(because there are no more indices with _tup[1]
> 1), which throws the above error.
If so, I think we'll need @dominicrufa 's help with understanding what's going on here. It's difficult for me to understand the rationale here
@ijpulidos and I spent some time debugging this together, and it seems like the reason why this geometry engine issue is happening in the N501 peptide but not in the capped amino acid is because of a difference in the starting geometries: in particular, the CA-N-C-O torsion.
Here I'm showing the capped amino acid (built manually using pymol) in pink vs the N501 peptide (taken directly from the RBD:ACE2 structure) in yellow. The CA-N-C-O angles in the capped amino acid vs the peptide are different:
When I rotate the N-C bond in the peptide to match the torsion angle in the capped amino acid, I no longer see a the stereochemistry issue.
The CA-N-C-O torsion angle must be strained in the peptide, which causes the potential energies here to differ from the potential energies in the capped amino acid (and peptide with the rotated N-C bond). Therefore, the probabilities of the torsion angles will also differ, which is why the stereochemistry arises.
Here's a pymol session with the relevant PDBs loaded: debug_stereochem.pse.zip
Moving forward, I will always map beta carbons in protein mutations, so this isn't blocking any science for me.
Does this suggest we need to add the stereochemistry restraint then in order to correct these defective cases where the energetics of a slightly strained torsion do not lead to the correct stereochemistry?
We might also want to add an explicit stereochemistry check after the geometry build and reject improperly built geometries, or at least raise an Exception.
@jchodera and I found in issue with the geometry proposals in WT RBD:ACE2 N501Y and beta variant RBD:ACE2 Y501N -- the CA chirality is R, not S. This happens in both apo and complex. Note that for these mutations, I do not map CB atoms and beyond.
This does not seem to be a problem in barnase:barstar (I do map CBs here) or capped ASN->TYR (I do not map CBs here).
@dominicrufa mentioned that the stereochemistry code (which starts here) maintains stereochemistry for atoms in unique old and new, but not for core atoms. This is a problem for amino acid mutations when we don't map CBs, because then CA chirality can be proposed with wrong stereochemistry.
@jchodera : If you see a quick fix for this, please let me know! I've attached scripts you can use to generate geometry proposals for capped ASN->TYR as well as RBD:ACE2 N501Y.
debug_geometry_engine_stereo.zip