Degiacomi-Lab / molearn

protein conformational spaces meet machine learning
https://degiacomi.org/software/molearn/
GNU General Public License v3.0

Wrongly determined C terminal residue #4

Closed by ryankzhu 1 year ago

ryankzhu commented 1 year ago

Hi!

I tried to train OpenMM_Physics_Trainer on trajectories of the ten-residue chignolin peptide and got the following error:

Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
device: cuda
Dataset.shape: torch.Size([96254, 3, 49])
mean: 0.23340530256609915, std: 3.2374652705055245
<Residue 0 (TYR) of chain 0>, 0, is a being set as a N terminal residue
<Residue 8 (TRP) of chain 0> is a being set as a C terminal residue
Traceback (most recent call last):
  File "train.py", line 22, in <module>
    trainer.prepare_physics(remove_NB = True)
  File "/home/rzhu/Desktop/projects/molearn/src/molearn/trainers/openmm_physics_trainer.py", line 17, in prepare_physics
    self.physics_loss = openmm_energy(self.mol, self.std, clamp=clamp_kwargs, platform = 'CUDA' if self.device == torch.device('cuda') else 'Reference', atoms = self._data.atoms, **kwargs)
  File "/home/rzhu/Desktop/projects/molearn/src/molearn/loss_functions/openmm_thread.py", line 357, in __init__
    self.openmmplugin = OpenmmPluginScore(mol, **kwargs)
  File "/home/rzhu/Desktop/projects/molearn/src/molearn/loss_functions/openmm_thread.py", line 142, in __init__
    self.system = self.forcefield.createSystem(self.pdb.topology)
  File "/home/rzhu/Loc/miniconda3/envs/ops/lib/python3.8/site-packages/openmm/app/forcefield.py", line 1212, in createSystem
    templateForResidue = self._matchAllResiduesToTemplates(data, topology, residueTemplates, ignoreExternalBonds)
  File "/home/rzhu/Loc/miniconda3/envs/ops/lib/python3.8/site-packages/openmm/app/forcefield.py", line 1427, in _matchAllResiduesToTemplates
    raise ValueError('No template found for residue %d (%s).  %s' % (res.index+1, res.name, _findMatchErrors(self, res)))
ValueError: No template found for residue 10 (TYR).  The set of atoms is similar to ALA, but it is missing 5 hydrogen atoms.

It seems that the C-terminal residue is determined incorrectly: it should be residue 10 (TYR), but TRP is flagged instead, which then leads to the template-not-found error in OpenMM. I don't know where or how the terminal residues are set. Creating a system directly in OpenMM with the same PDB worked.
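
For reference, the two messages above use different index conventions: the `<Residue 8 (TRP) ...>` line counts residues from 0 (the first residue is reported as `Residue 0`), while the OpenMM error prints `res.index+1`, as the traceback shows. A minimal sketch for working out which residue should be the C terminus, assuming standard fixed-width PDB columns; `residues_in_order` and `atom_line` are hypothetical helpers written for this illustration, not part of molearn, biobox or OpenMM:

```python
def residues_in_order(pdb_text):
    """Return (resname, chain, resseq) tuples in file order, one per residue."""
    residues = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed PDB columns (1-based): resName 18-20, chainID 22, resSeq 23-26.
        key = (line[17:20].strip(), line[21], int(line[22:26]))
        if not residues or residues[-1] != key:
            residues.append(key)
    return residues


def atom_line(serial, name, resname, chain, resseq):
    """Build only the leading fixed-width columns of an ATOM record (illustration only)."""
    return f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"


# Chignolin sequence as it appears in the PDB posted in this issue.
sequence = ["TYR", "TYR", "ASP", "PRO", "GLU", "THR", "GLY", "THR", "TRP", "TYR"]
sample = "\n".join(
    atom_line(2 * i + j + 1, name, resname, "A", i + 1)
    for i, resname in enumerate(sequence)
    for j, name in enumerate(("N", "CA"))
)

residues = residues_in_order(sample)
last_index = len(residues) - 1  # 0-based, the convention of "<Residue N (...)>"
print(f"expected C-terminal residue: 0-based index {last_index}, {residues[-1]}")
```

With this sequence the last residue is TYR at 0-based index 9, i.e. residue 10 in the 1-based numbering of the error message, so a log line naming `Residue 8 (TRP)` as C-terminal is one residue short.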

I tried PDBs with and without hydrogen atoms, and I also tried fixing the atom names with PDBFixer. The error persists. This is the current PDB file:

MODEL        0
ATOM      1  N   TYR A   1      -5.245  -0.598   1.748  1.00  0.00           N  
ATOM      2  CA  TYR A   1      -5.281  -1.312   0.428  1.00  0.00           C  
ATOM      3  CB  TYR A   1      -6.435  -0.758  -0.436  1.00  0.00           C  
ATOM      4  CG  TYR A   1      -6.891  -1.800  -1.472  1.00  0.00           C  
ATOM      5  CD1 TYR A   1      -7.798  -2.820  -1.100  1.00  0.00           C  
ATOM      6  CE1 TYR A   1      -8.289  -3.713  -2.138  1.00  0.00           C  
ATOM      7  CZ  TYR A   1      -7.660  -3.688  -3.358  1.00  0.00           C  
ATOM      8  OH  TYR A   1      -7.954  -4.701  -4.324  1.00  0.00           O  
ATOM      9  CE2 TYR A   1      -6.372  -1.806  -2.753  1.00  0.00           C  
ATOM     10  CD2 TYR A   1      -6.781  -2.745  -3.721  1.00  0.00           C  
ATOM     11  C   TYR A   1      -3.884  -1.357  -0.085  1.00  0.00           C  
ATOM     12  O   TYR A   1      -3.053  -2.201   0.360  1.00  0.00           O  
ATOM     13  N   TYR A   2      -3.570  -0.461  -1.038  1.00  0.00           N  
ATOM     14  CA  TYR A   2      -2.139  -0.167  -1.369  1.00  0.00           C  
ATOM     15  CB  TYR A   2      -1.928  -0.154  -2.877  1.00  0.00           C  
ATOM     16  CG  TYR A   2      -1.942  -1.582  -3.462  1.00  0.00           C  
ATOM     17  CD1 TYR A   2      -2.868  -1.830  -4.466  1.00  0.00           C  
ATOM     18  CE1 TYR A   2      -3.002  -3.039  -5.056  1.00  0.00           C  
ATOM     19  CZ  TYR A   2      -2.080  -4.051  -4.715  1.00  0.00           C  
ATOM     20  OH  TYR A   2      -2.216  -5.276  -5.345  1.00  0.00           O  
ATOM     21  CE2 TYR A   2      -1.057  -2.618  -3.112  1.00  0.00           C  
ATOM     22  CD2 TYR A   2      -1.141  -3.853  -3.744  1.00  0.00           C  
ATOM     23  C   TYR A   2      -1.813   1.266  -0.862  1.00  0.00           C  
ATOM     24  O   TYR A   2      -2.510   2.200  -1.166  1.00  0.00           O  
ATOM     25  N   ASP A   3      -0.691   1.428  -0.023  1.00  0.00           N  
ATOM     26  CA  ASP A   3      -0.047   2.702   0.345  1.00  0.00           C  
ATOM     27  CB  ASP A   3       1.176   2.358   1.171  1.00  0.00           C  
ATOM     28  CG  ASP A   3       1.856   3.510   1.813  1.00  0.00           C  
ATOM     29  OD1 ASP A   3       1.203   4.366   2.427  1.00  0.00           O  
ATOM     30  OD2 ASP A   3       3.065   3.702   1.630  1.00  0.00           O  
ATOM     31  C   ASP A   3       0.315   3.523  -0.947  1.00  0.00           C  
ATOM     32  O   ASP A   3       0.763   2.940  -1.937  1.00  0.00           O  
ATOM     33  N   PRO A   4      -0.019   4.835  -1.066  1.00  0.00           N  
ATOM     34  CD  PRO A   4      -1.007   5.422  -0.207  1.00  0.00           C  
ATOM     35  CG  PRO A   4       0.431   5.680  -2.061  1.00  0.00           C  
ATOM     36  CB  PRO A   4      -0.962   7.030  -2.984  1.00  0.00           C  
ATOM     37  CA  PRO A   4      -2.496   6.738  -1.126  1.00  0.00           C  
ATOM     38  C   PRO A   4       1.893   6.047  -2.031  1.00  0.00           C  
ATOM     39  O   PRO A   4       2.515   6.178  -3.063  1.00  0.00           O  
ATOM     40  N   GLU A   5       2.496   6.140  -0.785  1.00  0.00           N  
ATOM     41  CA  GLU A   5       3.903   6.452  -0.606  1.00  0.00           C  
ATOM     42  CB  GLU A   5       4.231   6.996   0.843  1.00  0.00           C  
ATOM     43  CG  GLU A   5       5.741   7.288   1.148  1.00  0.00           C  
ATOM     44  CD  GLU A   5       6.355   8.467   0.260  1.00  0.00           C  
ATOM     45  OE1 GLU A   5       7.207   8.197  -0.627  1.00  0.00           O  
ATOM     46  OE2 GLU A   5       6.033   9.685   0.493  1.00  0.00           O  
ATOM     47  C   GLU A   5       4.928   5.466  -1.038  1.00  0.00           C  
ATOM     48  O   GLU A   5       5.834   5.650  -1.844  1.00  0.00           O  
ATOM     49  N   THR A   6       4.724   4.211  -0.535  1.00  0.00           N  
ATOM     50  CA  THR A   6       5.568   3.050  -0.787  1.00  0.00           C  
ATOM     51  CB  THR A   6       5.920   2.318   0.462  1.00  0.00           C  
ATOM     52  CG2 THR A   6       4.812   1.763   1.198  1.00  0.00           C  
ATOM     53  OG1 THR A   6       7.675   3.667   1.018  1.00  0.00           O  
ATOM     54  C   THR A   6       5.019   2.029  -1.740  1.00  0.00           C  
ATOM     55  O   THR A   6       5.819   1.276  -2.261  1.00  0.00           O  
ATOM     56  N   GLY A   7       3.656   2.050  -2.023  1.00  0.00           N  
ATOM     57  CA  GLY A   7       3.034   1.125  -2.925  1.00  0.00           C  
ATOM     58  C   GLY A   7       2.737  -0.232  -2.337  1.00  0.00           C  
ATOM     59  O   GLY A   7       2.271  -1.176  -2.938  1.00  0.00           O  
ATOM     60  N   THR A   8       3.015  -0.426  -1.007  1.00  0.00           N  
ATOM     61  CA  THR A   8       3.002  -1.582  -0.248  1.00  0.00           C  
ATOM     62  CB  THR A   8       3.848  -1.478   1.095  1.00  0.00           C  
ATOM     63  CG2 THR A   8       5.169  -1.289   0.618  1.00  0.00           C  
ATOM     64  OG1 THR A   8       3.944  -3.693   1.324  1.00  0.00           O  
ATOM     65  C   THR A   8       1.592  -1.898   0.116  1.00  0.00           C  
ATOM     66  O   THR A   8       0.797  -1.006   0.448  1.00  0.00           O  
ATOM     67  N   TRP A   9       1.253  -3.188   0.197  1.00  0.00           N  
ATOM     68  CA  TRP A   9      -0.007  -3.607   0.772  1.00  0.00           C  
ATOM     69  CB  TRP A   9      -0.314  -5.030   0.357  1.00  0.00           C  
ATOM     70  CG  TRP A   9      -1.525  -5.708   1.015  1.00  0.00           C  
ATOM     71  CD1 TRP A   9      -1.542  -6.747   1.912  1.00  0.00           C  
ATOM     72  NE1 TRP A   9      -2.760  -7.327   1.958  1.00  0.00           N  
ATOM     73  CE2 TRP A   9      -3.620  -6.649   1.188  1.00  0.00           C  
ATOM     74  CZ2 TRP A   9      -2.917  -5.640   0.533  1.00  0.00           C  
ATOM     75  CH2 TRP A   9      -2.871  -4.222  -1.016  1.00  0.00           C  
ATOM     76  CZ3 TRP A   9      -5.349  -4.796  -1.652  1.00  0.00           C  
ATOM     77  CE3 TRP A   9      -5.447  -7.655   1.546  1.00  0.00           C  
ATOM     78  CD2 TRP A   9      -6.673  -6.330  -0.182  1.00  0.00           C  
ATOM     79  C   TRP A   9      -0.215  -3.410   2.285  1.00  0.00           C  
ATOM     80  O   TRP A   9       0.379  -4.019   3.145  1.00  0.00           O  
ATOM     81  N   TYR A  10      -3.007  -2.361   4.334  1.00  0.00           N  
ATOM     82  CA  TYR A  10      -3.551  -3.151   5.155  1.00  0.00           C  
ATOM     83  CB  TYR A  10      -1.888  -2.428   1.860  1.00  0.00           C  
ATOM     84  CG  TYR A  10      -1.030  -0.903   4.320  1.00  0.00           C  
ATOM     85  CD1 TYR A  10      -1.137  -0.338   3.375  1.00  0.00           C  
ATOM     86  CE1 TYR A  10       0.424  -0.852   4.790  1.00  0.00           C  
ATOM     87  CZ  TYR A  10       0.666   1.096   3.915  1.00  0.00           C  
ATOM     88  OH  TYR A  10       2.501   0.366   4.816  1.00  0.00           O  
ATOM     89  CE2 TYR A  10       3.054  -0.698   5.573  1.00  0.00           C  
ATOM     90  CD2 TYR A  10       4.625  -1.632   6.322  1.00  0.00           C  
ATOM     91  C   TYR A  10       0.474  -2.813   5.743  1.00  0.00           C  
ATOM     92  O   TYR A  10       2.314  -1.810   5.963  1.00  0.00           O  
ATOM     93  OXT TYR A  10       2.749  -2.594   6.578  1.00  0.00           O  
TER      94      TYR A  10
ENDMDL

SCMusson commented 1 year ago

I can't seem to reproduce that bug. I ran the following with the above pdb:

#%%
import sys, os
sys.path.insert(0, os.path.join(os.path.abspath(os.pardir),'src'))
from molearn.data import PDBData
from molearn.trainers import OpenMM_Physics_Trainer
from molearn.models.foldingnet import AutoEncoder
import torch

#%%

if __name__ == '__main__':

    ##### Load Data #####
    data = PDBData()
    #data.import_pdb('data/MurD_closed_selection.pdb')
    # I saved the example you gave above in a file called test.pdb. Since I only
    # have the one frame, I load it 10 times to give enough frames for a
    # validation/train split to work (minimum 10 examples).
    for i in range(10):  
        data.import_pdb('test.pdb') 
    data.fix_terminal()
    data.atomselect(atoms = ['CA', 'C', 'N', 'CB', 'O'])

    ##### Prepare Trainer #####
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    trainer = OpenMM_Physics_Trainer(device=device)

    trainer.set_data(data, batch_size=8, validation_split=0.1, manual_seed = 25)
    trainer.prepare_physics(remove_NB = True)

    trainer.set_autoencoder(AutoEncoder, out_points = data.dataset.shape[-1])
    trainer.prepare_optimiser()

And got the following output:

device: cuda
Dataset.shape: torch.Size([10, 3, 49])
mean: 0.5207414965986394, std: 2.9792635808528924
<Residue 0 (TYR) of chain 0>, 0, is a being set as a N terminal residue
<Residue 9 (TYR) of chain 0> is a being set as a C terminal residue
nothing else
47781.21484375

This looks okay to me. Could you send me an example that will reproduce your issue?

ryankzhu commented 1 year ago

This is very strange. I did exactly the same thing: I copied your test code (changing only the path to molearn) and created test.pdb from the PDB I posted above:

import sys, os
sys.path.insert(0, '/home/rzhu/Desktop/projects/molearn/src')
from molearn.data import PDBData
from molearn.trainers import OpenMM_Physics_Trainer
from molearn.models.foldingnet import AutoEncoder
import torch

if __name__ == '__main__':

    ##### Load Data #####
    data = PDBData()
    for i in range(10):  
        data.import_pdb('test.pdb') 
    data.fix_terminal()
    data.atomselect(atoms = ['CA', 'C', 'N', 'CB', 'O'])

    ##### Prepare Trainer #####
    #device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    device = torch.device('cpu')
    trainer = OpenMM_Physics_Trainer(device=device)

    trainer.set_data(data, batch_size=8, validation_split=0.1, manual_seed = 25)
    trainer.prepare_physics(remove_NB = True)

    trainer.set_autoencoder(AutoEncoder, out_points = data.dataset.shape[-1])
    trainer.prepare_optimiser()

But the same error persists in my case. I noticed that when I run the code, a PDB file named 'tmp460626836.pdb' is created in the working directory. It contains only the selected atoms and has a TER record after TRP 9, before TYR 10:

MODEL        1
ATOM      1  N   TYR A   1      -5.245  -0.598   1.748  0.00  1.00           N
ATOM      2  CA  TYR A   1      -5.281  -1.312   0.428  0.00  1.00           C
ATOM      3  CB  TYR A   1      -6.435  -0.758  -0.436  0.00  1.00           C
ATOM      4  C   TYR A   1      -3.884  -1.357  -0.085  0.00  1.00           C
ATOM      5  O   TYR A   1      -3.053  -2.201   0.360  0.00  1.00           O
ATOM      6  N   TYR A   2      -3.570  -0.461  -1.038  0.00  1.00           N
ATOM      7  CA  TYR A   2      -2.139  -0.167  -1.369  0.00  1.00           C
ATOM      8  CB  TYR A   2      -1.928  -0.154  -2.877  0.00  1.00           C
ATOM      9  C   TYR A   2      -1.813   1.266  -0.862  0.00  1.00           C
ATOM     10  O   TYR A   2      -2.510   2.200  -1.166  0.00  1.00           O
ATOM     11  N   ASP A   3      -0.691   1.428  -0.023  0.00  1.00           N
ATOM     12  CA  ASP A   3      -0.047   2.702   0.345  0.00  1.00           C
ATOM     13  CB  ASP A   3       1.176   2.358   1.171  0.00  1.00           C
ATOM     14  C   ASP A   3       0.315   3.523  -0.947  0.00  1.00           C
ATOM     15  O   ASP A   3       0.763   2.940  -1.937  0.00  1.00           O
ATOM     16  N   PRO A   4      -0.019   4.835  -1.066  0.00  1.00           N
ATOM     17  CB  PRO A   4      -0.962   7.030  -2.984  0.00  1.00           C
ATOM     18  CA  PRO A   4      -2.496   6.738  -1.126  0.00  1.00           C
ATOM     19  C   PRO A   4       1.893   6.047  -2.031  0.00  1.00           C
ATOM     20  O   PRO A   4       2.515   6.178  -3.063  0.00  1.00           O
ATOM     21  N   GLU A   5       2.496   6.140  -0.785  0.00  1.00           N
ATOM     22  CA  GLU A   5       3.903   6.452  -0.606  0.00  1.00           C
ATOM     23  CB  GLU A   5       4.231   6.996   0.843  0.00  1.00           C
ATOM     24  C   GLU A   5       4.928   5.466  -1.038  0.00  1.00           C
ATOM     25  O   GLU A   5       5.834   5.650  -1.844  0.00  1.00           O
ATOM     26  N   THR A   6       4.724   4.211  -0.535  0.00  1.00           N
ATOM     27  CA  THR A   6       5.568   3.050  -0.787  0.00  1.00           C
ATOM     28  CB  THR A   6       5.920   2.318   0.462  0.00  1.00           C
ATOM     29  C   THR A   6       5.019   2.029  -1.740  0.00  1.00           C
ATOM     30  O   THR A   6       5.819   1.276  -2.261  0.00  1.00           O
ATOM     31  N   GLY A   7       3.656   2.050  -2.023  0.00  1.00           N
ATOM     32  CA  GLY A   7       3.034   1.125  -2.925  0.00  1.00           C
ATOM     33  C   GLY A   7       2.737  -0.232  -2.337  0.00  1.00           C
ATOM     34  O   GLY A   7       2.271  -1.176  -2.938  0.00  1.00           O
ATOM     35  N   THR A   8       3.015  -0.426  -1.007  0.00  1.00           N
ATOM     36  CA  THR A   8       3.002  -1.582  -0.248  0.00  1.00           C
ATOM     37  CB  THR A   8       3.848  -1.478   1.095  0.00  1.00           C
ATOM     38  C   THR A   8       1.592  -1.898   0.116  0.00  1.00           C
ATOM     39  O   THR A   8       0.797  -1.006   0.448  0.00  1.00           O
ATOM     40  N   TRP A   9       1.253  -3.188   0.197  0.00  1.00           N
ATOM     41  CA  TRP A   9      -0.007  -3.607   0.772  0.00  1.00           C
ATOM     42  CB  TRP A   9      -0.314  -5.030   0.357  0.00  1.00           C
ATOM     43  C   TRP A   9      -0.215  -3.410   2.285  0.00  1.00           C
ATOM     44  O   TRP A   9       0.379  -4.019   3.145  0.00  1.00           O
TER      44      TRP A   9
ATOM     45  N   TYR A  10      -3.007  -2.361   4.334  0.00  1.00           N
ATOM     46  CA  TYR A  10      -3.551  -3.151   5.155  0.00  1.00           C
ATOM     47  CB  TYR A  10      -1.888  -2.428   1.860  0.00  1.00           C
ATOM     48  C   TYR A  10       0.474  -2.813   5.743  0.00  1.00           C
ATOM     49  O   TYR A  10       2.314  -1.810   5.963  0.00  1.00           O
TER      49      TYR A  10
ENDMDL
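
A quick way to spot this kind of stray record is to scan the file for TER lines that are followed by more atoms of the same chain. The sketch below is a hypothetical standalone check (`premature_ter_records` is not a molearn or biobox function) and assumes standard fixed-width PDB columns, with the chain ID in column 22. How the temporary file is read downstream is not shown in this thread, so treating the extra TER as the cause is an assumption, although it matches the TRP being reported as C-terminal in the log.

```python
def premature_ter_records(pdb_text):
    """Return TER lines that are followed by more atoms with the same chain ID."""
    flagged = []
    pending = None  # most recent TER line not yet followed by an atom record
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "TER":
            pending = line
        elif record in ("ATOM", "HETATM"):
            # A TER followed by atoms of the same chain splits that chain in two.
            if pending is not None and pending[21:22] == line[21:22]:
                flagged.append(pending)
            pending = None
        elif record in ("MODEL", "ENDMDL"):
            pending = None
    return flagged


# Excerpt from the temporary file shown above.
excerpt = "\n".join([
    "ATOM     44  O   TRP A   9       0.379  -4.019   3.145  0.00  1.00           O",
    "TER      44      TRP A   9",
    "ATOM     45  N   TYR A  10      -3.007  -2.361   4.334  0.00  1.00           N",
    "TER      49      TYR A  10",
    "ENDMDL",
])

for ter in premature_ter_records(excerpt):
    print("TER inside a chain:", ter)
```

On the excerpt this flags only the TER after TRP 9. The final TER after TYR 10 is not flagged because no atoms follow it.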

SCMusson commented 1 year ago

It seems this is a biobox issue. For some reason it doesn't affect my version. I'll push a patch to molearn and eventually I'll fix it in biobox too.
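
When the same script behaves differently on two machines, comparing installed package versions is usually the quickest first check. A minimal sketch using only the standard library (Python 3.8+); the distribution names passed in are assumptions and may differ depending on how each package was installed (for example through conda), in which case the lookup simply reports `None`:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if not found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names are assumed; adjust them to match your environment.
for name, version in installed_versions(["biobox", "openmm", "torch", "numpy"]).items():
    print(f"{name}: {version if version is not None else 'not found'}")
```

Posting this output from both environments alongside the traceback makes it easier to tell whether a version difference is involved.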

SCMusson commented 1 year ago

This should be fixed in c099a6f3175f6f562c60ad23379b373ced115d78. Could you pull and check whether it resolves the issue?

ryankzhu commented 1 year ago

The issue is now resolved. Thank you very much!