Closed: ryankzhu closed this issue 1 year ago
I can't seem to reproduce that bug. I ran the following with the above pdb:
#%%
import sys, os
sys.path.insert(0, os.path.join(os.path.abspath(os.pardir),'src'))
from molearn.data import PDBData
from molearn.trainers import OpenMM_Physics_Trainer
from molearn.models.foldingnet import AutoEncoder
import torch
#%%
if __name__ == '__main__':
    ##### Load Data #####
    data = PDBData()
    #data.import_pdb('data/MurD_closed_selection.pdb')
    # I saved the example you gave above in a file called test.pdb.
    # Seeing as I only have the one frame, I'll load it 10 times to give me
    # enough frames for a validation/train split to work (minimum 10 examples).
    for i in range(10):
        data.import_pdb('test.pdb')
    data.fix_terminal()
    data.atomselect(atoms = ['CA', 'C', 'N', 'CB', 'O'])
    ##### Prepare Trainer #####
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    trainer = OpenMM_Physics_Trainer(device=device)
    trainer.set_data(data, batch_size=8, validation_split=0.1, manual_seed = 25)
    trainer.prepare_physics(remove_NB = True)
    trainer.set_autoencoder(AutoEncoder, out_points = data.dataset.shape[-1])
    trainer.prepare_optimiser()
And got the following output:
device: cuda
Dataset.shape: torch.Size([10, 3, 49])
mean: 0.5207414965986394, std: 2.9792635808528924
<Residue 0 (TYR) of chain 0>, 0, is a being set as a N terminal residue
<Residue 9 (TYR) of chain 0> is a being set as a C terminal residue
nothing else
47781.21484375
This looks okay to me. Could you send me an example that will reproduce your issue?
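As a quick sanity check on the Dataset.shape line above, the last dimension (49) can be reproduced by counting the selected atoms per residue. This is only a sketch: the residue sequence is read off the temporary PDB file posted later in this thread, the helper name expected_atom_count is hypothetical (it is not part of molearn), and it assumes every non-glycine residue contains all five selected atoms.

```python
# Residue sequence as it appears in the temporary PDB file posted in this thread.
sequence = ["TYR", "TYR", "ASP", "PRO", "GLU", "THR", "GLY", "THR", "TRP", "TYR"]
# Atom names passed to data.atomselect(...) in the script above.
selected = ["CA", "C", "N", "CB", "O"]


def expected_atom_count(sequence, selected):
    """Count selected atoms, assuming glycine is the only residue without a CB."""
    total = 0
    for residue in sequence:
        for atom in selected:
            if atom == "CB" and residue == "GLY":
                continue  # glycine has no beta carbon
            total += 1
    return total


print(expected_atom_count(sequence, selected))  # 49
```

Ten residues with five atoms each would give 50; the single glycine accounts for the missing CB, which matches both the reported shape and the 49 ATOM records in the temporary file.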
This is very weird to me. I did exactly the same thing: I copied your test code (just changed the path to molearn), and created the test.pdb using the pdb I posted above:
import sys, os
sys.path.insert(0, '/home/rzhu/Desktop/projects/molearn/src')
from molearn.data import PDBData
from molearn.trainers import OpenMM_Physics_Trainer
from molearn.models.foldingnet import AutoEncoder
import torch
if __name__ == '__main__':
    ##### Load Data #####
    data = PDBData()
    for i in range(10):
        data.import_pdb('test.pdb')
    data.fix_terminal()
    data.atomselect(atoms = ['CA', 'C', 'N', 'CB', 'O'])
    ##### Prepare Trainer #####
    #device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    device = torch.device('cpu')
    trainer = OpenMM_Physics_Trainer(device=device)
    trainer.set_data(data, batch_size=8, validation_split=0.1, manual_seed = 25)
    trainer.prepare_physics(remove_NB = True)
    trainer.set_autoencoder(AutoEncoder, out_points = data.dataset.shape[-1])
    trainer.prepare_optimiser()
But I still get the same error. I noticed that when I run the code, a PDB file named 'tmp460626836.pdb' is created in the working directory. It contains only the selected backbone atoms and has a TER record after TRP 9:
MODEL 1
ATOM 1 N TYR A 1 -5.245 -0.598 1.748 0.00 1.00 N
ATOM 2 CA TYR A 1 -5.281 -1.312 0.428 0.00 1.00 C
ATOM 3 CB TYR A 1 -6.435 -0.758 -0.436 0.00 1.00 C
ATOM 4 C TYR A 1 -3.884 -1.357 -0.085 0.00 1.00 C
ATOM 5 O TYR A 1 -3.053 -2.201 0.360 0.00 1.00 O
ATOM 6 N TYR A 2 -3.570 -0.461 -1.038 0.00 1.00 N
ATOM 7 CA TYR A 2 -2.139 -0.167 -1.369 0.00 1.00 C
ATOM 8 CB TYR A 2 -1.928 -0.154 -2.877 0.00 1.00 C
ATOM 9 C TYR A 2 -1.813 1.266 -0.862 0.00 1.00 C
ATOM 10 O TYR A 2 -2.510 2.200 -1.166 0.00 1.00 O
ATOM 11 N ASP A 3 -0.691 1.428 -0.023 0.00 1.00 N
ATOM 12 CA ASP A 3 -0.047 2.702 0.345 0.00 1.00 C
ATOM 13 CB ASP A 3 1.176 2.358 1.171 0.00 1.00 C
ATOM 14 C ASP A 3 0.315 3.523 -0.947 0.00 1.00 C
ATOM 15 O ASP A 3 0.763 2.940 -1.937 0.00 1.00 O
ATOM 16 N PRO A 4 -0.019 4.835 -1.066 0.00 1.00 N
ATOM 17 CB PRO A 4 -0.962 7.030 -2.984 0.00 1.00 C
ATOM 18 CA PRO A 4 -2.496 6.738 -1.126 0.00 1.00 C
ATOM 19 C PRO A 4 1.893 6.047 -2.031 0.00 1.00 C
ATOM 20 O PRO A 4 2.515 6.178 -3.063 0.00 1.00 O
ATOM 21 N GLU A 5 2.496 6.140 -0.785 0.00 1.00 N
ATOM 22 CA GLU A 5 3.903 6.452 -0.606 0.00 1.00 C
ATOM 23 CB GLU A 5 4.231 6.996 0.843 0.00 1.00 C
ATOM 24 C GLU A 5 4.928 5.466 -1.038 0.00 1.00 C
ATOM 25 O GLU A 5 5.834 5.650 -1.844 0.00 1.00 O
ATOM 26 N THR A 6 4.724 4.211 -0.535 0.00 1.00 N
ATOM 27 CA THR A 6 5.568 3.050 -0.787 0.00 1.00 C
ATOM 28 CB THR A 6 5.920 2.318 0.462 0.00 1.00 C
ATOM 29 C THR A 6 5.019 2.029 -1.740 0.00 1.00 C
ATOM 30 O THR A 6 5.819 1.276 -2.261 0.00 1.00 O
ATOM 31 N GLY A 7 3.656 2.050 -2.023 0.00 1.00 N
ATOM 32 CA GLY A 7 3.034 1.125 -2.925 0.00 1.00 C
ATOM 33 C GLY A 7 2.737 -0.232 -2.337 0.00 1.00 C
ATOM 34 O GLY A 7 2.271 -1.176 -2.938 0.00 1.00 O
ATOM 35 N THR A 8 3.015 -0.426 -1.007 0.00 1.00 N
ATOM 36 CA THR A 8 3.002 -1.582 -0.248 0.00 1.00 C
ATOM 37 CB THR A 8 3.848 -1.478 1.095 0.00 1.00 C
ATOM 38 C THR A 8 1.592 -1.898 0.116 0.00 1.00 C
ATOM 39 O THR A 8 0.797 -1.006 0.448 0.00 1.00 O
ATOM 40 N TRP A 9 1.253 -3.188 0.197 0.00 1.00 N
ATOM 41 CA TRP A 9 -0.007 -3.607 0.772 0.00 1.00 C
ATOM 42 CB TRP A 9 -0.314 -5.030 0.357 0.00 1.00 C
ATOM 43 C TRP A 9 -0.215 -3.410 2.285 0.00 1.00 C
ATOM 44 O TRP A 9 0.379 -4.019 3.145 0.00 1.00 O
TER 44 TRP A 9
ATOM 45 N TYR A 10 -3.007 -2.361 4.334 0.00 1.00 N
ATOM 46 CA TYR A 10 -3.551 -3.151 5.155 0.00 1.00 C
ATOM 47 CB TYR A 10 -1.888 -2.428 1.860 0.00 1.00 C
ATOM 48 C TYR A 10 0.474 -2.813 5.743 0.00 1.00 C
ATOM 49 O TYR A 10 2.314 -1.810 5.963 0.00 1.00 O
TER 49 TYR A 10
ENDMDL
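A misplaced TER record like the one after TRP 9 can be spotted automatically. The following is a minimal sketch in plain Python, not code from molearn or biobox: the function name premature_ter_records is hypothetical, and it assumes whitespace-separated fields with a non-blank chain ID, as in the listing above. A strict fixed-column PDB parser would be more robust for arbitrary files.

```python
def premature_ter_records(lines):
    """Return (line_number, text) for every TER record that is followed,
    within the same model, by more ATOM/HETATM records of the same chain."""
    flagged = []
    pending = {}  # chain ID -> (line_number, text) of the last TER seen for it
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        record = fields[0]
        if record in ("MODEL", "ENDMDL"):
            pending.clear()  # chains start afresh in every model
        elif record == "TER" and len(fields) >= 4:
            # Assumed layout: TER serial resName chainID resSeq
            pending[fields[3]] = (number, line.rstrip())
        elif record in ("ATOM", "HETATM") and len(fields) >= 5:
            # Assumed layout: ATOM serial name resName chainID resSeq ...
            chain = fields[4]
            if chain in pending:
                # The chain continues after its TER, so that TER came too early.
                flagged.append(pending.pop(chain))
    return flagged


# Excerpt from the temporary file shown above.
sample = """MODEL 1
ATOM 43 C TRP A 9 -0.215 -3.410 2.285 0.00 1.00 C
ATOM 44 O TRP A 9 0.379 -4.019 3.145 0.00 1.00 O
TER 44 TRP A 9
ATOM 45 N TYR A 10 -3.007 -2.361 4.334 0.00 1.00 N
TER 49 TYR A 10
ENDMDL""".splitlines()

print(premature_ter_records(sample))  # [(4, 'TER 44 TRP A 9')]
```

To check a file on disk, the same function can be called on the result of reading the file and splitting it into lines.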
It seems this is a biobox issue. For some reason it doesn't affect my version. I'll push a patch to molearn and eventually I'll fix it in biobox too.
This should be fixed in c099a6f3175f6f562c60ad23379b373ced115d78. Could you pull and check whether it resolves the issue?
Hi!
I tried to train OpenMM_Physics_Trainer on ten-residue chignolin trajectories, and got the following error:
It seems that the C-terminal residue is determined incorrectly (it should be TYR 10), which then leads to the "template not found" error in OpenMM. I don't know where or how the terminal residues are set. I tried creating a system in OpenMM with the same PDB file, and it worked.
I tried PDB files with and without hydrogen atoms. I also tried fixing the atom names using PDBFixer. The error still occurs. This is the current PDB file: