choderalab / perses

Experiments with expanded ensembles to explore chemical space
http://perses.readthedocs.io
MIT License

Dealing with protein caps from Maestro #913

Open ijpulidos opened 2 years ago

ijpulidos commented 2 years ago

We want to be able to handle protein caps in proteins prepared with the Maestro software or the suite's Protein Preparation Wizard. This will require checking which caps these tools use and making sure we can handle them (e.g., NMA instead of NME).
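A minimal sketch of the simplest case follows. It assumes a Maestro cap differs from the Amber-style NME template only in its residue name, and rewrites the residue-name field before the file reaches OpenMM. `rename_caps` and `CAP_RENAMES` are hypothetical names. Only the NMA to NME pair comes from this issue. Whether Maestro's caps also use different atom names has not been checked, so the sketch leaves atom names untouched.

```python
# Hypothetical mapping: Maestro cap residue name -> name expected by the
# Amber/OpenMM residue templates. Only NMA -> NME comes from this issue;
# further entries would need to be confirmed against real Maestro output.
CAP_RENAMES = {"NMA": "NME"}


def rename_caps(pdb_lines, renames=CAP_RENAMES):
    """Return a copy of pdb_lines with cap residue names rewritten.

    Only the residue-name field (PDB columns 18-20) of ATOM/HETATM records
    is changed; atom names inside the cap are left as they are.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            resname = line[17:20].strip()
            if resname in renames:
                # Right-justify the new name in the 3-character field so that
                # every other column keeps its position.
                line = line[:17] + f"{renames[resname]:>3s}" + line[20:]
        out.append(line)
    return out


# Illustrative record with a Maestro-style NMA residue name.
example = "ATOM   2406  N   NMA   148       8.680 -22.040  17.010  1.00  0.00           N"
print(rename_caps([example])[0][17:20])  # NME
```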

ijpulidos commented 2 years ago

Examples of proteins with terminal caps prepared with Maestro would be really helpful for this purpose.

jchodera commented 2 years ago

I believe @dominicrufa has experience with this and can let us know if this was an issue, or how he dealt with it.

jchodera commented 2 years ago

The thrombin protein-ligand benchmark system appears to fail with this error:

```
2022-01-30 18:33:06,738:(0.52s):openmmforcefields.generators.template_generators:Requested to generate parameters for residue <Residue 283 (NME) of chain 0>
2022-01-30 18:33:06,742:(0.00s):openmmforcefields.generators.template_generators:Did not recognize residue NME; did you forget to call .add_molecules() to add it?
Traceback (most recent call last):
  File "/lila/data/chodera/chodera/perses/perses/benchmarks/thrombin-1-ns-12-states/run_benchmarks.py", line 176, in <module>
    run_relative_perturbation(lig_a_index, lig_b_index, reverse=is_reversed)
  File "/lila/data/chodera/chodera/perses/perses/benchmarks/thrombin-1-ns-12-states/run_benchmarks.py", line 92, in run_relative_perturbation
    run(new_yaml)
  File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/perses/app/setup_relative_calculation.py", line 764, in run
    setup_dict = run_setup(setup_options)
  File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/perses/app/setup_relative_calculation.py", line 459, in run_setup
    fe_setup = RelativeFEPSetup(ligand_file, old_ligand_index, new_ligand_index, forcefield_files,phases=phases,
  File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/perses/app/relative_setup.py", line 376, in __init__
    self._complex_topology_old_solvated, self._complex_positions_old_solvated, self._complex_system_old_solvated = self._solvate_system(
  File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/perses/app/relative_setup.py", line 827, in _solvate_system
    modeller.addSolvent(self._system_generator.forcefield, model=model, padding=self._padding, ionicStrength=ionic_strength)
  File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/openmm/app/modeller.py", line 483, in addSolvent
    system = forcefield.createSystem(self.topology)
  File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/openmm/app/forcefield.py", line 1212, in createSystem
    templateForResidue = self._matchAllResiduesToTemplates(data, topology, residueTemplates, ignoreExternalBonds)
  File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/openmm/app/forcefield.py", line 1427, in _matchAllResiduesToTemplates
    raise ValueError('No template found for residue %d (%s).  %s' % (res.index+1, res.name, _findMatchErrors(self, res)))
ValueError: No template found for residue 284 (NME).  The set of atoms matches NME, but the bonds are different.
```

Here's the context from protein.pdb for this system:

```
ATOM   2392  N   THR   147       7.070 -18.840  17.810  1.00  0.00           N
ATOM   2393  H   THR   147       7.860 -18.220  17.720  1.00  0.00           H
ATOM   2394  CA  THR   147       7.020 -20.070  16.990  1.00  0.00           C
ATOM   2395  HA  THR   147       6.340 -20.790  17.450  1.00  0.00           H
ATOM   2396  CB  THR   147       6.350 -19.880  15.620  1.00  0.00           C
ATOM   2397  HB  THR   147       6.930 -19.180  15.020  1.00  0.00           H
ATOM   2398  CG2 THR   147       6.110 -21.220  14.950  1.00  0.00           C
ATOM   2399 1HG2 THR   147       5.630 -21.070  13.980  1.00  0.00           H
ATOM   2400 2HG2 THR   147       7.060 -21.740  14.810  1.00  0.00           H
ATOM   2401 3HG2 THR   147       5.460 -21.830  15.580  1.00  0.00           H
ATOM   2402  OG1 THR   147       5.040 -19.280  15.730  1.00  0.00           O
ATOM   2403  HG1 THR   147       5.110 -18.420  16.150  1.00  0.00           H
ATOM   2404  C   THR   147       8.440 -20.600  16.830  1.00  0.00           C
ATOM   2405  O   THR   147       9.400 -19.870  16.560  1.00  0.00           O
ATOM   2406  N   NME   148       8.680 -22.040  17.010  1.00  0.00           N
ATOM   2407  H   NME   148       7.980 -22.760  17.240  1.00  0.00           H
ATOM   2408  CH3 NME   148      10.100 -22.260  16.800  1.00  0.00           C
ATOM   2409 1HH3 NME   148      10.510 -21.480  16.180  1.00  0.00           H
ATOM   2410 2HH3 NME   148      10.620 -22.270  17.750  1.00  0.00           H
ATOM   2411 3HH3 NME   148      10.260 -23.210  16.310  1.00  0.00           H
ATOM   2412 1HH3 ACE   149       0.970 -17.140  16.260  1.00  0.00           H
ATOM   2413  CH3 ACE   149       0.250 -16.810  15.530  1.00  0.00           C
ATOM   2414 2HH3 ACE   149      -0.240 -15.920  15.890  1.00  0.00           H
ATOM   2415 3HH3 ACE   149      -0.490 -17.580  15.390  1.00  0.00           H
ATOM   2416  C   ACE   149       0.940 -16.520  14.210  1.00  0.00           C
ATOM   2417  O   ACE   149       1.570 -17.410  13.620  1.00  0.00           O
ATOM   2418  N   GLY   150       0.880 -15.170  13.640  1.00  0.00           N
ATOM   2419  H   GLY   150       1.280 -14.510  14.290  1.00  0.00           H
ATOM   2420  CA  GLY   150       1.620 -15.140  12.380  1.00  0.00           C
ATOM   2421  HA1 GLY   150       2.300 -15.990  12.340  1.00  0.00           H
ATOM   2422  HA2 GLY   150       0.920 -15.190  11.550  1.00  0.00           H
ATOM   2423  C   GLY   150       2.440 -13.870  12.210  1.00  0.00           C
ATOM   2424  O   GLY   150       3.130 -13.410  13.130  1.00  0.00           O
```

And another break in the same system:

```
ATOM   4043  N   GLY   253      28.610  13.970  14.010  1.00  0.00           N
ATOM   4044  H   GLY   253      28.870  14.050  14.980  1.00  0.00           H
ATOM   4045  CA  GLY   253      27.530  14.810  13.470  1.00  0.00           C
ATOM   4046  HA1 GLY   253      27.260  15.570  14.210  1.00  0.00           H
ATOM   4047  HA2 GLY   253      27.870  15.300  12.560  1.00  0.00           H
ATOM   4048  C   GLY   253      26.290  13.980  13.160  1.00  0.00           C
ATOM   4049  O   GLY   253      25.160  14.530  13.260  1.00  0.00           O
ATOM   4050  N   NME   254      26.410  12.580  12.730  1.00  0.00           N
ATOM   4051  H   NME   254      27.290  12.040  12.620  1.00  0.00           H
ATOM   4052  CH3 NME   254      25.070  12.090  12.510  1.00  0.00           C
ATOM   4053 1HH3 NME   254      24.360  12.670  13.090  1.00  0.00           H
ATOM   4054 2HH3 NME   254      24.810  12.170  11.460  1.00  0.00           H
ATOM   4055 3HH3 NME   254      24.990  11.050  12.810  1.00  0.00           H
ATOM   4056 1HH3 ACE   255      18.050  15.000  19.990  1.00  0.00           H
ATOM   4057  CH3 ACE   255      17.520  15.150  19.060  1.00  0.00           C
ATOM   4058 2HH3 ACE   255      16.640  14.520  19.070  1.00  0.00           H
ATOM   4059 3HH3 ACE   255      18.160  14.840  18.240  1.00  0.00           H
ATOM   4060  C   ACE   255      17.130  16.600  18.900  1.00  0.00           C
ATOM   4061  O   ACE   255      17.990  17.490  18.870  1.00  0.00           O
ATOM   4062  N   CYS   256      10.990  17.400  19.550  1.00  0.00           N
ATOM   4063  H   CYS   256      11.770  17.930  19.900  1.00  0.00           H
ATOM   4064  CA  CYS   256      10.940  15.930  19.630  1.00  0.00           C
ATOM   4065  HA  CYS   256      10.900  15.510  18.630  1.00  0.00           H
ATOM   4066  CB  CYS   256      12.240  15.380  20.240  1.00  0.00           C
ATOM   4067  HB1 CYS   256      13.090  15.710  19.650  1.00  0.00           H
ATOM   4068  HB2 CYS   256      12.200  14.290  20.240  1.00  0.00           H
ATOM   4069  SG  CYS   256      12.600  15.850  21.950  1.00  0.00           S
ATOM   4070  C   CYS   256       9.750  15.460  20.460  1.00  0.00           C
ATOM   4071  O   CYS   256       9.320  16.130  21.410  1.00  0.00           O
```
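A rough geometric check does not rely on residue names or TER records. It measures the distance from the backbone C of each residue to the N of the next residue in the same chain. A peptide C-N bond is about 1.33 Å, so a value well above that suggests the two residues are not actually bonded. The sketch below assumes plain-text PDB input. `backbone_gaps` and its 2.0 Å cutoff are illustrative choices, not part of perses or OpenMM. Taking the excerpted coordinates at face value, the ACE 255 C to CYS 256 N distance works out to roughly 6.2 Å. The ACE 149 C to GLY 150 N distance in the first excerpt is about 1.5 Å. So the ACE cap at 255 does not appear to be bonded to the residue that follows it in the file.

```python
import math


def backbone_gaps(pdb_lines, max_cn=2.0):
    """Flag consecutive residues in a chain whose C(i)-N(i+1) distance
    exceeds max_cn (in angstroms).

    Pairs where the first residue has no C (e.g. an NME cap) or the second
    has no N (e.g. an ACE cap) are skipped: no peptide bond is expected there.
    """
    residues = []  # [(label, {atom_name: (x, y, z)})] in file order
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # label = (chain ID, residue number + insertion code, residue name)
        label = (line[21], line[22:27].strip(), line[17:20].strip())
        if not residues or residues[-1][0] != label:
            residues.append((label, {}))
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues[-1][1][line[12:16].strip()] = xyz

    gaps = []
    for (lab_i, atoms_i), (lab_j, atoms_j) in zip(residues, residues[1:]):
        if lab_i[0] != lab_j[0]:
            continue  # residues in different chains are not expected to bond
        if "C" in atoms_i and "N" in atoms_j:
            d = math.dist(atoms_i["C"], atoms_j["N"])
            if d > max_cn:
                gaps.append((lab_i, lab_j, round(d, 2)))
    return gaps
```

Run on the lines of the second excerpt, this reports only the ACE 255 / CYS 256 pair. The GLY 253 to NME 254 junction measures about 1.5 Å and is not flagged.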

And the final terminus:

```
ATOM   4668  HB2 TYR   294      15.840   2.470  -1.460  1.00  0.00           H
ATOM   4669  CG  TYR   294      14.480   3.790  -2.480  1.00  0.00           C
ATOM   4670  CD1 TYR   294      14.920   5.050  -2.010  1.00  0.00           C
ATOM   4671  HD1 TYR   294      15.940   5.180  -1.690  1.00  0.00           H
ATOM   4672  CE1 TYR   294      14.010   6.130  -1.970  1.00  0.00           C
ATOM   4673  HE1 TYR   294      14.320   7.090  -1.600  1.00  0.00           H
ATOM   4674  CZ  TYR   294      12.680   5.950  -2.430  1.00  0.00           C
ATOM   4675  OH  TYR   294      11.810   7.000  -2.420  1.00  0.00           O
ATOM   4676  HH  TYR   294      11.610   7.260  -3.320  1.00  0.00           H
ATOM   4677  CE2 TYR   294      12.250   4.680  -2.910  1.00  0.00           C
ATOM   4678  HE2 TYR   294      11.240   4.540  -3.270  1.00  0.00           H
ATOM   4679  CD2 TYR   294      13.150   3.600  -2.900  1.00  0.00           C
ATOM   4680  HD2 TYR   294      12.810   2.620  -3.220  1.00  0.00           H
ATOM   4681  C   TYR   294      17.490   1.640  -3.410  1.00  0.00           C
ATOM   4682  O   TYR   294      18.140   1.470  -2.360  1.00  0.00           O
ATOM   4683  N   NME   295      17.640   0.730  -4.550  1.00  0.00           N
ATOM   4684  H   NME   295      17.150   0.790  -5.450  1.00  0.00           H
ATOM   4685  CH3 NME   295      18.610  -0.280  -4.160  1.00  0.00           C
ATOM   4686 1HH3 NME   295      19.260   0.110  -3.390  1.00  0.00           H
ATOM   4687 2HH3 NME   295      18.110  -1.160  -3.780  1.00  0.00           H
ATOM   4688 3HH3 NME   295      19.210  -0.560  -5.010  1.00  0.00           H
```

There are no CONECT records, which I believe the PDB format requires.

The PDB file should also have TER records if there are chain breaks.
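Adding the missing TER records could be scripted along the lines below. This is a minimal sketch. It assumes plain-text PDB input in which every C-terminal cap is named NME; a Maestro NMA cap would need to be added to `c_caps`. `add_ter_after_caps` is a hypothetical helper. It writes bare `TER` lines without the optional serial and residue fields, and it does not generate CONECT records.

```python
def add_ter_after_caps(pdb_lines, c_caps=("NME",)):
    """Return a copy of pdb_lines with a TER record after each C-terminal cap.

    A bare "TER" line is written; the optional serial-number and residue
    fields of the TER record are left out in this sketch.
    """
    out = []
    open_cap = None  # key of a cap residue written since the last TER, if any
    for line in pdb_lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            # (chain ID, residue number + insertion code, residue name)
            key = (line[21], line[22:27], line[17:20].strip())
            if open_cap is not None and key != open_cap:
                out.append("TER")  # a new residue starts right after a cap
                open_cap = None
            if key[2] in c_caps:
                open_cap = key
        elif record == "TER":
            open_cap = None  # the file already terminates this chain
        elif record in ("END", "ENDMDL") and open_cap is not None:
            out.append("TER")  # close the chain before the end of the model
            open_cap = None
        out.append(line)
    if open_cap is not None:
        out.append("TER")  # file ended on a cap with no TER or END
    return out
```

Applied to the first excerpt above, this would place a TER line between NME 148 and ACE 149. Running it a second time leaves the output unchanged.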

jchodera commented 2 years ago

I notice the PDB file provided by Schrödinger with the [JACS paper SI](https://drive.google.com/folderview?id=0BylmDElgu6QLTnJ2WGMzNXBENkk&usp=sharing) (JACS-thrombin.zip) is different: it simply omits residues (leaving a gap) at chain breaks, e.g.:

```
ATOM   2460  N   THR H 147       7.068 -18.844  17.812  1.00 57.28           N  
ANISOU 2460  N   THR H 147     9357   4126   8279   4136  -2622  -2172
ATOM   2461  CA  THR H 147       7.021 -20.068  16.993  1.00 65.30           C  
ANISOU 2461  CA  THR H 147    11065   4901   8844   4894  -3828  -2926
ATOM   2462  C   THR H 147       8.441 -20.605  16.832  1.00 67.76           C  
ANISOU 2462  C   THR H 147    11712   4844   9190   5513  -2973  -2653
ATOM   2463  O   THR H 147       8.767 -21.337  15.884  1.00 69.18           O  
ANISOU 2463  O   THR H 147    11766   6235   8284   2678    213  -2106
ATOM   2464  CB  THR H 147       6.350 -19.878  15.620  1.00 71.40           C  
ANISOU 2464  CB  THR H 147    12410   5945   8774   4762  -4052  -2656
ATOM   2465  OG1 THR H 147       5.041 -19.283  15.729  1.00 74.77           O  
ANISOU 2465  OG1 THR H 147    12661   7837   7913   5566  -3652   -221
ATOM   2466  CG2 THR H 147       6.108 -21.225  14.947  1.00 78.51           C  
ANISOU 2466  CG2 THR H 147    12765   7637   9426   2825  -3437  -3931
ATOM   2467  H   THR H 147       7.859 -18.223  17.722  1.00  0.00           H  
ATOM   2468  HA  THR H 147       6.340 -20.786  17.450  1.00  0.00           H  
ATOM   2469  HXT THR H 147       9.172 -20.328  17.577  1.00  0.00           H  
ATOM   2470  HB  THR H 147       6.935 -19.179  15.023  1.00  0.00           H  
ATOM   2471  HG1 THR H 147       5.114 -18.421  16.146  1.00  0.00           H  
ATOM   2472 HG21 THR H 147       5.633 -21.068  13.978  1.00  0.00           H  
ATOM   2473 HG22 THR H 147       7.060 -21.737  14.806  1.00  0.00           H  
ATOM   2474 HG23 THR H 147       5.458 -21.834  15.575  1.00  0.00           H  
ATOM   2475  N   GLY H 150       0.878 -15.170  13.641  1.00 57.65           N  
ANISOU 2475  N   GLY H 150     9021   4965   7919     93  -1279    -89
ATOM   2476  CA  GLY H 150       1.619 -15.138  12.383  1.00 44.28           C  
ANISOU 2476  CA  GLY H 150     7719   2585   6519   -176  -2798  -1772
ATOM   2477  C   GLY H 150       2.444 -13.872  12.211  1.00 39.70           C  
ANISOU 2477  C   GLY H 150     7162   2257   5664    344  -3456   -917
ATOM   2478  O   GLY H 150       3.130 -13.412  13.134  1.00 34.20           O  
ANISOU 2478  O   GLY H 150     4866   3446   4682     70  -1786  -1281
ATOM   2479  H1  GLY H 150       0.351 -16.030  13.700  1.00  0.00           H  
ATOM   2480  H2  GLY H 150       1.525 -15.118  14.415  1.00  0.00           H  
ATOM   2481  HA2 GLY H 150       2.302 -15.986  12.342  1.00  0.00           H  
ATOM   2482  HA3 GLY H 150       0.921 -15.194  11.548  1.00  0.00           H  
```

Uncapped breaks like this would almost certainly cause a problem for OpenMM.
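The Schrödinger file marks breaks only by skipping residue numbers. One rough way to locate them is to look for jumps in the residue sequence number within a chain. The sketch below assumes plain-text PDB input; `residue_number_gaps` is a hypothetical helper, and it ignores insertion codes. A numbering jump is only a hint. Homology-based numbering schemes, such as the chymotrypsinogen numbering commonly used for thrombin, can skip numbers or use insertion codes without any physical break. A geometric C-N distance check is a useful cross-check.

```python
def residue_number_gaps(pdb_lines):
    """Return (chain, last_resSeq, next_resSeq) wherever residue numbering
    jumps by more than one within a chain (insertion codes are ignored)."""
    gaps = []
    last_seen = {}  # chain ID -> residue number of the previous ATOM/HETATM record
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "TER":
            last_seen.clear()  # an explicit chain end; don't report across it
            continue
        if record not in ("ATOM", "HETATM"):
            continue  # skips ANISOU, REMARK, CONECT, ...
        chain, resseq = line[21], int(line[22:26])
        prev = last_seen.get(chain)
        if prev is not None and resseq > prev + 1:
            gaps.append((chain, prev, resseq))
        last_seen[chain] = resseq
    return gaps
```

On the excerpt above this reports a single jump in chain H, from THR 147 to GLY 150.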

ijpulidos commented 2 years ago

Use the examples from #919.