Open ijpulidos opened 2 years ago
Examples of proteins with terminal caps prepared with Maestro should be really helpful for this purpose.
I believe @dominicrufa has experience with this and can let us know if this was an issue, or how he dealt with it.
The thrombin protein-ligand benchmark system appears to fail with this error:
2022-01-30 18:33:06,738:(0.52s):openmmforcefields.generators.template_generators:Requested to generate parameters for residue <Residue 283 (NME) of chain 0>
2022-01-30 18:33:06,742:(0.00s):openmmforcefields.generators.template_generators:Did not recognize residue NME; did you forget to call .add_molecules() to add it?
Traceback (most recent call last):
File "/lila/data/chodera/chodera/perses/perses/benchmarks/thrombin-1-ns-12-states/run_benchmarks.py", line 176, in <module>
run_relative_perturbation(lig_a_index, lig_b_index, reverse=is_reversed)
File "/lila/data/chodera/chodera/perses/perses/benchmarks/thrombin-1-ns-12-states/run_benchmarks.py", line 92, in run_relative_perturbation
run(new_yaml)
File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/perses/app/setup_relative_calculation.py", line 764, in run
setup_dict = run_setup(setup_options)
File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/perses/app/setup_relative_calculation.py", line 459, in run_setup
fe_setup = RelativeFEPSetup(ligand_file, old_ligand_index, new_ligand_index, forcefield_files,phases=phases,
File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/perses/app/relative_setup.py", line 376, in __init__
self._complex_topology_old_solvated, self._complex_positions_old_solvated, self._complex_system_old_solvated = self._solvate_system(
File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/perses/app/relative_setup.py", line 827, in _solvate_system
modeller.addSolvent(self._system_generator.forcefield, model=model, padding=self._padding, ionicStrength=ionic_strength)
File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/openmm/app/modeller.py", line 483, in addSolvent
system = forcefield.createSystem(self.topology)
File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/openmm/app/forcefield.py", line 1212, in createSystem
templateForResidue = self._matchAllResiduesToTemplates(data, topology, residueTemplates, ignoreExternalBonds)
File "/lila/home/chodera/miniconda/envs/perses-dev/lib/python3.9/site-packages/openmm/app/forcefield.py", line 1427, in _matchAllResiduesToTemplates
raise ValueError('No template found for residue %d (%s). %s' % (res.index+1, res.name, _findMatchErrors(self, res)))
ValueError: No template found for residue 284 (NME). The set of atoms matches NME, but the bonds are different.
Here's the context from protein.pdb for this system:
ATOM 2392 N THR 147 7.070 -18.840 17.810 1.00 0.00 N
ATOM 2393 H THR 147 7.860 -18.220 17.720 1.00 0.00 H
ATOM 2394 CA THR 147 7.020 -20.070 16.990 1.00 0.00 C
ATOM 2395 HA THR 147 6.340 -20.790 17.450 1.00 0.00 H
ATOM 2396 CB THR 147 6.350 -19.880 15.620 1.00 0.00 C
ATOM 2397 HB THR 147 6.930 -19.180 15.020 1.00 0.00 H
ATOM 2398 CG2 THR 147 6.110 -21.220 14.950 1.00 0.00 C
ATOM 2399 1HG2 THR 147 5.630 -21.070 13.980 1.00 0.00 H
ATOM 2400 2HG2 THR 147 7.060 -21.740 14.810 1.00 0.00 H
ATOM 2401 3HG2 THR 147 5.460 -21.830 15.580 1.00 0.00 H
ATOM 2402 OG1 THR 147 5.040 -19.280 15.730 1.00 0.00 O
ATOM 2403 HG1 THR 147 5.110 -18.420 16.150 1.00 0.00 H
ATOM 2404 C THR 147 8.440 -20.600 16.830 1.00 0.00 C
ATOM 2405 O THR 147 9.400 -19.870 16.560 1.00 0.00 O
ATOM 2406 N NME 148 8.680 -22.040 17.010 1.00 0.00 N
ATOM 2407 H NME 148 7.980 -22.760 17.240 1.00 0.00 H
ATOM 2408 CH3 NME 148 10.100 -22.260 16.800 1.00 0.00 C
ATOM 2409 1HH3 NME 148 10.510 -21.480 16.180 1.00 0.00 H
ATOM 2410 2HH3 NME 148 10.620 -22.270 17.750 1.00 0.00 H
ATOM 2411 3HH3 NME 148 10.260 -23.210 16.310 1.00 0.00 H
ATOM 2412 1HH3 ACE 149 0.970 -17.140 16.260 1.00 0.00 H
ATOM 2413 CH3 ACE 149 0.250 -16.810 15.530 1.00 0.00 C
ATOM 2414 2HH3 ACE 149 -0.240 -15.920 15.890 1.00 0.00 H
ATOM 2415 3HH3 ACE 149 -0.490 -17.580 15.390 1.00 0.00 H
ATOM 2416 C ACE 149 0.940 -16.520 14.210 1.00 0.00 C
ATOM 2417 O ACE 149 1.570 -17.410 13.620 1.00 0.00 O
ATOM 2418 N GLY 150 0.880 -15.170 13.640 1.00 0.00 N
ATOM 2419 H GLY 150 1.280 -14.510 14.290 1.00 0.00 H
ATOM 2420 CA GLY 150 1.620 -15.140 12.380 1.00 0.00 C
ATOM 2421 HA1 GLY 150 2.300 -15.990 12.340 1.00 0.00 H
ATOM 2422 HA2 GLY 150 0.920 -15.190 11.550 1.00 0.00 H
ATOM 2423 C GLY 150 2.440 -13.870 12.210 1.00 0.00 C
ATOM 2424 O GLY 150 3.130 -13.410 13.130 1.00 0.00 O
and another break in the same system:
ATOM 4043 N GLY 253 28.610 13.970 14.010 1.00 0.00 N
ATOM 4044 H GLY 253 28.870 14.050 14.980 1.00 0.00 H
ATOM 4045 CA GLY 253 27.530 14.810 13.470 1.00 0.00 C
ATOM 4046 HA1 GLY 253 27.260 15.570 14.210 1.00 0.00 H
ATOM 4047 HA2 GLY 253 27.870 15.300 12.560 1.00 0.00 H
ATOM 4048 C GLY 253 26.290 13.980 13.160 1.00 0.00 C
ATOM 4049 O GLY 253 25.160 14.530 13.260 1.00 0.00 O
ATOM 4050 N NME 254 26.410 12.580 12.730 1.00 0.00 N
ATOM 4051 H NME 254 27.290 12.040 12.620 1.00 0.00 H
ATOM 4052 CH3 NME 254 25.070 12.090 12.510 1.00 0.00 C
ATOM 4053 1HH3 NME 254 24.360 12.670 13.090 1.00 0.00 H
ATOM 4054 2HH3 NME 254 24.810 12.170 11.460 1.00 0.00 H
ATOM 4055 3HH3 NME 254 24.990 11.050 12.810 1.00 0.00 H
ATOM 4056 1HH3 ACE 255 18.050 15.000 19.990 1.00 0.00 H
ATOM 4057 CH3 ACE 255 17.520 15.150 19.060 1.00 0.00 C
ATOM 4058 2HH3 ACE 255 16.640 14.520 19.070 1.00 0.00 H
ATOM 4059 3HH3 ACE 255 18.160 14.840 18.240 1.00 0.00 H
ATOM 4060 C ACE 255 17.130 16.600 18.900 1.00 0.00 C
ATOM 4061 O ACE 255 17.990 17.490 18.870 1.00 0.00 O
ATOM 4062 N CYS 256 10.990 17.400 19.550 1.00 0.00 N
ATOM 4063 H CYS 256 11.770 17.930 19.900 1.00 0.00 H
ATOM 4064 CA CYS 256 10.940 15.930 19.630 1.00 0.00 C
ATOM 4065 HA CYS 256 10.900 15.510 18.630 1.00 0.00 H
ATOM 4066 CB CYS 256 12.240 15.380 20.240 1.00 0.00 C
ATOM 4067 HB1 CYS 256 13.090 15.710 19.650 1.00 0.00 H
ATOM 4068 HB2 CYS 256 12.200 14.290 20.240 1.00 0.00 H
ATOM 4069 SG CYS 256 12.600 15.850 21.950 1.00 0.00 S
ATOM 4070 C CYS 256 9.750 15.460 20.460 1.00 0.00 C
ATOM 4071 O CYS 256 9.320 16.130 21.410 1.00 0.00 O
and the final terminus:
ATOM 4668 HB2 TYR 294 15.840 2.470 -1.460 1.00 0.00 H
ATOM 4669 CG TYR 294 14.480 3.790 -2.480 1.00 0.00 C
ATOM 4670 CD1 TYR 294 14.920 5.050 -2.010 1.00 0.00 C
ATOM 4671 HD1 TYR 294 15.940 5.180 -1.690 1.00 0.00 H
ATOM 4672 CE1 TYR 294 14.010 6.130 -1.970 1.00 0.00 C
ATOM 4673 HE1 TYR 294 14.320 7.090 -1.600 1.00 0.00 H
ATOM 4674 CZ TYR 294 12.680 5.950 -2.430 1.00 0.00 C
ATOM 4675 OH TYR 294 11.810 7.000 -2.420 1.00 0.00 O
ATOM 4676 HH TYR 294 11.610 7.260 -3.320 1.00 0.00 H
ATOM 4677 CE2 TYR 294 12.250 4.680 -2.910 1.00 0.00 C
ATOM 4678 HE2 TYR 294 11.240 4.540 -3.270 1.00 0.00 H
ATOM 4679 CD2 TYR 294 13.150 3.600 -2.900 1.00 0.00 C
ATOM 4680 HD2 TYR 294 12.810 2.620 -3.220 1.00 0.00 H
ATOM 4681 C TYR 294 17.490 1.640 -3.410 1.00 0.00 C
ATOM 4682 O TYR 294 18.140 1.470 -2.360 1.00 0.00 O
ATOM 4683 N NME 295 17.640 0.730 -4.550 1.00 0.00 N
ATOM 4684 H NME 295 17.150 0.790 -5.450 1.00 0.00 H
ATOM 4685 CH3 NME 295 18.610 -0.280 -4.160 1.00 0.00 C
ATOM 4686 1HH3 NME 295 19.260 0.110 -3.390 1.00 0.00 H
ATOM 4687 2HH3 NME 295 18.110 -1.160 -3.780 1.00 0.00 H
ATOM 4688 3HH3 NME 295 19.210 -0.560 -5.010 1.00 0.00 H
There are no CONECT records, which I believe the PDB format requires.
The PDB file should also have TER records if there are chain breaks.
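To find where TER records would be needed, a minimal sketch like the following could scan the ATOM/HETATM records and flag consecutive residues that are not joined by a backbone C-N bond. It assumes standard fixed-column PDB formatting; the function name `find_chain_breaks` and the 1.7 Å cutoff are illustrative choices, not anything taken from perses or OpenMM.

```python
import math

def find_chain_breaks(pdb_lines, max_cn_distance=1.7):
    """Return (prev_residue, next_residue) pairs that lack a backbone C-N bond.

    Each residue key is (chain_id, residue_number, residue_name).
    The cutoff (in Angstrom) is an assumed, deliberately generous value.
    """
    residues = []  # ordered list of (key, {atom_name: (x, y, z)})
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed-column fields of the PDB ATOM record
        key = (line[21], int(line[22:26]), line[17:20].strip())
        name = line[12:16].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if not residues or residues[-1][0] != key:
            residues.append((key, {}))
        residues[-1][1][name] = xyz

    breaks = []
    for (key_a, atoms_a), (key_b, atoms_b) in zip(residues, residues[1:]):
        c = atoms_a.get("C")
        n = atoms_b.get("N")
        # A missing C or N (e.g. NME followed by ACE) or a long C-N
        # distance both indicate that the chain does not continue.
        if c is None or n is None or math.dist(c, n) > max_cn_distance:
            breaks.append((key_a, key_b))
    return breaks
```

Because the check is purely geometric, it flags capped breaks (an NME followed by an ACE) as well as uncapped gaps where residues are simply omitted. Each reported pair marks a position where a TER record could be inserted after the first residue.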
I notice the PDB file provided by Schrödinger with the [JACS paper SI](https://drive.google.com/folderview?id=0BylmDElgu6QLTnJ2WGMzNXBENkk&usp=sharing) (JACS-thrombin.zip) is different and simply omits residues (leaves a gap) for chain breaks, e.g.:
ATOM 2460 N THR H 147 7.068 -18.844 17.812 1.00 57.28 N
ANISOU 2460 N THR H 147 9357 4126 8279 4136 -2622 -2172
ATOM 2461 CA THR H 147 7.021 -20.068 16.993 1.00 65.30 C
ANISOU 2461 CA THR H 147 11065 4901 8844 4894 -3828 -2926
ATOM 2462 C THR H 147 8.441 -20.605 16.832 1.00 67.76 C
ANISOU 2462 C THR H 147 11712 4844 9190 5513 -2973 -2653
ATOM 2463 O THR H 147 8.767 -21.337 15.884 1.00 69.18 O
ANISOU 2463 O THR H 147 11766 6235 8284 2678 213 -2106
ATOM 2464 CB THR H 147 6.350 -19.878 15.620 1.00 71.40 C
ANISOU 2464 CB THR H 147 12410 5945 8774 4762 -4052 -2656
ATOM 2465 OG1 THR H 147 5.041 -19.283 15.729 1.00 74.77 O
ANISOU 2465 OG1 THR H 147 12661 7837 7913 5566 -3652 -221
ATOM 2466 CG2 THR H 147 6.108 -21.225 14.947 1.00 78.51 C
ANISOU 2466 CG2 THR H 147 12765 7637 9426 2825 -3437 -3931
ATOM 2467 H THR H 147 7.859 -18.223 17.722 1.00 0.00 H
ATOM 2468 HA THR H 147 6.340 -20.786 17.450 1.00 0.00 H
ATOM 2469 HXT THR H 147 9.172 -20.328 17.577 1.00 0.00 H
ATOM 2470 HB THR H 147 6.935 -19.179 15.023 1.00 0.00 H
ATOM 2471 HG1 THR H 147 5.114 -18.421 16.146 1.00 0.00 H
ATOM 2472 HG21 THR H 147 5.633 -21.068 13.978 1.00 0.00 H
ATOM 2473 HG22 THR H 147 7.060 -21.737 14.806 1.00 0.00 H
ATOM 2474 HG23 THR H 147 5.458 -21.834 15.575 1.00 0.00 H
ATOM 2475 N GLY H 150 0.878 -15.170 13.641 1.00 57.65 N
ANISOU 2475 N GLY H 150 9021 4965 7919 93 -1279 -89
ATOM 2476 CA GLY H 150 1.619 -15.138 12.383 1.00 44.28 C
ANISOU 2476 CA GLY H 150 7719 2585 6519 -176 -2798 -1772
ATOM 2477 C GLY H 150 2.444 -13.872 12.211 1.00 39.70 C
ANISOU 2477 C GLY H 150 7162 2257 5664 344 -3456 -917
ATOM 2478 O GLY H 150 3.130 -13.412 13.134 1.00 34.20 O
ANISOU 2478 O GLY H 150 4866 3446 4682 70 -1786 -1281
ATOM 2479 H1 GLY H 150 0.351 -16.030 13.700 1.00 0.00 H
ATOM 2480 H2 GLY H 150 1.525 -15.118 14.415 1.00 0.00 H
ATOM 2481 HA2 GLY H 150 2.302 -15.986 12.342 1.00 0.00 H
ATOM 2482 HA3 GLY H 150 0.921 -15.194 11.548 1.00 0.00 H
Uncapped breaks like this would almost certainly cause a problem for OpenMM.
Use examples from #919
We want to be able to deal with protein caps for proteins prepared with the Maestro software or the suite's protein preparation wizard. This will require checking which caps these tools use and making sure we can handle them (e.g. NMA instead of NME).
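One possible way to handle alternative cap names is to rewrite them before the file reaches the force field. The sketch below renames residue names in fixed-column PDB records. The `CAP_ALIASES` table and the function name `normalize_cap_names` are hypothetical; only the NMA to NME mapping comes from this issue, and any other caps Maestro produces would need to be checked and added. It does not touch atom names, which may also differ between conventions.

```python
# Hypothetical alias table: only NMA -> NME is taken from this issue.
CAP_ALIASES = {"NMA": "NME"}

def normalize_cap_names(pdb_lines, aliases=None):
    """Return PDB lines with aliased cap residue names rewritten in place.

    Only the residue-name field (columns 18-20) of coordinate-style
    records is changed, so column alignment is preserved.
    """
    aliases = CAP_ALIASES if aliases is None else aliases
    records = ("ATOM", "HETATM", "ANISOU", "TER")
    fixed = []
    for line in pdb_lines:
        resname = line[17:20]
        if line.startswith(records) and resname in aliases:
            line = line[:17] + aliases[resname] + line[20:]
        fixed.append(line)
    return fixed
```

Renaming the residue alone may not be enough if the atom names inside the cap also differ from what the force field template expects, so this should be treated as a first step rather than a complete fix.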