THGLab / MCSCE

Monte Carlo Side Chain Entropy package for generating side chain packing for fixed protein backbone
MIT License

Amine protons #25

Open joaomcteixeira opened 2 years ago

joaomcteixeira commented 2 years ago

I have a PDB file that uses HN instead of H for the backbone amide protons. MCSCE fails when it looks up the data for these atoms in the force field.

ATOM      3  N   MET     1     -39.879  34.489 108.809                       N
ATOM      4  CA  MET     1     -39.751  34.162 107.398                       C
ATOM      5  C   MET     1     -38.320  34.441 106.951                       C
ATOM      6  O   MET     1     -37.601  35.249 107.556                       O
ATOM      7  CB  MET     1     -40.710  35.030 106.571                       C
ATOM      8  HN  MET     1     -39.314  35.243 109.200                       H
ATOM      9  HA  MET     1     -39.978  33.088 107.235                       H
ATOM     10  N   VAL     2     -37.887  33.772 105.882                       N
ATOM     11  CA  VAL     2     -36.540  33.956 105.365                       C
ATOM     12  C   VAL     2     -36.596  34.016 103.843                       C
ATOM     13  O   VAL     2     -37.179  33.143 103.183                       O
ATOM     14  CB  VAL     2     -35.652  32.778 105.787                       C
ATOM     15  HN  VAL     2     -38.515  33.116 105.415                       H
ATOM     16  HA  VAL     2     -36.115  34.907 105.749                       H

It produces the following error:

Start preparing energy calculators at different sidechain completion levels
Traceback (most recent call last):
  File "/home/joao/anaconda3/envs/mcsce/bin/mcsce", line 33, in <module>
    sys.exit(load_entry_point('mcsce', 'console_scripts', 'mcsce')())
  File "/home/joao/github/MCSCE/src/mcsce/cli.py", line 36, in maincli
    cli(parser, main)
  File "/home/joao/github/MCSCE/src/mcsce/cli.py", line 31, in cli
    main(**vars(cmd))
  File "/home/joao/github/MCSCE/src/mcsce/cli.py", line 142, in main
    forcefield=ff_obj, terms=["lj", "clash"]), structure=s)
  File "/home/joao/github/MCSCE/src/mcsce/core/side_chain_builder.py", line 77, in initialize_func_calc
    structure.res_labels))
  File "/home/joao/github/MCSCE/src/mcsce/libs/libenergy.py", line 83, in prepare_energy_function
    base_bool=False,
  File "/home/joao/github/MCSCE/src/mcsce/libs/libenergy.py", line 324, in create_bonds_apart_mask_for_ij_pairs
    if j_atom_name in bonds_intra[res_label][i_atom_name]:
KeyError: 'HN'

Do you think it would be possible to add an option that also handles these atom names, for example by repeating the parameters under the alternative name, or is it better for the user to rename the atoms beforehand?
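
For the second option (renaming beforehand), a minimal sketch of what the user-side fix could look like. It assumes fixed-column PDB records where the atom name sits in columns 13-16, as in the file above; `rename_amide_protons` is a hypothetical helper, not part of MCSCE.

```python
def rename_amide_protons(pdb_lines, old="HN", new="H"):
    """Return a copy of the PDB lines with atom name `old` replaced by `new`.

    Assumption: `new` is at most three characters long, so it is written
    starting in column 14, like the other hydrogen names in this file.
    """
    renamed = []
    for line in pdb_lines:
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        # Columns 13-16 (0-based slice 12:16) hold the atom name.
        if is_atom_record and line[12:16].strip() == old:
            line = line[:12] + f" {new:<3}" + line[16:]
        renamed.append(line)
    return renamed
```

The line length and all other columns are left untouched, so coordinates and element symbols stay aligned.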

joaomcteixeira commented 2 years ago

However, when I rename the HN atoms to H, it fails with the following:

Now working on 1110a_111.pdb
WARNING! These atoms are missing from the current backbone structure [1110a_111.pdb]:
1 H1
1 H2
1 H3
111 OXT

Start preparing energy calculators at different sidechain completion levels
Traceback (most recent call last):
  File "/home/joao/anaconda3/envs/mcsce/bin/mcsce", line 33, in <module>
    sys.exit(load_entry_point('mcsce', 'console_scripts', 'mcsce')())
  File "/home/joao/github/MCSCE/src/mcsce/cli.py", line 36, in maincli
    cli(parser, main)
  File "/home/joao/github/MCSCE/src/mcsce/cli.py", line 31, in cli
    main(**vars(cmd))
  File "/home/joao/github/MCSCE/src/mcsce/cli.py", line 142, in main
    forcefield=ff_obj, terms=["lj", "clash"]), structure=s)
  File "/home/joao/github/MCSCE/src/mcsce/core/side_chain_builder.py", line 77, in initialize_func_calc
    structure.res_labels))
  File "/home/joao/github/MCSCE/src/mcsce/libs/libenergy.py", line 83, in prepare_energy_function
    base_bool=False,
  File "/home/joao/github/MCSCE/src/mcsce/libs/libenergy.py", line 324, in create_bonds_apart_mask_for_ij_pairs
    if j_atom_name in bonds_intra[res_label][i_atom_name]:
KeyError: 'H'

joaomcteixeira commented 2 years ago

I am pasting the PDB here; just 10 residues are enough to reproduce the error. Do you think MCSCE could be more flexible about this? PDB files come in countless variants :-x

ATOM      3  N   MET     1     -39.879  34.489 108.809                       N
ATOM      4  CA  MET     1     -39.751  34.162 107.398                       C
ATOM      5  C   MET     1     -38.320  34.441 106.951                       C
ATOM      6  O   MET     1     -37.601  35.249 107.556                       O
ATOM      7  CB  MET     1     -40.710  35.030 106.571                       C
ATOM      8  H   MET     1     -39.314  35.243 109.200                       H
ATOM      9  HA  MET     1     -39.978  33.088 107.235                       H
ATOM     10  N   VAL     2     -37.887  33.772 105.882                       N
ATOM     11  CA  VAL     2     -36.540  33.956 105.365                       C
ATOM     12  C   VAL     2     -36.596  34.016 103.843                       C
ATOM     13  O   VAL     2     -37.179  33.143 103.183                       O
ATOM     14  CB  VAL     2     -35.652  32.778 105.787                       C
ATOM     15  H   VAL     2     -38.515  33.116 105.415                       H
ATOM     16  HA  VAL     2     -36.115  34.907 105.749                       H
ATOM     17  N   ARG     3     -35.987  35.051 103.263                       N
ATOM     18  CA  ARG     3     -35.976  35.212 101.817                       C
ATOM     19  C   ARG     3     -34.657  34.678 101.269                       C
ATOM     20  O   ARG     3     -34.479  34.523 100.051                       O
ATOM     21  CB  ARG     3     -36.102  36.697 101.454                       C
ATOM     22  H   ARG     3     -35.520  35.746 103.845                       H
ATOM     23  HA  ARG     3     -36.812  34.639 101.365                       H
ATOM     24  N   THR     4     -33.711  34.390 102.164                       N
ATOM     25  CA  THR     4     -32.413  33.874 101.757                       C
ATOM     26  C   THR     4     -32.115  32.603 102.544                       C
ATOM     27  O   THR     4     -32.578  32.423 103.680                       O
ATOM     28  CB  THR     4     -31.322  34.912 102.052                       C
ATOM     29  H   THR     4     -33.901  34.534 103.156                       H
ATOM     30  HA  THR     4     -32.424  33.634 100.673                       H
ATOM     31  N   LYS     5     -31.333  31.701 101.948                       N
ATOM     32  CA  LYS     5     -30.982  30.451 102.603                       C
ATOM     33  C   LYS     5     -29.533  30.526 103.073                       C
ATOM     34  O   LYS     5     -29.048  29.658 103.813                       O
ATOM     35  CB  LYS     5     -31.127  29.284 101.617                       C
ATOM     36  H   LYS     5     -30.974  31.891 101.012                       H
ATOM     37  HA  LYS     5     -31.639  30.286 103.483                       H
ATOM     38  N   ALA     6     -28.821  31.571 102.647                       N
ATOM     39  CA  ALA     6     -27.429  31.747 103.031                       C
ATOM     40  C   ALA     6     -27.320  32.947 103.965                       C
ATOM     41  O   ALA     6     -27.693  34.076 103.612                       O
ATOM     42  CB  ALA     6     -26.571  32.004 101.785                       C
ATOM     43  H   ALA     6     -29.262  32.261 102.039                       H
ATOM     44  HA  ALA     6     -27.065  30.842 103.561                       H
ATOM     45  N   GLU     7     -26.805  32.719 105.174                       N
ATOM     46  CA  GLU     7     -26.653  33.788 106.149                       C
ATOM     47  C   GLU     7     -25.189  34.209 106.198                       C
ATOM     48  O   GLU     7     -24.335  33.512 106.766                       O
ATOM     49  CB  GLU     7     -27.078  33.294 107.538                       C
ATOM     50  H   GLU     7     -26.512  31.773 105.421                       H
ATOM     51  HA  GLU     7     -27.271  34.661 105.852                       H
ATOM     52  N   SER     8     -24.878  35.361 105.601                       N
ATOM     53  CA  SER     8     -23.513  35.863 105.585                       C
ATOM     54  C   SER     8     -23.136  36.323 106.989                       C
ATOM     55  O   SER     8     -23.886  37.057 107.650                       O
ATOM     56  CB  SER     8     -23.405  37.053 104.623                       C
ATOM     57  H   SER     8     -25.613  35.901 105.146                       H
ATOM     58  HA  SER     8     -22.817  35.057 105.270                       H
ATOM     59  N   ILE     9     -21.966  35.896 107.464                       N
ATOM     60  CA  ILE     9     -21.503  36.271 108.791                       C
ATOM     61  C   ILE     9     -20.801  37.622 108.708                       C
ATOM     62  O   ILE     9     -20.469  38.112 107.619                       O
ATOM     63  CB  ILE     9     -20.511  35.224 109.315                       C
ATOM     64  H   ILE     9     -21.382  35.293 106.884                       H
ATOM     65  HA  ILE     9     -22.366  36.354 109.484                       H
ATOM     66  N   PRO    10     -20.566  38.245 109.864                       N
ATOM     67  CA  PRO    10     -19.902  39.539 109.908                       C
ATOM     68  C   PRO    10     -18.449  39.366 109.481                       C
ATOM     69  O   PRO    10     -17.778  40.326 109.072                       O
ATOM     70  CB  PRO    10     -19.940  40.098 111.336                       C
ATOM     71  H   PRO    10     -20.857  37.804 110.737                       H
ATOM     72  HA  PRO    10     -20.403  40.244 109.212                       H
ATOM     73  N   GLY    11     -17.942  38.136 109.569                       N
ATOM     74  CA  GLY    11     -16.567  37.853 109.190                       C
ATOM     75  C   GLY    11     -16.469  37.816 107.668                       C
ATOM     76  O   GLY    11     -15.444  38.186 107.078                       O
ATOM     77  HA1 GLY    11     -16.139  36.490 109.751                       H
ATOM     78  H   GLY    11     -18.532  37.377 109.911                       H
ATOM     79  HA2 GLY    11     -15.896  38.648 109.576                       H
ATOM     80  N   THR    12     -17.540  37.367 107.012                       N
ATOM     81  CA  THR    12     -17.561  37.287 105.560                       C
ATOM     82  C   THR    12     -17.986  38.638 104.995                       C
ATOM     83  O   THR    12     -18.641  39.444 105.672                       O
ATOM     84  CB  THR    12     -18.565  36.218 105.109                       C
ATOM     85  H   THR    12     -18.362  37.072 107.539                       H
ATOM     86  HA  THR    12     -16.548  37.039 105.179                       H
ATOM     87  N   LYS    13     -17.619  38.903 103.740                       N
ATOM     88  CA  LYS    13     -17.968  40.160 103.097                       C
ATOM     89  C   LYS    13     -18.669  39.864 101.776                       C
ATOM     90  O   LYS    13     -18.106  40.059 100.688                       O
ATOM     91  CB  LYS    13     -16.699  40.976 102.818                       C
ATOM     92  H   LYS    13     -17.081  38.210 103.220                       H
ATOM     93  HA  LYS    13     -18.655  40.741 103.748                       H

JerryJohnsonLee commented 2 years ago

In this case the issue is that proline has no amide proton, but the input structure includes an amide hydrogen on the proline (residue 10), so MCSCE does not know which force field parameters to use for that atom. Maybe I should also emit a warning or error message at the start of the run.
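
Until such a warning exists, something like the following could be used to spot the problem in an input file. This is only a sketch under the assumption of standard fixed-column PDB records; `find_proline_amide_protons` is a hypothetical name and not an MCSCE function.

```python
def find_proline_amide_protons(pdb_lines, names=("H", "HN")):
    """Return (residue number, atom name) for amide protons found on PRO.

    Proline's backbone nitrogen is part of the ring and carries no amide
    hydrogen, so any hit points to an atom the force field cannot describe.
    """
    hits = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20
        if res_name == "PRO" and atom_name in names:
            res_number = int(line[22:26])  # columns 23-26
            hits.append((res_number, atom_name))
    return hits
```

For the file pasted above, this would flag the H atom on residue 10, which can then be deleted from the input before running MCSCE.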

As for the HN atom name, it is not standard in the PDB format, so the force field parameter file has no corresponding records for it. I think users should rename the atoms themselves, and I can provide a clearer error message for this case as well.
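
A clearer error message could come from checking every atom name up front instead of hitting a bare KeyError during the bond lookup. The sketch below illustrates the idea only: `check_atom_names` and the `known_atoms` mapping (residue label to the set of atom names with parameters) are hypothetical and do not reflect how MCSCE actually stores its force field data.

```python
def check_atom_names(atoms, known_atoms):
    """Raise a descriptive ValueError if any atom lacks force field records.

    atoms: iterable of (residue number, residue label, atom name) tuples.
    known_atoms: dict mapping residue label -> set of supported atom names
                 (hypothetical structure, assumed for this example).
    """
    unknown = [
        (number, label, name)
        for number, label, name in atoms
        if name not in known_atoms.get(label, set())
    ]
    if unknown:
        details = ", ".join(
            f"{name} in {label} {number}" for number, label, name in unknown
        )
        raise ValueError(
            "Atom names not found in the force field: "
            f"{details}. Rename or remove these atoms before running."
        )
```

Reporting all offending atoms at once would cover both cases in this thread: the non-standard HN name and the amide hydrogen on proline.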