MolBIFFM / PTGLtools

The Visualization of Protein-Ligand Graphs software that powers the PTGL
http://www.bioinformatik.uni-frankfurt.de/tools/vplg/

Strange SSE ending number #12

Closed timn2008 closed 3 years ago

timn2008 commented 3 years ago

I'm running VPLG with coils enabled (plcc_B_include_coils=true in ~/plcc_settings.txt) for PDB 1cgl, and in 1cgl.cif_B_albelig_coils_PG.plg I get the following lines:

| ../pdbbind_cif_split/1cgl.cif | B | albelig | 6 | 6 | E | 211 | 213 | B-148-  | B-150-  | TFT
| ../pdbbind_cif_split/1cgl.cif | B | albelig | 7 | 7 | C | 214 | **331** | B-151-  | B-152-  | KVVQAD
| ../pdbbind_cif_split/1cgl.cif | B | albelig | 8 | 8 | E | 219 | 224 | B-159-  | B-164-  | IMISFV

This seems strange: the SSE ends with ID 331, although the next SSE starts at 219. The corresponding part of the DSSP file is:

  212  149 B F  E     -f  177   0C  28     -2,-0.3     2,-0.4   -36,-0.2   -34,-0.2  -0.951   0.9-161.1-123.5 141.0   25.7   64.9  -2$
  213  150 B T  E      f  178   0C  65    -36,-1.6   -34,-1.9    -2,-0.4    -2,-0.0  -0.978 360.0 360.0-129.4 122.2   23.7   65.0  -2$
  214  151 B K              0   0  119     -2,-0.4   -34,-0.2   -36,-0.2   -36,-0.0  -0.520 360.0 360.0 -69.6 360.0   20.7   62.7  -2$
  215        !              0   0    0      0, 0.0     0, 0.0     0, 0.0     0, 0.0   0.000 360.0 360.0 360.0 360.0    0.0    0.0    $
  216  156 B Q              0   0  195      1,-0.1     2,-0.3   -38,-0.1   -37,-0.0   0.000 360.0 360.0 360.0 148.0   26.6   57.3  -3$
  217  157 B A        -     0   0   15      1,-0.1   -38,-0.2   -37,-0.1     3,-0.1  -0.861 360.0-133.6-142.5 163.4   26.9   57.9  -2$
  218  158 B D  S    S+     0   0   20    -40,-3.5     2,-0.6    -2,-0.3   -39,-0.2   0.806  96.7  47.2 -87.1 -39.0   29.3   57.1  -2$
  219  159 B I  E     -g  179   0D   0    -41,-1.9   -39,-2.8    35,-0.1     2,-0.5  -0.938  68.7-169.4-111.7 115.7   26.7   55.8  -2$
  220  160 B M  E     -g  180   0D  37     -2,-0.6    35,-1.8   -41,-0.2     2,-0.4  -0.958   7.3-167.5-105.2 118.8   24.1   53.3  -2$
  221  161 B I  E     +gh 181 255D   0    -41,-3.0   -39,-2.7    -2,-0.5   -38,-0.4  -0.825  15.1 158.2-107.0 141.0   21.2   52.6  -2$
  222  162 B S  E     - h   0 256D  10     33,-1.7    35,-2.1    -2,-0.4     2,-0.6  -0.968  38.4-125.0-163.8 147.5   18.8   49.8  -2$
  223  163 B F  E     + h   0 257D   8     -2,-0.3     2,-0.3    33,-0.2    35,-0.2  -0.847  42.2 160.3 -97.0 122.2   16.3   47.6  -2$
  224  164 B V  E     - h   0 258D  29     33,-2.0    35,-3.0    -2,-0.6    36,-0.4  -0.877  26.0-146.0-135.6 169.5   17.0   43.9  -2$
  225  165 B R        -     0   0  174     -2,-0.3    33,-0.1    33,-0.2    10,-0.0  -0.991  57.5 -22.6-139.6 144.2   16.1   40.6  -1$

Is this expected behaviour?
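A quick way to spot such entries in a .plg file is to compare each SSE's end ID with the start ID of the following SSE. The snippet below is only a minimal sketch in Python: the field order is read off the three lines quoted above, and `parse_sse_line` and `find_suspicious_ends` are hypothetical helper names, not part of VPLG.

```python
def parse_sse_line(line):
    """Split one '|'-separated SSE line into (sse_type, start_id, end_id).

    Assumed field order (taken from the lines quoted in this issue):
    file, chain, graph type, two SSE indices, SSE type, start DSSP ID,
    end DSSP ID, start PDB residue, end PDB residue, sequence.
    """
    fields = [f.strip() for f in line.strip().strip("|").split("|")]
    return fields[5], int(fields[6]), int(fields[7])


def find_suspicious_ends(lines):
    """Return (index, end_id, next_start_id) for every SSE whose end ID
    is not smaller than the start ID of the SSE that follows it."""
    sses = [parse_sse_line(line) for line in lines]
    suspicious = []
    for i in range(len(sses) - 1):
        end_id = sses[i][2]
        next_start = sses[i + 1][1]
        if end_id >= next_start:
            suspicious.append((i, end_id, next_start))
    return suspicious


# The three SSE lines from this issue (without the bold markers).
plg_lines = [
    "| ../pdbbind_cif_split/1cgl.cif | B | albelig | 6 | 6 | E | 211 | 213 | B-148-  | B-150-  | TFT",
    "| ../pdbbind_cif_split/1cgl.cif | B | albelig | 7 | 7 | C | 214 | 331 | B-151-  | B-152-  | KVVQAD",
    "| ../pdbbind_cif_split/1cgl.cif | B | albelig | 8 | 8 | E | 219 | 224 | B-159-  | B-164-  | IMISFV",
]

for index, end_id, next_start in find_suspicious_ends(plg_lines):
    print(f"SSE at position {index}: end ID {end_id} >= next start ID {next_start}")
```

For the lines above, only the coil (second entry) is flagged, because its end ID 331 is larger than the start ID 219 of the following strand.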

JNWolf commented 3 years ago

Hi timn2008, thank you for your detailed message which enabled me to reproduce the behavior. The line with DSSP ID 215 shows a chain break and a jump from residue 151 to 156. However, the PDB file (cif in my case) contains a residue in this place:

ATOM   2068 N  N      . LYS B 1 51  ? 21.570 63.818 -25.663 1.00 36.95 ? 151 LYS B N      1 
[...]
ATOM   2080 H  HZ3    . LYS B 1 51  ? 18.070 57.775 -27.207 1.00 0.00  ? 151 LYS B HZ3    1 
ATOM   2081 N  N      . VAL B 1 52  ? 20.774 61.487 -28.178 1.00 30.37 ? ***152 VAL*** B N      1 
[...]
ATOM   2087 H  H      . VAL B 1 52  ? 20.972 60.732 -27.587 1.00 0.00  ? 152 VAL B H      1 
ATOM   2088 N  N      . GLN B 1 56  ? 25.723 58.089 -33.572 1.00 54.52 ? 156 GLN B N      1 
[...]
ATOM   2098 H  HE22   . GLN B 1 56  ? 30.936 57.615 -30.566 1.00 0.00  ? 156 GLN B HE22   1
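To find the places where this can happen, one can scan the DSSP residue lines for break markers and note the residue numbers on either side. The following is a rough Python sketch, not what PLCC does internally: it assumes the classic fixed-width DSSP layout seen in the excerpt above (residue number in columns 6 to 10, `!` in column 14 for a break), and `find_chain_breaks` is a hypothetical helper name.

```python
def find_chain_breaks(dssp_residue_lines):
    """Return (residue_before, residue_after) for every '!' break line.

    Assumes the fixed-width DSSP residue section: characters 5..9 hold the
    PDB residue number and character 13 holds the amino acid code, which
    is '!' on a break line.
    """
    breaks = []
    previous_resnum = None
    pending_break = False
    for line in dssp_residue_lines:
        if len(line) < 14:
            continue  # too short to be a residue line
        if line[13] == "!":
            pending_break = True
            continue
        resnum = int(line[5:10])
        if pending_break and previous_resnum is not None:
            breaks.append((previous_resnum, resnum))
        pending_break = False
        previous_resnum = resnum
    return breaks


# First columns of DSSP lines 214 to 216 from this issue.
dssp_lines = [
    "  214  151 B K",
    "  215        !",
    "  216  156 B Q",
]

for before, after in find_chain_breaks(dssp_lines):
    print(f"break between residue {before} and residue {after}")
```

Any residue that the coordinate file lists between the two reported numbers (here VAL 152) has no DSSP line and is a candidate for a 'fake DSSP ID'.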

Currently, our residue list is created while parsing the DSSP file and filled in later while parsing the PDB file. VAL152 is missing from the DSSP file but present in the PDB file, so it is assigned a 'fake DSSP ID', the same way ligands are, for example. Later, when the end DSSP ID of the SSE is determined, the function returns the highest DSSP ID among all residues of the SSE. This is fine in most cases, but here it yields the misleading 'fake DSSP ID' of 331. I think it is a common issue that DSSP ignores the last residue before a chain break. There are several ways for us to get around the problem: return the DSSP ID of the last residue instead of the highest DSSP ID, also ignore residues that are missing in DSSP, or, requiring a major re-implementation, move away from using DSSP IDs altogether.
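The difference between these options can be illustrated with a small Python sketch. It is not the PLCC code: the `Residue` class, the `fake_dssp_id` flag and the three function names are hypothetical, the residue list contains only the residues visible in the excerpts of this thread, and it assumes the residues of an SSE are stored in chain order.

```python
from dataclasses import dataclass


@dataclass
class Residue:
    pdb_num: int
    name: str
    dssp_id: int
    fake_dssp_id: bool = False  # hypothetical flag: True if the ID was not read from DSSP


# Residues of the coil as far as they appear in this thread, in chain order.
# VAL 152 is absent from the DSSP file and carries the fake ID 331.
coil = [
    Residue(151, "LYS", 214),
    Residue(152, "VAL", 331, fake_dssp_id=True),
    Residue(156, "GLN", 216),
    Residue(157, "ALA", 217),
    Residue(158, "ASP", 218),
]


def end_id_highest(residues):
    """Behaviour described above: the highest DSSP ID of the SSE."""
    return max(r.dssp_id for r in residues)


def end_id_last(residues):
    """Option 1: the DSSP ID of the last residue (assumes chain order)."""
    return residues[-1].dssp_id


def end_id_ignoring_missing(residues):
    """Option 2: the highest DSSP ID among residues that DSSP actually reported."""
    return max(r.dssp_id for r in residues if not r.fake_dssp_id)


print(end_id_highest(coil))           # 331
print(end_id_last(coil))              # 218
print(end_id_ignoring_missing(coil))  # 218
```

Note that the first option only helps when the last residue of the SSE itself has a real DSSP ID; if the residue missing in DSSP were the last one, it would still return the fake ID.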

In conclusion, this behavior is not really intended, but it is expected. Thank you for reporting this. It helps our discussion about how to treat residues, free amino acids, ligands, residues missing in DSSP and so on (@sonnta). If you want a workaround for this issue, you could either of

I hope this helps. If you have any questions regarding our software, feel free to ask. Thanks again for your message.

Best wishes, Niclas