cmbi / hssp

Create DSSP and HSSP files
GNU General Public License v3.0

mkdssp calculates spurious phi/psi angles between residues straddling discarded residue #86

Open tonyelewis opened 7 years ago

tonyelewis commented 7 years ago

Consider this example (adapted from residues 217, 218 and 219 of chain A of PDB 3r8d):

ATOM      1  N   GLY A   1     -21.821  -6.460 -44.110  1.00 44.43           N  
ATOM      2  CA  GLY A   1     -21.457  -5.215 -44.720  1.00 45.72           C  
ATOM      3  C   GLY A   1     -22.653  -4.708 -45.462  1.00 48.29           C  
ATOM      4  O   GLY A   1     -23.453  -5.519 -45.862  1.00 48.18           O  
ATOM      5  CA  GLU A   2     -23.199  -2.283 -45.588  1.00 73.09           C  
ATOM      6  C   GLU A   2     -23.609  -2.790 -46.930  1.00 69.74           C  
ATOM      7  O   GLU A   2     -24.724  -2.577 -47.369  1.00 69.31           O  
ATOM      8  N   ASP A   3     -22.684  -3.493 -47.558  1.00 69.04           N  
ATOM      9  CA  ASP A   3     -22.757  -3.856 -48.957  1.00 68.20           C  
ATOM     10  C   ASP A   3     -23.396  -5.203 -49.237  1.00 66.39           C  
ATOM     11  O   ASP A   3     -23.803  -5.469 -50.343  1.00 66.88           O  
TER      12      ASP A   3                                                      
END                                                                             

The mkdssp output, stripped of headers and intermediate columns, looks like this:

  #  RESIDUE AA [... ...]  PHI   PSI    X-CA   Y-CA   Z-CA            CHAIN
    1    1 A G  [... ...] 360.0-144.6  -21.5   -5.2  -44.7               
    2    3 A D  [... ...]  24.3 360.0  -22.8   -3.9  -49.0               

As I understand it, mkdssp has reasonably discarded residue 2 because it has no N atom record. But because the C atom of residue 1 happens to be close enough to the N atom of residue 3, the check here hasn't put a break between 1 and 3. As a result, mkdssp later calculates and prints backbone angles across residues 1 and 3 (a psi of -144.6 for residue 1 and a phi of 24.3 for residue 3). These values are spurious and possibly misleading, because we know the input data has a different residue between them.
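To make the "close enough" point concrete, here is a quick check of the distance between the C atom of residue 1 and the N atom of residue 3, using the coordinates from the example above. The 2.5 Å cutoff is my assumption about the maximum peptide bond length that mkdssp accepts; I have not verified it against the linked check. For comparison, a real peptide C-N bond is about 1.33 Å.

```python
import math

# Coordinates copied from the ATOM records in the example above.
c_gly1 = (-22.653, -4.708, -45.462)  # C atom of GLY A 1
n_asp3 = (-22.684, -3.493, -47.558)  # N atom of ASP A 3

# Assumed cutoff (not verified against the mkdssp source), in angstroms.
MAX_PEPTIDE_BOND_LENGTH = 2.5

distance = math.dist(c_gly1, n_asp3)
print(f"C(1)-N(3) distance: {distance:.2f} A")
print("treated as bonded:", distance < MAX_PEPTIDE_BOND_LENGTH)
```

Under that assumed cutoff the two atoms come out at roughly 2.42 Å, so a purely distance-based check would see no break between residues 1 and 3.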

I think it would be better if the code for ignoring residues (here) were able to explicitly insert a chain break (if there isn't already one there). What do you think?
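A minimal sketch of the idea, in Python rather than the project's C++ and with entirely hypothetical names (`BACKBONE`, `filter_residues`): while dropping incomplete residues, remember that something was dropped and mark the next kept residue as starting after a break. It assumes a residue is kept only if it has all of N, CA, C and O, which may not match mkdssp's actual rule.

```python
# Backbone atoms assumed to be required for a residue to be kept.
BACKBONE = {"N", "CA", "C", "O"}


def filter_residues(residues):
    """Drop incomplete residues and flag a break after each dropped run.

    residues: list of (seq_number, atom_names) in file order.
    Returns a list of (seq_number, break_before) for the kept residues.
    """
    kept = []
    pending_break = False
    for seq_number, atom_names in residues:
        if not BACKBONE <= set(atom_names):
            # Residue is discarded; the next kept residue must not be
            # treated as bonded to the previous kept one.
            pending_break = True
            continue
        # There is nothing to break from before the first kept residue.
        kept.append((seq_number, pending_break and bool(kept)))
        pending_break = False
    return kept


# The three residues from the example: residue 2 has no N atom.
example = [
    (1, {"N", "CA", "C", "O"}),
    (2, {"CA", "C", "O"}),
    (3, {"N", "CA", "C", "O"}),
]
print(filter_residues(example))  # [(1, False), (3, True)]
```

With the break recorded explicitly, the later phi/psi calculation could skip the pair 1/3 regardless of how close the C and N atoms happen to be.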

cbaakman commented 7 years ago

Sounds like a good solution to me. I'll discuss it with my colleagues.