ChemBioHTP / EnzyHTP

EnzyHTP is a Python library that automates the complete life-cycle of enzyme modeling
https://enzyhtp-doc.readthedocs.io

enzy_htp.preparation.validity.is_structure_valid is unable to split out non-PDB files. #177

Open SwordJack opened 2 months ago

SwordJack commented 2 months ago

Dear colleagues,

I have been trying to use enzy_htp.preparation.validity.is_structure_valid to filter out invalid PDB files. However, even when I read a structure from a plaintext file like the one below and pass the resulting structure instance to is_structure_valid, its first return value is True, meaning the input is considered a valid PDB, which is clearly wrong.

I'm not a PDB file.

The same happens if I deliberately crop a PDB file, cutting out a section of the text partway through one or more amino acid residues as shown below: the method still returns True as its first return value.

ATOM    112  N   CYS A  16     -19.121  57.471  -4.382  1.00 32.64           N  
ATOM    113  CA  CYS A  16     -17.751  57.622  -4.850  1.00 31.03           C  
ATOM    114  C   CYS A  16     -17.504  56.987  -6.204  1.00 30.96           C  
ATOM    115  O   CYS A  16     -16.387  57.090  -6.720  1.00 33.46           O  
ATOM    116  CB  CYS A  16     -16.782  57.030  -3.833  1.00 30.14           C  
ATOM    117  SG  CYS A  16     -16.999  57.656  -2.189  1.00 37.29           S  
ATOM    118  N   MET A  17     -18.506  56.361  -6.808  1.00 27.80           N  
ATOM    119  CA  MET A  17     -18.295  55.628  -8.043  1.00 32.36           C  
ATOM    120  C   MET A  17     -18.443  56.566  -9.232  1.00 33.27           C  
ATOM    121  O   MET A  17     -19.382  57.370  -9.287  1.00 34.34           O  
----------------------------------------------------------------------------------
ATOM    122  CB  MET A  17     -19.263  54.450  -8.155  1.00 24.25           C  
ATOM    123  CG  MET A  17     -19.130  53.445  -6.990  1.00 28.22           C  
ATOM    124  SD  MET A  17     -17.442  52.890  -6.670  1.00 31.88           S  
ATOM    125  CE  MET A  17     -17.029  52.164  -8.257  1.00 25.19           C  
ATOM    126  N   VAL A  18     -17.499  56.476 -10.166  1.00 25.67           N  
ATOM    127  CA  VAL A  18     -17.519  57.276 -11.379  1.00 30.98           C  
ATOM    128  C   VAL A  18     -17.184  56.391 -12.571  1.00 32.33           C  
ATOM    129  O   VAL A  18     -16.726  55.253 -12.438  1.00 33.22           O  
ATOM    130  CB  VAL A  18     -16.547  58.472 -11.319  1.00 33.30           C  
ATOM    131  CG1 VAL A  18     -16.893  59.386 -10.162  1.00 35.06           C  
ATOM    132  CG2 VAL A  18     -15.101  57.976 -11.233  1.00 27.97           C  
ATOM    133  N   GLN A  19     -17.423  56.941 -13.748  1.00 30.71           N  
ATOM    134  CA  GLN A  19     -17.140  56.269 -14.998  1.00 36.89           C  
ATOM    135  C   GLN A  19     -15.872  56.866 -15.590  1.00 32.93           C  
ATOM    136  O   GLN A  19     -15.702  58.090 -15.594  1.00 39.82           O  
----------------------------------------------------------------------------------
ATOM    137  CB  GLN A  19     -18.312  56.423 -15.956  1.00 35.41           C  
ATOM    138  CG  GLN A  19     -18.016  55.927 -17.331  1.00 47.83           C  
ATOM    139  CD  GLN A  19     -19.161  56.172 -18.286  1.00 55.35           C  
ATOM    140  NE2 GLN A  19     -19.711  57.381 -18.254  1.00 61.29           N  
ATOM    141  OE1 GLN A  19     -19.543  55.286 -19.051  1.00 58.01           O  
ATOM    142  N   VAL A  20     -14.966  56.005 -16.037  1.00 35.63           N  
ATOM    143  CA  VAL A  20     -13.690  56.437 -16.598  1.00 34.18           C  
ATOM    144  C   VAL A  20     -13.588  55.865 -17.997  1.00 37.96           C  
ATOM    145  O   VAL A  20     -13.718  54.651 -18.192  1.00 37.23           O  
ATOM    146  CB  VAL A  20     -12.490  56.006 -15.744  1.00 32.52           C  
ATOM    147  CG1 VAL A  20     -11.182  56.433 -16.441  1.00 34.33           C  
ATOM    148  CG2 VAL A  20     -12.576  56.632 -14.371  1.00 26.90           C  

For the first issue, i.e., the "plaintext" one, a possible fix is to check the number of atoms in the structure and return False if stru.num_atoms == 0.
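A minimal sketch of that check, written as a standalone function rather than as a patch to EnzyHTP. Only the num_atoms attribute comes from the suggestion above; FakeStructure and check_not_empty are hypothetical names used for illustration, and the (bool, message) return shape is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FakeStructure:
    """Hypothetical stand-in for a structure object; only num_atoms is modeled."""
    num_atoms: int


def check_not_empty(stru) -> Tuple[bool, str]:
    """Return (False, reason) when the structure holds no atoms, else (True, "")."""
    if stru.num_atoms == 0:
        return False, "structure contains no atoms; the input is probably not a PDB file"
    return True, ""
```

A guard like this could run before the other validity checks, so that a file with no parsable atoms is rejected early.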

For the second issue, could you please make the method handle such cropped structures correctly? Thanks a lot!
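One possible direction, sketched here on raw PDB text and independent of the EnzyHTP API: group ATOM records by residue and flag any residue that lacks expected heavy atoms. The function name find_incomplete_residues and the SIDE_CHAIN table are hypothetical; the table lists only the residue types that appear in the fragment above and would need to be extended to all standard residues in practice.

```python
from collections import defaultdict

# Backbone heavy atoms expected in every amino acid residue.
BACKBONE = {"N", "CA", "C", "O"}

# Side-chain heavy atoms, listed only for the residue types in the fragment
# above (assumption: a real check would cover all standard residues).
SIDE_CHAIN = {
    "CYS": {"CB", "SG"},
    "MET": {"CB", "CG", "SD", "CE"},
    "VAL": {"CB", "CG1", "CG2"},
    "GLN": {"CB", "CG", "CD", "OE1", "NE2"},
}


def find_incomplete_residues(pdb_text):
    """Map (chain, residue number, residue name) to its sorted missing atom names."""
    atoms = defaultdict(set)
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip non-ATOM records and any non-PDB text
        try:
            res_seq = int(line[22:26])  # fixed-width PDB columns 23-26
        except ValueError:
            continue  # malformed or truncated line
        key = (line[21], res_seq, line[17:20].strip())
        atoms[key].add(line[12:16].strip())

    missing = {}
    for key, present in atoms.items():
        # Unknown residue types are checked for backbone atoms only.
        expected = BACKBONE | SIDE_CHAIN.get(key[2], set())
        absent = expected - present
        if absent:
            missing[key] = sorted(absent)
    return missing
```

An empty result means no residue was flagged. Note that an input with no ATOM records at all also yields an empty result, so this check does not replace the num_atoms == 0 guard.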

Best, Zhong.