Dear colleagues,
I have recently been trying to use enzy_htp.preparation.validity.is_structure_valid to filter out invalid PDB files. However, I found that even if I read a structure from a plaintext file such as the one below and apply is_structure_valid to the resulting structure instance, its first return value is True, meaning the file is treated as a valid PDB, which is clearly wrong.
I'm not a PDB file.
The same happens if I deliberately crop a PDB file, cutting out a section of the PDB text partway through one or more amino acid residues as shown below: the method still returns True as its first return value.
ATOM 112 N CYS A 16 -19.121 57.471 -4.382 1.00 32.64 N
ATOM 113 CA CYS A 16 -17.751 57.622 -4.850 1.00 31.03 C
ATOM 114 C CYS A 16 -17.504 56.987 -6.204 1.00 30.96 C
ATOM 115 O CYS A 16 -16.387 57.090 -6.720 1.00 33.46 O
ATOM 116 CB CYS A 16 -16.782 57.030 -3.833 1.00 30.14 C
ATOM 117 SG CYS A 16 -16.999 57.656 -2.189 1.00 37.29 S
ATOM 118 N MET A 17 -18.506 56.361 -6.808 1.00 27.80 N
ATOM 119 CA MET A 17 -18.295 55.628 -8.043 1.00 32.36 C
ATOM 120 C MET A 17 -18.443 56.566 -9.232 1.00 33.27 C
ATOM 121 O MET A 17 -19.382 57.370 -9.287 1.00 34.34 O
----------------------------------------------------------------------------------
ATOM 122 CB MET A 17 -19.263 54.450 -8.155 1.00 24.25 C
ATOM 123 CG MET A 17 -19.130 53.445 -6.990 1.00 28.22 C
ATOM 124 SD MET A 17 -17.442 52.890 -6.670 1.00 31.88 S
ATOM 125 CE MET A 17 -17.029 52.164 -8.257 1.00 25.19 C
ATOM 126 N VAL A 18 -17.499 56.476 -10.166 1.00 25.67 N
ATOM 127 CA VAL A 18 -17.519 57.276 -11.379 1.00 30.98 C
ATOM 128 C VAL A 18 -17.184 56.391 -12.571 1.00 32.33 C
ATOM 129 O VAL A 18 -16.726 55.253 -12.438 1.00 33.22 O
ATOM 130 CB VAL A 18 -16.547 58.472 -11.319 1.00 33.30 C
ATOM 131 CG1 VAL A 18 -16.893 59.386 -10.162 1.00 35.06 C
ATOM 132 CG2 VAL A 18 -15.101 57.976 -11.233 1.00 27.97 C
ATOM 133 N GLN A 19 -17.423 56.941 -13.748 1.00 30.71 N
ATOM 134 CA GLN A 19 -17.140 56.269 -14.998 1.00 36.89 C
ATOM 135 C GLN A 19 -15.872 56.866 -15.590 1.00 32.93 C
ATOM 136 O GLN A 19 -15.702 58.090 -15.594 1.00 39.82 O
----------------------------------------------------------------------------------
ATOM 137 CB GLN A 19 -18.312 56.423 -15.956 1.00 35.41 C
ATOM 138 CG GLN A 19 -18.016 55.927 -17.331 1.00 47.83 C
ATOM 139 CD GLN A 19 -19.161 56.172 -18.286 1.00 55.35 C
ATOM 140 NE2 GLN A 19 -19.711 57.381 -18.254 1.00 61.29 N
ATOM 141 OE1 GLN A 19 -19.543 55.286 -19.051 1.00 58.01 O
ATOM 142 N VAL A 20 -14.966 56.005 -16.037 1.00 35.63 N
ATOM 143 CA VAL A 20 -13.690 56.437 -16.598 1.00 34.18 C
ATOM 144 C VAL A 20 -13.588 55.865 -17.997 1.00 37.96 C
ATOM 145 O VAL A 20 -13.718 54.651 -18.192 1.00 37.23 O
ATOM 146 CB VAL A 20 -12.490 56.006 -15.744 1.00 32.52 C
ATOM 147 CG1 VAL A 20 -11.182 56.433 -16.441 1.00 34.33 C
ATOM 148 CG2 VAL A 20 -12.576 56.632 -14.371 1.00 26.90 C
For the first issue (the "plaintext" one), a possible fix is to check the number of atoms in the structure and return False if stru.num_atoms == 0.
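A minimal sketch of such a guard, written as a wrapper so that it does not depend on the internals of is_structure_valid. The wrapper name check_structure_nonempty is hypothetical, the existing validator is passed in as a callable, and the shape of the failure value (a bool followed by a message) is an assumption; only the num_atoms attribute comes from the description above.

```python
from typing import Any, Callable


def check_structure_nonempty(stru: Any, validator: Callable[[Any], Any]) -> Any:
    """Reject structures without atoms before running the real validator.

    `stru` only needs a `num_atoms` attribute. `validator` stands in for
    the existing validity check and is called unchanged otherwise.
    """
    if stru.num_atoms == 0:
        # Assumed return shape: (is_valid, reason). Adjust to whatever
        # the real validator returns.
        return False, "structure contains no atoms"
    return validator(stru)
```

With this in place, a structure read from a non-PDB text file would fail early instead of passing through the remaining checks with nothing to check.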
For the second issue, could you please extend the check so that truncated structures like this one are detected? Thanks a lot!
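One possible direction, shown here as a standalone sketch on raw PDB text rather than on the library's structure object: group ATOM records by residue and report every residue that lacks one of the backbone atoms N, CA, C, O. The function name residues_missing_backbone is hypothetical, the column positions follow the fixed-width PDB ATOM record layout, and the check assumes the ATOM records are protein residues only. It would not catch a residue that keeps its backbone but loses side-chain atoms; that would need a per-residue atom template.

```python
from typing import Dict, List, Tuple

BACKBONE_ATOMS = {"N", "CA", "C", "O"}

ResidueKey = Tuple[str, int, str]  # (chain ID, residue number, residue name)


def residues_missing_backbone(pdb_text: str) -> Dict[ResidueKey, List[str]]:
    """Map each residue with an incomplete backbone to its missing atoms."""
    atoms_by_residue: Dict[ResidueKey, set] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, and any non-PDB text
        atom_name = line[12:16].strip()      # columns 13-16
        residue_name = line[17:20].strip()   # columns 18-20
        chain_id = line[21]                  # column 22
        residue_number = int(line[22:26])    # columns 23-26
        key = (chain_id, residue_number, residue_name)
        atoms_by_residue.setdefault(key, set()).add(atom_name)

    return {
        key: sorted(BACKBONE_ATOMS - names)
        for key, names in atoms_by_residue.items()
        if BACKBONE_ATOMS - names
    }
```

A non-empty result would be a reason to return False. An input with no ATOM records at all yields an empty result, so this check complements the atom-count check instead of replacing it.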
Best, Zhong.