To remove the dependency on Modeller, I have been trying to implement the 'packedness' algorithm using Biopython as my PDB parser. While comparing the two, I found that some CATH structures appear to be 'overlapping' structures, i.e. they contain roughly twice the expected number of atoms.
For example, the 5cdzA01 domain file contains 3329 lines, presumably implying 3329 atoms (the same count Biopython reports), whereas Modeller reports only 1669 atoms in the structure, roughly half that number.
To summarise:

| | Biopython | Modeller | ratio |
|---|---|---|---|
| residue_count | 217 | 217 | 1.0 |
| atom_count | 3329 | 1669 | 1.99 |
| nbpair_count | 1,177,160 | 290,380 | 4.05 |
| conclusion | with H atoms | without H atoms | |
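As a rough check on the ratios in the table (a back-of-the-envelope sketch, assuming the extra atoms occupy about the same volume, so that the number of pairs within a fixed cutoff grows roughly with the square of the atom count $N$), doubling $N$ should roughly quadruple the pair count:

$$
\left(\frac{N_\text{Biopython}}{N_\text{Modeller}}\right)^2 = \left(\frac{3329}{1669}\right)^2 \approx 3.98,
\qquad
\frac{1\,177\,160}{290\,380} \approx 4.05
$$

So a ~2× inflation in atoms shows up as a ~4× inflation in non-bonded pairs.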
By inspection, 1669 seems to best describe the structure, and I suspect it was deposited as an overlapping structure. Currently I am patching this by checking the atom/residue ratio against a cutoff of 11 (on average, a residue should not contain more than 11 atoms) and, when the cutoff is exceeded, correcting the atom count and the non-bonded pair count by dividing them by 2.0 and 4.0 respectively. But I feel this should not be a permanent fix.
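A minimal sketch of that temporary patch, for reference. `patch_counts` and `ATOMS_PER_RESIDUE_CUTOFF` are hypothetical names; only the cutoff of 11 and the divisors 2.0 and 4.0 come from the description above.

```python
# Hypothetical helper mirroring the temporary patch: if the average number
# of atoms per residue exceeds the cutoff, assume the atom count is ~2x
# inflated and rescale the atom count by 2.0 and the pair count by 4.0.
ATOMS_PER_RESIDUE_CUTOFF = 11.0


def patch_counts(residue_count: int, atom_count: int, nbpair_count: int) -> tuple[float, float]:
    """Return (atom_count, nbpair_count), rescaled when atoms/residue looks inflated."""
    if residue_count <= 0:
        raise ValueError("residue_count must be positive")
    if atom_count / residue_count > ATOMS_PER_RESIDUE_CUTOFF:
        return atom_count / 2.0, nbpair_count / 4.0
    return float(atom_count), float(nbpair_count)


atoms, pairs = patch_counts(217, 3329, 1_177_160)
print(atoms, pairs)  # 1664.5 294290.0
```

For 5cdzA01 (3329 / 217 ≈ 15.3 atoms per residue) this gives numbers close to, but not exactly equal to, Modeller's 1669 and 290,380, which is one reason a fixed divisor is only an approximation.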
@sillitoe @nataliedawson @tonyelewis Any thoughts on possible causes?
BTW, the cutoff for non-bonded interactions is 15.0 Å, while that for bonded interactions is 3.5 Å.
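A naive sketch of counting atom pairs against those two cutoffs. The banding is an assumption on my part: pairs within 3.5 Å are treated as bonded, and pairs between 3.5 Å and 15.0 Å as non-bonded; the actual packedness definition may differ (for example, it may count every pair within 15.0 Å). `count_pairs` is a hypothetical name.

```python
import math
from itertools import combinations

BOND_CUTOFF = 3.5      # Å, bonded cutoff quoted above
NONBOND_CUTOFF = 15.0  # Å, non-bonded cutoff quoted above


def count_pairs(coords):
    """Return (bonded, nonbonded) pair counts under the assumed distance bands.

    coords: iterable of (x, y, z) tuples. O(N^2), fine for a few thousand atoms.
    """
    bonded = 0
    nonbonded = 0
    for a, b in combinations(coords, 2):
        d = math.dist(a, b)
        if d <= BOND_CUTOFF:
            bonded += 1
        elif d <= NONBOND_CUTOFF:
            nonbonded += 1
    return bonded, nonbonded


print(count_pairs([(0, 0, 0), (1, 0, 0), (10, 0, 0), (30, 0, 0)]))  # (1, 2)
```

For real structures, Biopython's `Bio.PDB.NeighborSearch` (`search_all(radius)`) avoids the quadratic loop; either way, the counts depend directly on which atoms (e.g. hydrogens) are included.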
UPDATE: After inspecting the Modeller-cleaned PDB file 5cdzA01_mod, I found that the hydrogen atoms cause the difference: Modeller removes H atoms automatically, whereas Biopython does not.
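One way to match Modeller's behaviour is to drop hydrogens before counting. Below is a library-free sketch that filters ATOM/HETATM records by the element field (columns 77-78 of the PDB format); in Biopython the equivalent check can be made on each `Atom`'s `element` attribute. `heavy_atom_lines` is a hypothetical name, and the fallback that guesses the element from the atom name (columns 13-16) when the element field is blank is a heuristic assumption that can misfire on unusual atom names.

```python
def heavy_atom_lines(pdb_lines):
    """Yield ATOM/HETATM records that are not hydrogen or deuterium."""
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        element = line[76:78].strip().upper()  # columns 77-78: element symbol
        if not element:
            # Heuristic fallback: first letter of the atom name (columns 13-16),
            # ignoring leading digits as in old-style names like "1HB".
            name = line[12:16].strip()
            element = name.lstrip("0123456789")[:1].upper()
        if element in ("H", "D"):
            continue
        yield line


# Usage with a real file:
# with open("5cdzA01") as fh:
#     n_heavy = sum(1 for _ in heavy_atom_lines(fh))
```

If hydrogens are the only difference, as the update suggests, the heavy-atom count for 5cdzA01 should agree with Modeller's 1669, and the ratio-based patch would no longer be needed.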