DeepRank / pdb2sql

Fast and versatile biomolecular structure PDB file parser using SQL queries
https://pdb2sql.readthedocs.io
Apache License 2.0

compute_irmsd_fast() and compute_irmsd_pdb2sql() give very different RMSD #77

Closed LilySnow closed 2 years ago

LilySnow commented 2 years ago

Describe the bug

compute_irmsd_fast() and compute_irmsd_pdb2sql() give very different interface RMSD (iRMSD) values for the same model and reference: 1.13 and 5.742, respectively.

To Reproduce

from pdb2sql.StructureSimilarity import StructureSimilarity
sim = StructureSimilarity('model.pdb', 'ref.pdb')
sim.compute_irmsd_fast()     # gives an iRMSD of 1.13
sim.compute_irmsd_pdb2sql()  # gives an iRMSD of 5.742

Input files: test.tar.gz

Maybe it is related to this pull request: https://github.com/DeepRank/pdb2sql/pull/72

LilySnow commented 2 years ago

It turns out this is not a bug in pdb2sql. It is caused by the input PDB files: there are duplicated residues (same chain ID, residue name and residue number) with different x, y, z coordinates in the PDB file.
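
For anyone who hits the same discrepancy, here is a minimal sketch of how such duplicates could be detected before computing the iRMSD. It does not use pdb2sql at all; it reads the standard fixed-column ATOM/HETATM records directly. The function name find_duplicate_atoms is made up for illustration. It assumes a single-model file and deliberately ignores alternate-location and insertion codes, so legitimate altloc or insertion-code entries, or multi-model files, will also be reported.

```python
from collections import defaultdict


def find_duplicate_atoms(pdb_lines):
    """Return {atom_key: [xyz, ...]} for atoms listed more than once
    with different coordinates.

    atom_key is (chain ID, residue name, residue number, atom name).
    Hypothetical helper for illustration; not part of pdb2sql.
    """
    coords = defaultdict(list)
    for line in pdb_lines:
        # Only coordinate records carry atom positions.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = (
            line[21],             # chain ID (column 22)
            line[17:20].strip(),  # residue name (columns 18-20)
            int(line[22:26]),     # residue number (columns 23-26)
            line[12:16].strip(),  # atom name (columns 13-16)
        )
        xyz = (
            float(line[30:38]),   # x (columns 31-38)
            float(line[38:46]),   # y (columns 39-46)
            float(line[46:54]),   # z (columns 47-54)
        )
        coords[key].append(xyz)
    # Keep only atoms that occur with at least two distinct positions.
    return {k: v for k, v in coords.items() if len(set(v)) > 1}
```

It can be run on an open file handle, e.g. find_duplicate_atoms(open('model.pdb')); an empty dict means no duplicated atoms with conflicting coordinates were found under the assumptions above.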