DeepRank / pdb2sql

Fast and versatile biomolecular structure PDB file parser using SQL queries
https://pdb2sql.readthedocs.io
Apache License 2.0

StructureSimilarity "Matrix don't have the same number of points" between two matched files #58

Closed: DarioMarzella closed this issue 4 years ago

DarioMarzella commented 4 years ago

Describe the bug: When trying to run compute_lrmsd_fast() on two files (model and target structure), the following error is raised: ValueError: ("Matrix don't have the same number of points", (524, 3), (528, 3))

Environment:

To Reproduce: Steps/commands to reproduce the behaviour:

  1. open the attached folder
  2. run python run.py

Expected Results: It should compute the l-RMSD.

Actual Results or Error Info: It raises the following error: ValueError: ("Matrix don't have the same number of points", (524, 3), (528, 3))

Additional Context: The model and ref files have been matched, so only their common residues are present.

Attached folder: pdb2sql_issue_58.zip

CunliangGeng commented 4 years ago

The error occurs because ref.pdb has double occupancies (alternate locations for the same atom), e.g.

ATOM    779  N  ATHR M  94     -21.816   9.229  30.233  0.50 38.91           N  
ATOM    780  N  BTHR M  94     -21.729   9.244  30.235  0.50 38.76           N  
ATOM    781  CA ATHR M  94     -21.296   8.442  29.129  0.50 38.41           C  
ATOM    782  CA BTHR M  94     -21.316   8.480  29.067  0.50 38.04           C  

Use the PDB tool pdb_delocc.py to preprocess ref.pdb; the RMSD calculation should then work.
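For illustration, here is a minimal sketch of what such preprocessing involves, written in plain Python with no dependencies. It is not the code of pdb_delocc.py. The helper names `altloc_counts` and `keep_first_altloc` are hypothetical, and the sketch assumes a single-model file in fixed-column PDB format, where column 17 holds the alternate-location indicator.

```python
from collections import Counter

# The four records quoted above, used as sample input.
sample = [
    "ATOM    779  N  ATHR M  94     -21.816   9.229  30.233  0.50 38.91           N  ",
    "ATOM    780  N  BTHR M  94     -21.729   9.244  30.235  0.50 38.76           N  ",
    "ATOM    781  CA ATHR M  94     -21.296   8.442  29.129  0.50 38.41           C  ",
    "ATOM    782  CA BTHR M  94     -21.316   8.480  29.067  0.50 38.04           C  ",
]


def altloc_counts(pdb_lines):
    """Count ATOM/HETATM records per alternate-location indicator (column 17)."""
    counts = Counter()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 16:
            counts[line[16]] += 1
    return counts


def keep_first_altloc(pdb_lines):
    """Keep the first record seen for each atom and blank its altLoc column.

    Hypothetical helper: serial numbers and occupancies are left unchanged.
    """
    seen = set()
    kept = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            kept.append(line)  # pass non-coordinate records through untouched
            continue
        # Atom identity without altLoc: atom name, residue name, chain,
        # residue number plus insertion code.
        key = (line[12:16], line[17:20], line[21], line[22:27])
        if key in seen:
            continue  # a later alternate location of an atom already kept
        seen.add(key)
        kept.append(line[:16] + " " + line[17:])
    return kept


print(altloc_counts(sample))           # any key other than ' ' signals alternate locations
print(len(keep_first_altloc(sample)))  # one record per atom remains
```

Running `altloc_counts` on both model and ref before calling compute_lrmsd_fast() is a quick way to see whether alternate locations could explain a difference in atom counts.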

BTW, in run.py, the call lrmsd = sim.compute_lrmsd_fast(method='svd', name = atoms) passes name, which compute_lrmsd_fast() does not accept as a parameter; you should remove it.

NicoRenaud commented 4 years ago

Well, it does accept name on the issue55 branch, which we should merge now :)

DarioMarzella commented 4 years ago

Selecting only one occupancy with https://github.com/LilySnow/PDB_related/pdb_selalt.py solved the problem. Thanks.