DeepRank / pdb2sql

Fast and versatile biomolecular structure PDB file parser using SQL queries
https://pdb2sql.readthedocs.io
Apache License 2.0

compute_lrmsd_pdb2sql works only if there is no missing backbone atom #85

Open FarzanehParizi opened 2 years ago

FarzanehParizi commented 2 years ago

Describe the bug

If a backbone atom is missing in the ligand part of one of the two PDBs, compute_lrmsd_pdb2sql does not report it and fails with an error.

Environment:

To Reproduce

  1. Input these two PDBs:

    BL00190001_decoy.txt BL00190001_ref.txt

    sim = StructureSimilarity(decoy_path, ref_path)
    lrmsd = sim.compute_lrmsd_pdb2sql(exportpath=None, method='svd')

Expected Results

The function calculates the LRMSD value even if one or more backbone atoms are missing, or prints a proper error message reporting the mismatched backbone atom(s).

Actual Results or Error Info

624         # compute the RMSD
625         lrmsd = self.get_rmsd(xyz_decoy_short, xyz_ref_short)
626 
627         # export the pdb for verifiactions

             ..../pdb2sql/pdb2sql/StructureSimilarity.py in get_rmsd(P, Q)
1280         """
1281         n = len(P)
1282         return round(np.sqrt(1. / n * np.sum((P - Q)**2)), 3)

Additional Context

compute_lrmsd_fast does not have this problem and prints the backbone LRMSD value.
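The traceback above ends in get_rmsd, whose formula assumes that P and Q have the same shape. As a minimal sketch (not the library's actual code; `rmsd_checked` is a hypothetical name), the same formula can be guarded with an explicit shape check so that a missing atom produces a readable message:

```python
import numpy as np


def rmsd_checked(P, Q):
    """RMSD between two (n, 3) coordinate arrays.

    Uses the same formula as the quoted get_rmsd lines, but raises a
    descriptive error when the two arrays do not have the same shape.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape:
        raise ValueError(
            f"Coordinate arrays differ in shape: {P.shape} vs {Q.shape}; "
            "the two structures do not contain the same atoms."
        )
    n = len(P)
    return round(float(np.sqrt(1.0 / n * np.sum((P - Q) ** 2))), 3)
```

With this guard, two arrays of 4 and 2 atoms would raise a ValueError that names both shapes.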

NicoRenaud commented 2 years ago

Hey @FarzanehParizi, thanks for testing the code. Indeed, the code was failing when the residues matched but the atoms within a residue did not. I've fixed that and tested the code on the two PDBs you linked, and it seems to be working. Could you double check?

NicoRenaud commented 2 years ago

Code in #82

FarzanehParizi commented 2 years ago

Thanks @NicoRenaud for the check. With the fix, a new error message is now printed:

    rmsd = sim.compute_lrmsd_pdb2sql(exportpath=None, method='svd')
  File "..../pdb2sql/pdb2sql/StructureSimilarity.py", line 597, in compute_lrmsd_pdb2sql
    if self.check_residues() is False:
  File "..../pdb2sql/StructureSimilarity.py", line 100, in check_residues
    raise ValueError(
ValueError: Atoms not identical in ref and decoy. Set enforce_residue_matching=False to bypass this error.

But the residue numbering is identical in the two PDBs; the only difference is that one PDB has two extra atoms for one of the residues.

It only works when enforce_residue_matching is set to False, and then prints this warning:

.../pdb2sql/StructureSimilarity.py:103: UserWarning: Atoms not identical in ref and decoy.
  warnings.warn('Atoms not identical in ref and decoy.')

NicoRenaud commented 2 years ago

Yes, that would be the expected behavior, no? We can refine the error message so that the exact atom/residue is printed as well, if you think that's better.
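One way such a refined message could look, as a rough sketch only: the helper below is hypothetical (it is not part of pdb2sql) and assumes each atom is described by a `(chainID, resSeq, resName, name)` tuple. It lists the atoms that appear in only one of the two structures.

```python
def describe_atom_mismatch(ref_atoms, decoy_atoms):
    """Return a message listing atoms present in only one structure.

    Each atom is assumed to be a (chainID, resSeq, resName, name) tuple.
    Returns None when both structures contain exactly the same atoms.
    """
    ref_set, decoy_set = set(ref_atoms), set(decoy_atoms)
    only_ref = sorted(ref_set - decoy_set)
    only_decoy = sorted(decoy_set - ref_set)
    if not only_ref and not only_decoy:
        return None

    lines = ["Atoms not identical in ref and decoy."]
    for label, atoms in (("ref", only_ref), ("decoy", only_decoy)):
        for chain, resseq, resname, name in atoms:
            lines.append(
                f"  only in {label}: chain {chain} {resname}{resseq} atom {name}"
            )
    return "\n".join(lines)
```

The returned string could be passed to the existing ValueError or warning, so the user sees which residue and atom names differ.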

FarzanehParizi commented 2 years ago

I am puzzled: shouldn't pdb2sql ignore missing residues/atoms? That is, if an atom is missing in one PDB, shouldn't it be ignored in the other PDB as well? Maybe I am wrong in this case. @LilySnow, what is your opinion?

NicoRenaud commented 2 years ago

@FarzanehParizi I think that's what is happening here. There is a step where we extract the atoms common to both PDBs and use only those to compute the RMSD.
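The idea of keeping only the common atoms can be illustrated with a small sketch. This is an assumption-laden example, not the code in StructureSimilarity: `rmsd_on_common_atoms` is a hypothetical name, each structure is assumed to be a dict mapping an atom key such as `(chainID, resSeq, name)` to its `(x, y, z)` coordinates, and no superposition is performed.

```python
import math


def rmsd_on_common_atoms(ref, decoy):
    """RMSD over the atoms present in both structures.

    ref and decoy map an atom key, e.g. (chainID, resSeq, name), to an
    (x, y, z) tuple. Atoms found in only one structure are ignored.
    Returns the RMSD rounded to 3 decimals and the sorted common keys.
    """
    common = sorted(ref.keys() & decoy.keys())
    if not common:
        raise ValueError("No atoms in common between ref and decoy.")

    squared_sum = sum(
        (a - b) ** 2
        for key in common
        for a, b in zip(ref[key], decoy[key])
    )
    return round(math.sqrt(squared_sum / len(common)), 3), common
```

Because the keys are intersected before any coordinates are compared, an extra O or OXT atom in one structure never reaches the RMSD formula.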

FarzanehParizi commented 2 years ago

So then it should give an RMSD value for this case, not an error. Even if I compute the LRMSD using only 'CA' atoms, it still leads to an error (the only difference between the two PDBs is that one of them has a residue with two extra atoms: O and OXT, not CA).

NicoRenaud commented 2 years ago

I've changed the code so that the filter specified when calling compute_lrmsd (e.g. name=['CA']) is also used when checking whether the residues match. Let me know if that works for you. I think we should refactor StructureSimilarity because it is now a big mess. We should probably sit down for a bit, define the requirements and API, and rewrite it from scratch.
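To illustrate why applying the filter before the check matters, here is a minimal sketch. It is not the actual change made in the library: `atoms_match` is a hypothetical helper, and each atom is assumed to be a tuple whose last element is the atom name.

```python
def atoms_match(ref_atoms, decoy_atoms, name=None):
    """Check whether two structures contain the same atoms.

    Each atom is assumed to be a tuple ending with the atom name, e.g.
    (chainID, resSeq, name). When `name` is given, only atoms whose
    name is in that list are compared.
    """
    def select(atoms):
        if name is None:
            return set(atoms)
        wanted = set(name)
        return {atom for atom in atoms if atom[-1] in wanted}

    return select(ref_atoms) == select(decoy_atoms)
```

For two structures that differ only by extra O and OXT atoms, the unfiltered check reports a mismatch, while the check restricted to `name=['CA']` passes.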

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.