Getting different results for irmsd, if the chain identifiers are in different order

Smarti92 commented 2 years ago

Hi, I have several PDB files from docking tools and I want to calculate the irmsd. Some of the files use a different order for the chain identifiers, but the first chain is always the protein and the second the ligand. One file has A, B as chain identifiers and the other one has B, A. So the order of the chains is the same, but the identifiers are not. The values of chains_decoy and chains_ref are compared to check whether the chains in the two structures differ, but the function get_chains() returns the chain IDs in alphabetical order.

    # get the chains
    chains_decoy = sql_decoy.get_chains()
    chains_ref = sql_ref.get_chains()

    if chains_decoy != chains_ref:
        raise ValueError('Chains are different in decoy and reference structure')

If the chain identifiers were A, B and X, Y, it would raise an error. But because A, B is also not the same as B, A, I would normally expect it to raise an error here as well. Test data: testdata.zip
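To illustrate why the check passes silently, here is a minimal sketch in plain Python. It does not use pdb2sql; `chains_in_file_order` and `make_atom` are hypothetical helpers written for this example, and the sorted comparison only mimics the alphabetical ordering of get_chains() described above.

```python
def chains_in_file_order(pdb_lines):
    """Return chain IDs in order of first appearance (hypothetical helper).

    In the PDB format the chain ID is column 22 of ATOM/HETATM records,
    i.e. index 21 of the line.
    """
    seen = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            if line[21] not in seen:
                seen.append(line[21])
    return seen


def make_atom(chain):
    """Build a toy ATOM line that carries only the chain ID (illustration only)."""
    return "ATOM".ljust(21) + chain


# Reference: protein is chain A, ligand is chain B.
ref = [make_atom("A"), make_atom("B")]
# Decoy: protein is chain B, ligand is chain A.
decoy = [make_atom("B"), make_atom("A")]

# Comparing alphabetically sorted IDs cannot see the swap ...
same_when_sorted = sorted(chains_in_file_order(ref)) == sorted(chains_in_file_order(decoy))
# ... while comparing in file order does.
same_in_file_order = chains_in_file_order(ref) == chains_in_file_order(decoy)

print(same_when_sorted, same_in_file_order)  # True False
```

With sorted IDs both structures report A, B, so no ValueError is raised even though the identifiers are swapped between protein and ligand.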

Is it possible to handle this in StructureSimilarity.py, or do I have to rename the chain identifiers before I calculate the irmsd?

CunliangGeng commented 2 years ago

Hi @Smarti92, the chains are recognised by ID, not by relative order, so it's necessary to preprocess (clean) the PDB files to make them consistent with each other before using pdb2sql. For processing PDB files, you could try pdb-tools or its service.
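As a rough sketch of that preprocessing step, the chain IDs can also be renamed with a few lines of plain Python before the files are passed to pdb2sql. `rename_chains` is a hypothetical helper, not part of pdb2sql or pdb-tools; it assumes standard fixed-column PDB records, where the chain ID sits in column 22.

```python
def rename_chains(pdb_lines, mapping):
    """Return a copy of the PDB lines with chain IDs renamed per `mapping`.

    Hypothetical helper: only ATOM, HETATM, TER and ANISOU records are
    touched; every other line is passed through unchanged.
    """
    records = ("ATOM", "HETATM", "TER", "ANISOU")
    renamed = []
    for line in pdb_lines:
        if line.startswith(records) and len(line) > 21:
            chain = line[21]  # column 22 holds the chain ID
            line = line[:21] + mapping.get(chain, chain) + line[22:]
        renamed.append(line)
    return renamed
```

For the case in this issue, the decoy would be made consistent with the reference by swapping both IDs in a single pass, for example `rename_chains(lines, {"A": "B", "B": "A"})`. Doing the swap in one pass avoids the temporary ID that two sequential renames would need.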

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 7 days since being marked as stale.

DeepRank / pdb2sql

Getting different results for irmsd, if the chain identifiers are in different order #73