bjornwallner / DockQ

DockQ is a single continuous quality measure for protein docked models based on the CAPRI evaluation protocol
MIT License
185 stars 46 forks source link

inconsistent results #26

Closed luwei0917 closed 1 month ago

luwei0917 commented 2 months ago

Thanks for updating DockQ, and make it more user friendly. But I got different results by computing the DockQ of the example structure.

By doing this in the python script: from DockQ.DockQ import load_PDB, run_on_all_native_interfaces model = load_PDB("examples/1A2K_r_l_b.model.pdb") native = load_PDB("examples/1A2K_r_l_b.pdb") chain_map = {"A":"A", "B":"B"} run_on_all_native_interfaces(model, native, chain_map=chain_map)[0] , I got

{('A', 'B'): {'DockQ_F1': 0.4408235819178569, 'DockQ': 0.4350515761458511, 'F1': 0.38095238095238093, 'irms': 1.3939127448327777, 'Lrms': 10.304622346268628, 'fnat': 0.36363636363636365, 'nat_correct': 4, 'nat_total': 11, 'fnonnat': 0.6, 'nonnat_count': 6, 'model_total': 10, 'clashes': 1, 'len1': 198, 'len2': 106, 'class1': 'receptor', 'class2': 'ligand', 'chain1': 'A', 'chain2': 'B', 'chain_map': {'A': 'A', 'B': 'B'}}}

But if I run DockQ 1A2K_r_l_b.model.pdb 1A2K_r_l_b.pdb in terminal, I got: Model : 1A2K_r_l_b.model.pdb Native : 1A2K_r_l_b.pdb Total DockQ over 3 native interfaces: 0.653 with BAC:ABC model:native mapping Native chains: A, B Model chains: B, A DockQ: 0.994 irms: 0.000 Lrms: 0.000 fnat: 0.983 fnonnat: 0.008 clashes: 0.000 F1: 0.987 DockQ_F1: 0.996

DockQ scores are quite different. Do you know why? Thanks.

luwei0917 commented 2 months ago

Also, If I run the python script inside a notebook, the result will change if I run another 'run_on_all_native_interfaces' beforehand. I guess there is some intermediate file that got preserved, and make the number incorrect.

clami66 commented 2 months ago

Hi @luwei0917

I guess the example on the README is a bit misleading, but if you look at the results from the command line run you can see that the mapping between chains ABC in the model and ABC in the native has been optimized as: BAC:ABC model:native mapping

It means that if you run by using the library and fix the mapping by setting the variable chain_map = {"A":"A", "B":"B"} you will not get the optimal results. Here is what happens if you run with the optimal mapping between chains A/B in model and native:

chain_map = {"A":"B", "B":"A"} # model:native mapping
run_on_all_native_interfaces(model, native, chain_map=chain_map)

({('A', 'B'): {'DockQ_F1': 0.9957805907172309,
   'DockQ': 0.9943977591035728,
   'F1': 0.9873417721518988,
   'irms': 6.773307784361327e-07,
   'Lrms': 4.256097491138941e-07,
   'fnat': 0.9831932773109243,
   'nat_correct': 117,
   'nat_total': 119,
   'fnonnat': 0.00847457627118644,
   'nonnat_count': 1,
   'model_total': 118,
   'clashes': 0,
   'len1': 124,
   'len2': 124,
   'class1': 'ligand',
   'class2': 'receptor',
   'chain1': 'B',
   'chain2': 'A',
   'chain_map': {'A': 'B', 'B': 'A'}}},
 0.9943977591035728)

which matches the results you got from the command-line run:

Native chains: A, B Model chains: B, A DockQ: 0.994 irms: 0.000 Lrms: 0.000 fnat: 0.983 fnonnat: 0.008 clashes: 0.000 F1: 0.987 DockQ_F1: 0.996 

Also, If I run the python script inside a notebook, the result will change if I run another 'run_on_all_native_interfaces' beforehand. I guess there is some intermediate file that got preserved, and make the number incorrect.

Could you post an example of this so that I can reproduce the issue?

luwei0917 commented 2 months ago

Thanks for the quick reply. Here is an example for reproducing the issue.

from DockQ.DockQ import load_PDB, run_on_all_native_interfaces
model = load_PDB("examples/1A2K_r_l_b.pdb")
native = load_PDB("examples/1A2K_r_l_b.pdb")
chain_map = {"A":"B", "B":"A"} 
result = run_on_all_native_interfaces(model, native, chain_map=chain_map)
model = load_PDB("examples/1A2K_r_l_b.model.pdb")
native = load_PDB("examples/1A2K_r_l_b.pdb")
result = run_on_all_native_interfaces(model, native, chain_map=chain_map)
result[0]
clami66 commented 2 months ago

Due to how @lru_cache works in the code, reloading models multiple times with different PDBs might give unexpected results. We will fix this, in the meantime try and renaming the models if you are going to reload a different PDB in the same variable, like this:

from DockQ.DockQ import load_PDB, run_on_all_native_interfaces, run_on_chains
#run_on_chains.cache_clear()
model = load_PDB("examples/1A2K_r_l_b.pdb")
native = load_PDB("examples/1A2K_r_l_b.pdb")
result = run_on_all_native_interfaces(model, native, chain_map={"A":"B", "B":"A"})
print(result)

model = load_PDB("examples/1A2K_r_l_b.model.pdb")
# rename model to avoid using incorrect cached version
model.id = "examples/1A2K_r_l_b.model.pdb"
# no need to rename the native since it's the same as before
native = load_PDB("examples/1A2K_r_l_b.pdb")

result = run_on_all_native_interfaces(model, native, chain_map={"A":"B", "B":"A"})
print(result)
luwei0917 commented 2 months ago

two very small things. First, in the end of your README file. there is a comment says '# model:native chain map dictionary for two interfaces' but I think it actually should be ''# native:model chain map dictionary for two interfaces' based on my testing.

Second, is there a way to reproduce the v1 result, where we could treat A,B chains together as a single chain, and compute the dockQ score for this AB:C (say we have three chains in total)?

clami66 commented 2 months ago

I think it actually should be ''# native:model chain map dictionary for two interfaces' based on my testing.

I think you're correct, the dictionary is built to map from the native to the model's chains. We'll fix the comment

Second, is there a way to reproduce the v1 result, where we could treat A,B chains together as a single chain, and compute the dockQ score for this AB:C (say we have three chains in total)?

This is not supported now, since the interfaces are evaluated all together and the scores averaged. You could do A:C and B:C separately, then combine the scores, or combine AB into a single chain by renaming in the PDB file (I haven't tested this but it should work).