vtarasv opened 1 year ago
Thank you very much for finding this and letting us know! We will push a fix together with some other improvements soon, rather than immediately, so as not to break compatibility with the currently provided weights.
It seems that the error occurs when one tries to retrain the model on one's own complexes whose structure names contain an '_' (underscore). In such a case, `key_name = key.split('_')[0]` keeps only the part of the name before the first underscore, so `key_name` gets the wrong value.
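A minimal sketch of the failure mode follows. It assumes embedding keys of the hypothetical form `<complex_name>_chain_<i>`. This is an assumption for illustration, not confirmed DiffDock behavior. `naive_complex_name` mirrors the `split('_')[0]` pattern quoted above. `safer_complex_name` is a hypothetical alternative that splits on the last `_chain_` separator instead.

```python
# Hypothetical key format (assumption): "<complex_name>_chain_<i>"

def naive_complex_name(key: str) -> str:
    # Mirrors the quoted pattern: keeps everything before the FIRST underscore,
    # which truncates complex names that themselves contain underscores
    return key.split('_')[0]


def safer_complex_name(key: str, sep: str = "_chain_") -> str:
    # Split on the LAST occurrence of the assumed separator so that
    # underscores inside the complex name are preserved
    name, found, _ = key.rpartition(sep)
    if not found:
        raise ValueError(f"key {key!r} does not contain {sep!r}")
    return name


key = "my_complex_chain_0"
print(naive_complex_name(key))  # my
print(safer_complex_name(key))  # my_complex
```

For names without underscores, such as `3doz_chain_1`, both functions return the same result. They diverge only for user-provided names like the ones described above.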
Trying to reproduce the training process, I found that at some point in the dataset preparation the order of the `lm_embeddings` no longer matches the order of the corresponding chains (for some proteins with multiple chains in the structure). One example I found is the protein from the `3doz` complex, where the `lm_embeddings` are concatenated in the chain order [D, B, A] while all other protein graph features are in the order [A, B, D]. I believe this happens because of this part https://github.com/gcorso/DiffDock/blob/8e853d6b14fb57baf90fa8529349117439f06819/datasets/pdbbind.py#L133-L141, which does not guarantee the same order as the order of chains in the .pdb file.
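The sketch below shows one way the ordering could be made deterministic. It assumes the same hypothetical `<complex_name>_chain_<i>` key format. It also assumes that the chain index `<i>` matches the chain's position in the .pdb file. The linked code confirms neither assumption. `group_chain_embeddings` is a hypothetical helper. It groups per-chain embeddings by complex and sorts them by numeric chain index instead of relying on dictionary iteration order. Each resulting list could then be concatenated (e.g. with `torch.cat`) in the same order as the graph features.

```python
import re
from collections import defaultdict

# Hypothetical key format (assumption): "<complex_name>_chain_<index>",
# where <index> is the chain's position in the .pdb file.
# The greedy ".+" lets complex names contain underscores.
KEY_RE = re.compile(r"^(?P<name>.+)_chain_(?P<idx>\d+)$")


def group_chain_embeddings(id_to_embeddings):
    """Group per-chain embeddings by complex, ordered by chain index
    rather than by the dictionary's iteration order."""
    grouped = defaultdict(list)
    for key, emb in id_to_embeddings.items():
        m = KEY_RE.match(key)
        if m is None:
            raise ValueError(f"unexpected embedding key: {key!r}")
        grouped[m["name"]].append((int(m["idx"]), emb))
    # Sort on the integer index only, so embeddings themselves are never compared
    return {
        name: [emb for _, emb in sorted(items, key=lambda t: t[0])]
        for name, items in grouped.items()
    }


# Keys arrive out of order, as in the 3doz report (placeholder strings stand in for tensors)
embs = {"3doz_chain_2": "emb_D", "3doz_chain_1": "emb_B", "3doz_chain_0": "emb_A"}
print(group_chain_embeddings(embs))  # {'3doz': ['emb_A', 'emb_B', 'emb_D']}
```

The helper sorts on the parsed integer rather than on the raw key string, so `chain_10` correctly follows `chain_2`.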