gcorso / DiffDock

Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
https://arxiv.org/abs/2210.01776
MIT License

What is a "good" confidence score? #40

Closed phiweger closed 1 year ago

phiweger commented 1 year ago

With AlphaFold, when pLDDT is, say, above 70, you can gain some trust in the prediction. For DiffDock, what is the range in which you would "trust" the results?

HannesStark commented 1 year ago

The confidence model is trained to predict whether the RMSD of a generated sample is below 2 Å. So if the confidence score is higher than 0, the model is predicting that the RMSD is below 2 Å.
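To make that decision rule concrete, here is a minimal sketch in plain Python. It is not DiffDock's actual code: the function names and example values are hypothetical, and only the 2 Å cutoff and the threshold at 0 come from the description above.

```python
RMSD_CUTOFF = 2.0  # angstrom; the cutoff named in the comment above


def training_label(rmsd: float) -> int:
    """Hypothetical binary target: 1 if the sampled pose is under the cutoff, else 0."""
    return int(rmsd < RMSD_CUTOFF)


def predicted_below_cutoff(confidence_score: float) -> bool:
    """A score above 0 is read as 'RMSD predicted to be below the cutoff'."""
    return confidence_score > 0.0


# Illustrative values only, not real DiffDock output.
print(training_label(1.4))           # pose within 2 A of the reference
print(predicted_below_cutoff(-2.0))  # negative score: predicted to be above 2 A
```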

phiweger commented 1 year ago

Thanks @HannesStark for the quick response. For negative values, is there some kind of response curve (x axis confidence, y axis RMSD)? I am asking because in a protein of interest I get estimates from -2 to -10, and I wonder whether all of them are "bad" or how bad -2 is compared to -10. Thanks!

gcorso commented 1 year ago

Hi @phiweger, they are logits, so one could pass the predicted values through a sigmoid to obtain a confidence estimate between 0 and 1. However, note that:

  1. these have not been calibrated (i.e. a confidence of 20% does not necessarily mean that in practice 20% of the predictions are correct). There is a very rich literature trying to deal with this problem; hopefully, in the future, we'll be able to apply some of it to DiffDock
  2. at the moment the confidence model is trained on bound atomic structures. When fed an unbound structure, the model is likely to be underconfident about its own predictions. Hopefully, we'll release a new confidence model trained on unbound structures soon
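As a rough illustration of the sigmoid conversion mentioned above, the following sketch maps a few confidence scores to values between 0 and 1 using only the Python standard library. The `sigmoid` helper and the example scores are assumptions for illustration and are not taken from the DiffDock codebase. Given the caveats above, the outputs are uncalibrated and should not be read as true success probabilities.

```python
import math


def sigmoid(logit: float) -> float:
    """Logistic function, written to avoid overflow for large negative inputs."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Example scores chosen to match the range discussed in this thread.
for score in (0.0, -2.0, -10.0):
    print(f"{score:6.1f} -> {sigmoid(score):.5f}")
```

Under this mapping, a score of 0 corresponds to 0.5, a score of -2 to roughly 0.12, and a score of -10 to roughly 0.00005, so -2 and -10 differ by several orders of magnitude even though both are negative.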