google-deepmind / alphafold

Open source code for AlphaFold.
Apache License 2.0

prediction results are not consistent #906

Open · lijing28101 opened this issue 6 months ago

lijing28101 commented 6 months ago

Hi, I'm using AlphaFold2 to predict the structure of my protein. I've found that the resulting models are always different, in both the x, y, z coordinates and the pLDDT, even when I use exactly the same command line and don't change any parameters. Sometimes I can even see 3D structural differences in the visualization. Could you explain the reason for this variance? I want to use AlphaFold to predict structures for different mutations, so this variance is important to me.

tcoates5 commented 6 months ago

AlphaFold2 relies on some fundamentally stochastic mechanisms. Most of the time it finds almost exactly the same top answer, but sometimes there is greater variation, particularly when the pLDDT score is low. There are a number of options you can use to reduce the variance: increasing the number of recycles, increasing the number of models, increasing the number of structures generated from each model, setting the seed ahead of time, and so on. Setting the seed in advance is the only one of these options that shouldn't increase your run time. Additionally, anything that appears unstructured (usually long loops) should be expected to vary significantly from run to run and from model to model.
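If you want to see which regions actually differ between two runs (for example, to judge whether a difference between mutants is larger than run-to-run noise), one option is to compare the runs' internal distance matrices residue by residue. The following is a minimal sketch, not part of AlphaFold: it assumes you have already extracted one coordinate per residue (e.g. the C-alpha atom) from each run's PDB output, in the same residue order, and the function names are hypothetical. Comparing internal distances rather than raw x, y, z coordinates means the two structures do not need to be superimposed first.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between per-residue coordinates (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def per_residue_deviation(coords_a, coords_b):
    """Per-residue RMS difference between the distance matrices of two runs.

    Hypothetical helper: for residue i, take the RMS over all other residues j
    of (d_ij in run A - d_ij in run B). A rigid rotation or translation of a
    whole run leaves every distance unchanged, so it contributes nothing.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both runs must cover the same residues in the same order")
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    n = len(coords_a)
    deviations = []
    for i in range(n):
        squared = [(da[i][j] - db[i][j]) ** 2 for j in range(n) if j != i]
        deviations.append(math.sqrt(sum(squared) / len(squared)) if squared else 0.0)
    return deviations


# Toy data, not real AlphaFold output: four residues spaced 3.8 Å apart.
run_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
run_2 = [(x + 5.0, y - 2.0, z) for x, y, z in run_1]  # same shape, rigidly shifted
run_3 = run_1[:3] + [(7.6, 3.8, 0.0)]                  # last residue swings away, like a loop

print(per_residue_deviation(run_1, run_2))  # all values ~0.0
print(per_residue_deviation(run_1, run_3))  # largest value at the last residue
```

In practice you would expect the largest per-residue values to coincide with low-pLDDT regions such as long loops, which is consistent with the point above that unstructured parts vary most between runs.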

lildeadprince commented 6 months ago

I agree with the explanation above.

But if you still want a stable prediction, use the --random_seed parameter (it takes an integer). The seed is used in pseudo-random number generation so that the sequence of random numbers is reproducible: it is the starting point from which the random number generator produces its sequence.

Thus, if you provide the same seed for the same prediction, it will generate exactly the same output.
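To illustrate the seeding idea in general terms, here is a toy sketch using Python's standard random module. It is not AlphaFold's actual implementation: sample_msa_subset is a hypothetical stand-in for a stochastic step such as subsampling sequences from an MSA. Seeding a dedicated generator makes the "random" choice repeatable for a given seed.

```python
import random


def sample_msa_subset(msa, k, seed):
    """Hypothetical stand-in for a stochastic pipeline step (e.g. MSA subsampling).

    A dedicated random.Random instance is created from the seed, so the result
    depends only on the seed and the inputs, not on any global random state.
    """
    rng = random.Random(seed)
    return rng.sample(msa, k)


msa = [f"seq{i}" for i in range(100)]  # toy placeholder sequences

run_a = sample_msa_subset(msa, 5, seed=42)
run_b = sample_msa_subset(msa, 5, seed=42)
run_c = sample_msa_subset(msa, 5, seed=7)

print(run_a == run_b)  # True: same seed, same subset
print(run_a, run_c)    # a different seed generally gives a different subset
```

Using a separate generator object rather than the module-level functions keeps the result independent of any other code that consumes random numbers, which is the property you rely on when you rerun a prediction with the same seed.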