google-deepmind / alphafold

Open source code for AlphaFold 2.
Apache License 2.0

difference of results between locally run alphafold and alphafold colab #600

Open danny2551515 opened 2 years ago

danny2551515 commented 2 years ago

Hello. I am running structure predictions with both the Colab version and a local installation of AlphaFold, and the two give very different results. To narrow the gap, I ran AlphaFold locally with --db_preset=reduced_dbs and --max_template_date=1000-01-01, but the difference is still large. Is it possible to get similar results from local AlphaFold and AlphaFold Colab?

Htomlinson14 commented 2 years ago

Hi, yes, it's possible to get different results, as they are different environments and the model is stochastic. Could you please provide full reproducibility details about your run? Also, have you updated to the latest version?

danny2551515 commented 2 years ago

Hello Htomlinson14

Sorry about the late response.

I normally use AlphaFold V.2.2.2. After reading your comment, I also ran the predictions with V.2.2.4, the latest version.

I carried out the prediction under these four conditions:

  1. use full DB and templates

python3 /data/AF/alphafold/docker/run_docker.py \
  --fasta_paths=${fasta_file_path} \
  --max_template_date=3000-01-01 \
  --model_preset=monomer_ptm \
  --data_dir=/data/AF/AFDB/ \
  --docker_user=0 \
  --gpu_devices=0 \
  --output_dir=/data/result/

  2. use reduced DB and templates

python3 /data/AF/alphafold/docker/run_docker.py \
  --fasta_paths=${fasta_file_path} \
  --max_template_date=3000-01-01 \
  --model_preset=monomer_ptm \
  --data_dir=/data/AF/AFDB/ \
  --docker_user=0 \
  --gpu_devices=0 \
  --output_dir=/data/result/ \
  --db_preset=reduced_dbs

  3. use full DB and disable templates

python3 /data/AF/alphafold/docker/run_docker.py \
  --fasta_paths=${fasta_file_path} \
  --max_template_date=1000-01-01 \
  --model_preset=monomer_ptm \
  --data_dir=/data/AF/AFDB/ \
  --docker_user=0 \
  --gpu_devices=0 \
  --output_dir=/data/result/

  4. use reduced DB and disable templates

python3 /data/AF/alphafold/docker/run_docker.py \
  --fasta_paths=${fasta_file_path} \
  --max_template_date=1000-01-01 \
  --model_preset=monomer_ptm \
  --data_dir=/data/AF/AFDB/ \
  --docker_user=0 \
  --gpu_devices=0 \
  --output_dir=/data/result/ \
  --db_preset=reduced_dbs

Under all four conditions, the five predictions from each of V.2.2.2 and V.2.2.4 were very different from the AlphaFold Colab prediction.

Do you know what could be causing these differences?

As I asked earlier, how can I obtain similar predictions from local AlphaFold and AlphaFold Colab?
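For anyone repeating this comparison, the four runs above can be generated from one template rather than edited by hand. This is only a sketch: the flags and paths are the ones in the commands above, but the helper name `build_commands`, the condition labels, and the per-condition output subdirectories are assumptions added here so the runs do not write into the same folder.

```python
from itertools import product

# Paths copied from the commands above; adjust for your own machine.
RUN_DOCKER = "/data/AF/alphafold/docker/run_docker.py"
DATA_DIR = "/data/AF/AFDB/"
OUTPUT_ROOT = "/data/result/"

# Labels are hypothetical; the values are the ones used in this thread.
DB_PRESETS = {"full": None, "reduced": "reduced_dbs"}
TEMPLATE_DATES = {"templates": "3000-01-01", "no_templates": "1000-01-01"}


def build_commands(fasta_path):
    """Return {label: argv} for the four DB / template combinations."""
    commands = {}
    for (db_label, db_preset), (tpl_label, max_date) in product(
        DB_PRESETS.items(), TEMPLATE_DATES.items()
    ):
        label = f"{db_label}_{tpl_label}"
        argv = [
            "python3", RUN_DOCKER,
            f"--fasta_paths={fasta_path}",
            f"--max_template_date={max_date}",
            "--model_preset=monomer_ptm",
            f"--data_dir={DATA_DIR}",
            "--docker_user=0",
            "--gpu_devices=0",
            # Assumption: one output subdirectory per condition.
            f"--output_dir={OUTPUT_ROOT}{label}/",
        ]
        if db_preset is not None:
            # The full-DB runs above omit this flag, so it is only added here.
            argv.append(f"--db_preset={db_preset}")
        commands[label] = argv
    return commands


if __name__ == "__main__":
    for label, argv in build_commands("target.fasta").items():
        print(label, " ".join(argv), sep="\n  ")
```

Each argv list can be passed to `subprocess.run(argv, check=True)` once the output directory exists; printing them first makes it easy to confirm that only the two intended settings differ between runs.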

Htomlinson14 commented 2 years ago

Hi, can you please report the machine (GPU) you are using and also the iptm+ptm results for each of these runs?

Htomlinson14 commented 2 years ago

You may also wish to follow the discussions on https://github.com/deepmind/alphafold/issues/597

danny2551515 commented 2 years ago

The average plddt value from the AlphaFold colab was as follows:

ver.alphafold colab

avg_plddt = 67.1059722222222

The machine that has AlphaFold installed uses a Quadro RTX 8000 GPU. With AlphaFold V.2.2.2 and V.2.2.4, the pLDDT values obtained under the four conditions are as follows:

1. full db & templates = 3000-01-01

ver.2.2.4
"plddts": {
  "model_1_ptm_pred_0": 97.25193926550712,
  "model_2_ptm_pred_0": 97.18683996607362,
  "model_3_ptm_pred_0": 54.927574842232,
  "model_4_ptm_pred_0": 58.54845161102359,
  "model_5_ptm_pred_0": 43.06532542208311
}

ver.2.2.2
"plddts": {
  "model_1_ptm_pred_0": 97.36867929339586,
  "model_2_ptm_pred_0": 97.21677934497643,
  "model_3_ptm_pred_0": 50.224661763080945,
  "model_4_ptm_pred_0": 39.979508179484505,
  "model_5_ptm_pred_0": 44.549697156675734
}

2. full db & templates = 1000-01-01

ver.2.2.4
"plddts": {
  "model_1_ptm_pred_0": 43.664334482582525,
  "model_2_ptm_pred_0": 48.08251340941516,
  "model_3_ptm_pred_0": 61.68443432122087,
  "model_4_ptm_pred_0": 48.93973191788411,
  "model_5_ptm_pred_0": 39.43275963009237
}

ver.2.2.2
"plddts": {
  "model_1_ptm_pred_0": 58.981938345566,
  "model_2_ptm_pred_0": 46.08298427146891,
  "model_3_ptm_pred_0": 58.97027318035094,
  "model_4_ptm_pred_0": 55.71557161934546,
  "model_5_ptm_pred_0": 44.21769917365889
}

3. reduced db & templates = 3000-01-01

ver.2.2.4
"plddts": {
  "model_1_ptm_pred_0": 97.26357937406594,
  "model_2_ptm_pred_0": 97.13925092424562,
  "model_3_ptm_pred_0": 40.45697902340831,
  "model_4_ptm_pred_0": 41.859432024373355,
  "model_5_ptm_pred_0": 40.82802636714861
}

ver.2.2.2
"plddts": {
  "model_1_ptm_pred_0": 97.46789556103246,
  "model_2_ptm_pred_0": 97.30054276924895,
  "model_3_ptm_pred_0": 53.98386122906149,
  "model_4_ptm_pred_0": 45.08604272171022,
  "model_5_ptm_pred_0": 38.54566059335381
}

4. reduced db & templates = 1000-01-01

ver.2.2.4
"plddts": {
  "model_1_ptm_pred_0": 48.53682881088287,
  "model_2_ptm_pred_0": 49.97295933519744,
  "model_3_ptm_pred_0": 55.95563452058329,
  "model_4_ptm_pred_0": 54.111995563263015,
  "model_5_ptm_pred_0": 42.91102593340323
}

ver.2.2.2
"plddts": {
  "model_1_ptm_pred_0": 43.48648526660625,
  "model_2_ptm_pred_0": 52.457455662168826,
  "model_3_ptm_pred_0": 48.408318636112455,
  "model_4_ptm_pred_0": 61.21301963988726,
  "model_5_ptm_pred_0": 52.85276750489996
}
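To make the per-model numbers easier to compare with the single Colab figure, a small script can pick the top-ranked model in each run and show how far it sits from the Colab average. This is a rough sketch: the values are copied from the V.2.2.4 full-DB runs above, the helper names `load_plddts` and `summarise` are made up for illustration, and it assumes the Colab `avg_plddt` refers to one selected model, which this thread does not confirm.

```python
import json
from statistics import mean

# Average pLDDT reported from AlphaFold Colab in this thread.
COLAB_AVG_PLDDT = 67.1059722222222

# V.2.2.4 values copied from the full-DB runs listed above.
LOCAL_PLDDTS = {
    "full db, templates (3000-01-01)": {
        "model_1_ptm_pred_0": 97.25193926550712,
        "model_2_ptm_pred_0": 97.18683996607362,
        "model_3_ptm_pred_0": 54.927574842232,
        "model_4_ptm_pred_0": 58.54845161102359,
        "model_5_ptm_pred_0": 43.06532542208311,
    },
    "full db, no templates (1000-01-01)": {
        "model_1_ptm_pred_0": 43.664334482582525,
        "model_2_ptm_pred_0": 48.08251340941516,
        "model_3_ptm_pred_0": 61.68443432122087,
        "model_4_ptm_pred_0": 48.93973191788411,
        "model_5_ptm_pred_0": 39.43275963009237,
    },
}


def load_plddts(ranking_debug_path):
    """Read the "plddts" mapping from a ranking_debug.json file."""
    with open(ranking_debug_path) as f:
        return json.load(f)["plddts"]


def summarise(plddts):
    """Return (best model name, its pLDDT, mean pLDDT over all models)."""
    best_model = max(plddts, key=plddts.get)
    return best_model, plddts[best_model], mean(list(plddts.values()))


if __name__ == "__main__":
    for label, plddts in LOCAL_PLDDTS.items():
        best_model, best, avg = summarise(plddts)
        print(
            f"{label}: best={best_model} ({best:.2f}), "
            f"mean over models={avg:.2f}, "
            f"best - Colab={best - COLAB_AVG_PLDDT:+.2f}"
        )
```

Laid out this way, the template-enabled runs stand out: models 1 and 2 score far above the Colab average, while the template-free runs fall below it.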

Htomlinson14 commented 2 years ago

Hi - thanks very much for providing these values. I can't say exactly what is happening here but a few things to consider:

danny2551515 commented 2 years ago

Yes, when I checked the ranking_debug.json file, the scores were recorded as pLDDT. Should I look up the pTM values in the .pkl files and report them?
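One possible way to collect the pTM values is to loop over the result pickles in a run's output folder. This is a sketch under assumptions: it expects files named `result_model_*.pkl` and a `"ptm"` entry in each, which is what the pTM monomer models are understood to write, so check the keys of your own files first. The helper name `read_ptm_scores` is hypothetical, and unpickling may require NumPy (and possibly JAX) to be importable.

```python
import pickle
from pathlib import Path


def read_ptm_scores(output_dir):
    """Return {model name: pTM} for every result pickle that has a "ptm" key."""
    scores = {}
    for pkl_path in sorted(Path(output_dir).glob("result_model_*.pkl")):
        with open(pkl_path, "rb") as f:
            result = pickle.load(f)
        if "ptm" not in result:
            # Non-pTM models would not have this entry; skip them.
            continue
        model_name = pkl_path.stem[len("result_"):]
        # float() also handles 0-d array scalars.
        scores[model_name] = float(result["ptm"])
    return scores


if __name__ == "__main__":
    # Hypothetical path: point this at one target's output folder.
    for name, ptm in read_ptm_scores("/data/result/target").items():
        print(f"{name}: pTM = {ptm:.3f}")
```

Only unpickle files you produced yourself, since `pickle.load` can execute arbitrary code from untrusted input.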

Htomlinson14 commented 2 years ago

Ok cool. I think in this case the second and third bullets above are most relevant, particularly the comments on the potential impact of newer templates. Thanks!

danny2551515 commented 2 years ago

Hi @Htomlinson14 .

I have made repeated attempts to match the results of the local version and the Colab version.

In issue #126 I found advice to disable HHblits on UniClust, in addition to changing the db_preset and template settings.

Could you tell me how to disable HHblits?