gcorso / DiffDock

Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
https://arxiv.org/abs/2210.01776
MIT License

Struggling to replicate PDBBind results #200

Closed nkami closed 3 months ago

nkami commented 3 months ago

Hello, I'm trying to replicate the results presented in the latest paper with DiffDock-L on the PDBBind dataset. I followed the instructions in the repository README and ran the following command (after preparing the dataset and calculating the ESM embeddings):

python -m evaluate --config default_inference_args.yaml --split_path data/splits/timesplit_test --batch_size 10 --esm_embeddings_path data/esm2_embeddings.pt --data_dir data/PDBBind_processed/ --tqdm --split test --chain_cutoff 10 --dataset pdbbind

At the end of the run I get the following results:

  1. Top-1 RMSD < 2 Å is 32.3%; median is 4.05 Å
  2. Top-5 RMSD < 2 Å is 45.5%; median is 2.21 Å

Has anyone else tried to replicate the results and succeeded? Any tips on what I may be doing wrong?

gcorso commented 3 months ago

Are you looking at the filtered RMSD < 2 or the unfiltered RMSD < 2 performance?

nkami commented 3 months ago

For the top-5 score, I looked at "top5_rmsds_below_2". For the top-1 score, I took the RMSD of the pose with the highest confidence among the 10 generated samples. The files rmsds.npy and confidences.npy, both of shape (360, 10), are written at the end of the run, and taking the top 5 from these arrays reproduces the printed "top5_rmsds_below_2" value.

At the end of the run the following metrics are printed:

0 failures due to exceptions
0  skipped because complex was not in confidence dataset
run_times_std 8.09
run_times_mean 16.42
mean_rmsd nan
rmsds_below_2 35.30555555555556
rmsds_below_5 59.777777777777786
rmsds_percentile_25 nan
rmsds_percentile_50 nan
rmsds_percentile_75 nan
min_rmsds_below_2 51.111111111111114
min_rmsds_below_5 80.27777777777777
mean_centroid nan
centroid_below_2 60.69
centroid_below_5 78.94
centroid_percentile_25 nan
centroid_percentile_50 nan
centroid_percentile_75 nan
top5_self_intersect_fraction 0.0
top5_rmsds_below_2 45.56
top5_rmsds_below_5 76.39
top5_rmsds_percentile_25 1.05
top5_rmsds_percentile_50 2.2
top5_rmsds_percentile_75 4.68
top5_centroid_below_2 74.44
top5_centroid_below_5 88.89
top5_centroid_percentile_25 0.33
top5_centroid_percentile_50 0.78
top5_centroid_percentile_75 2.03
top10_self_intersect_fraction 0.0
top10_rmsds_below_2 51.11
top10_rmsds_below_5 80.28
top10_rmsds_percentile_25 nan
top10_rmsds_percentile_50 nan
top10_rmsds_percentile_75 nan
top10_centroid_below_2 77.5
top10_centroid_below_5 91.39
top10_centroid_percentile_25 0.29
top10_centroid_percentile_50 0.68
top10_centroid_percentile_75 1.8
filtered_self_intersect_fraction 0.83
filtered_rmsds_below_2 42.78
filtered_rmsds_below_5 66.11
filtered_rmsds_percentile_25 1.16
filtered_rmsds_percentile_50 2.63
filtered_rmsds_percentile_75 6.5
filtered_centroid_below_2 67.22
filtered_centroid_below_5 82.5
filtered_centroid_percentile_25 0.37
filtered_centroid_percentile_50 0.88
filtered_centroid_percentile_75 2.93
top5_filtered_rmsds_below_2 49.17
top5_filtered_rmsds_below_5 76.94
top5_filtered_rmsds_percentile_25 0.99
top5_filtered_rmsds_percentile_50 2.03
top5_filtered_rmsds_percentile_75 4.64
top5_filtered_centroid_below_2 75.0
top5_filtered_centroid_below_5 89.44
top5_filtered_centroid_percentile_25 0.3
top5_filtered_centroid_percentile_50 0.69
top5_filtered_centroid_percentile_75 1.98
top10_filtered_rmsds_below_2 51.11
top10_filtered_rmsds_below_5 80.28
top10_filtered_rmsds_percentile_25 nan
top10_filtered_rmsds_percentile_50 nan
top10_filtered_rmsds_percentile_75 nan
top10_filtered_centroid_below_2 77.5
top10_filtered_centroid_below_5 91.39
top10_filtered_centroid_percentile_25 0.29
top10_filtered_centroid_percentile_50 0.68
top10_filtered_centroid_percentile_75 1.8

Thanks for the help and quick reply.

gcorso commented 3 months ago

Every number whose name does not contain "filtered" is a performance measured without any confidence model. The numbers whose names contain "filtered" are based on running the diffusion model to get 10 samples and then taking the best (or the best 5) according to the confidence score. These are the results that we report in the paper (if one uses no confidence model, there is no reason to take multiple samples). Apologies for the confusion with the metric names.

nkami commented 3 months ago

Thank you! One last question: Could you explain the purpose of the rmsds.npy and confidences.npy files generated at the end of the run?

gcorso commented 3 months ago

Yes, they contain the RMSDs of the structures in the order in which they were generated, and the corresponding confidences. If you'd like to replicate the filtered results, you should order the RMSD array of the poses using the confidence array.
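
For anyone landing here later, the reordering described above could be sketched roughly as follows. This is not taken from the repository's evaluation code: the function name filtered_metrics is made up for illustration, and the sketch assumes that a higher confidence value means a better-ranked pose and that both arrays have shape (n_complexes, n_samples), as with the (360, 10) files mentioned earlier in the thread. The toy arrays stand in for the real files.

```python
import numpy as np


def filtered_metrics(rmsds, confidences):
    """Rank poses by confidence and summarise top-1 / top-5 RMSD.

    Hypothetical helper, not part of DiffDock. Both inputs are expected
    to have shape (n_complexes, n_samples).
    """
    # Assumption: higher confidence = better pose, so sort descending.
    order = np.argsort(-confidences, axis=1)
    # Reorder each row of RMSDs by that row's confidence ranking.
    ranked = np.take_along_axis(rmsds, order, axis=1)

    top1 = ranked[:, 0]                # RMSD of the most confident pose
    top5 = ranked[:, :5].min(axis=1)   # best RMSD among the 5 most confident

    return {
        "filtered_rmsds_below_2": 100 * np.mean(top1 < 2.0),
        "filtered_rmsds_percentile_50": float(np.median(top1)),
        "top5_filtered_rmsds_below_2": 100 * np.mean(top5 < 2.0),
        "top5_filtered_rmsds_percentile_50": float(np.median(top5)),
    }


# Toy data: 2 complexes, 3 samples each, in generation order.
rmsds = np.array([[1.0, 3.0, 5.0],
                  [4.0, 1.5, 6.0]])
confidences = np.array([[0.2, 0.9, -1.0],
                        [0.1, 2.0, -0.5]])

print(filtered_metrics(rmsds, confidences))
```

With the real outputs, the toy arrays would be replaced by np.load("rmsds.npy") and np.load("confidences.npy"), with the paths adjusted to wherever the run wrote them. If the ranking assumption holds, the values should line up with the "filtered" lines printed at the end of the run.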