The docking results reproduction

QizhiPei commented 1 year ago

Hello, this is really wonderful work! I tried the scripts and checkpoint from your README as follows:

I downloaded the checkpoint from binding_pose_ckpt to ./save_pose/binding_pose_220908.pt
I downloaded the data from binding_data and decompressed it to ./protein_ligand_binding_pose_prediction
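Before launching inference, a quick pre-flight check can rule out a missing or misplaced download. The helper below, check_inputs, is hypothetical and not part of Uni-Mol. It assumes the layout described above: the checkpoint sits at the weight path, and the test split sits at test.lmdb inside the data directory (the same file the docking step later uses as its reference file).

```bash
# Hypothetical pre-flight check (not part of Uni-Mol): verify that the
# checkpoint and the test split exist before launching inference.
# Usage: check_inputs DATA_PATH WEIGHT_PATH
# Returns 0 if both paths exist, 1 otherwise (missing paths go to stderr).
check_inputs() {
  local data_path=$1 weight_path=$2 missing=0
  local f
  for f in "$weight_path" "$data_path/test.lmdb"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_inputs "./protein_ligand_binding_pose_prediction" "./save_pose/binding_pose_220908.pt" && echo "inputs found" prints the message only when both files are in place. The check uses -e rather than -f so it also passes if an LMDB store was unpacked as a directory.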

Then I ran the inference code on the test set:

data_path="./protein_ligand_binding_pose_prediction"  # replace with your data path
results_path="./infer_pose"  # replace with your results path
weight_path="./save_pose/binding_pose_220908.pt"
batch_size=8
dist_threshold=8.0
recycling=3

python ./unimol/infer.py --user-dir ./unimol $data_path --valid-subset test \
     --results-path $results_path \
     --num-workers 8 --ddp-backend=c10d --batch-size $batch_size \
     --task docking_pose --loss docking_pose --arch docking_pose \
     --path $weight_path \
     --fp16 --fp16-init-scale 4 --fp16-scale-window 256 \
     --dist-threshold $dist_threshold --recycling $recycling \
     --log-interval 50 --log-format simple

The output of this script is ./infer_pose/weights_test.out.pkl

I then ran the docking script:


nthreads=20  # Num of threads
predict_file="./infer_pose/weights_test.out.pkl"  # Your inference output file
reference_file="./protein_ligand_binding_pose_prediction/test.lmdb"  # Your reference file
output_path="./protein_ligand_binding_pose_prediction"  # Docking results path

python ./unimol/utils/docking.py --nthreads $nthreads --predict-file $predict_file --reference-file $reference_file --output-path $output_path

and got the following results:

RMSD < 1.0 : 0.4405594405594406
RMSD < 1.5 : 0.6853146853146853
RMSD < 2.0 : 0.8041958041958042
RMSD < 3.0 : 0.8706293706293706
RMSD < 5.0 : 0.9440559440559441
avg RMSD : 1.6639526207451638


These results are not consistent with those reported in your paper on ChemRxiv; they look more like the Uni-Mol random results.
Is there anything wrong with my inference or docking pipeline?
Thanks for your attention; I look forward to your reply.
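For comparing numbers across runs, it helps to read the summary above as docking success rates: each "RMSD < t" entry is presumably the fraction of test complexes whose predicted pose lies within t Å of the reference pose, and "avg RMSD" is the mean over all complexes. The awk sketch below recomputes such a summary from a plain-text file with one RMSD value per line. The input format and the helper name summarize_rmsd are assumptions for illustration; this is not the code in ./unimol/utils/docking.py.

```bash
# Hypothetical helper (not part of Uni-Mol): summarize_rmsd FILE
# FILE holds one RMSD value (in Å) per line; blank lines are skipped.
# Prints the fraction of values strictly below each threshold and the
# mean RMSD, rounded to 4 decimals, in the "RMSD < t : fraction" layout.
summarize_rmsd() {
  awk '
    NF > 0 { n++; rmsd[n] = $1 + 0; sum += $1 }
    END {
      if (n == 0) { print "no RMSD values" > "/dev/stderr"; exit 1 }
      k = split("1.0 1.5 2.0 3.0 5.0", thr, " ")
      for (i = 1; i <= k; i++) {
        hit = 0
        for (j = 1; j <= n; j++) if (rmsd[j] < thr[i] + 0) hit++
        printf "RMSD < %s : %.4f\n", thr[i], hit / n
      }
      printf "avg RMSD : %.4f\n", sum / n
    }' "$1"
}
```

With a file containing 0.5, 1.2, 2.5 and 4.0, the helper reports 0.2500 below 1.0 Å, 0.5000 below 2.0 Å, 1.0000 below 5.0 Å, and an average of 2.0500. Because the thresholds are cumulative, each rate can only stay the same or grow as t increases, which is a quick sanity check on any reported table.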

guolinke commented 1 year ago

@QizhiPei Please check the latest version of our paper. The results were updated because we applied stricter filtering to the training set. In the first version, we removed proteins and ligands identical to those in the test set from the training data. In the updated version, we also removed similar proteins and ligands from the training data.

QizhiPei commented 1 year ago

Thanks for your quick reply. Have a good day!

deepmodeling/Uni-Mol issue #59