deepmodeling / Uni-Mol

Official Repository for the Uni-Mol Series Methods
MIT License

The docking results reproduction #59

Closed: QizhiPei closed this issue 1 year ago

QizhiPei commented 1 year ago

Hello, it's really wonderful work! I tried your scripts and checkpoint from the README as follows:

  1. I downloaded the checkpoint in binding_pose_ckpt to ./save_pose/binding_pose_220908.pt
  2. I downloaded the data in binding_data and decompressed it to ./protein_ligand_binding_pose_prediction
  3. I ran the inference code on the test set

    data_path="./protein_ligand_binding_pose_prediction"  # replace with your data path
    results_path="./infer_pose"  # replace with your results path
    weight_path="./save_pose/binding_pose_220908.pt"
    batch_size=8
    dist_threshold=8.0
    recycling=3
    
    python ./unimol/infer.py --user-dir ./unimol $data_path --valid-subset test \
         --results-path $results_path \
         --num-workers 8 --ddp-backend=c10d --batch-size $batch_size \
         --task docking_pose --loss docking_pose --arch docking_pose \
         --path $weight_path \
         --fp16 --fp16-init-scale 4 --fp16-scale-window 256 \
         --dist-threshold $dist_threshold --recycling $recycling \
         --log-interval 50 --log-format simple

    The output of this script is ./infer_pose/weights_test.out.pkl

  4. I ran the docking script
    
    nthreads=20  # Num of threads
    predict_file="./infer_pose/weights_test.out.pkl"  # Your inference file dir
    reference_file="./protein_ligand_binding_pose_prediction/test.lmdb"  # Your reference file dir
    output_path="./protein_ligand_binding_pose_prediction"  # Docking results path

    python ./unimol/utils/docking.py --nthreads $nthreads \
         --predict-file $predict_file \
         --reference-file $reference_file \
         --output-path $output_path

I got the following result:

    RMSD < 1.0 : 0.4405594405594406
    RMSD < 1.5 : 0.6853146853146853
    RMSD < 2.0 : 0.8041958041958042
    RMSD < 3.0 : 0.8706293706293706
    RMSD < 5.0 : 0.9440559440559441
    avg RMSD : 1.6639526207451638


This is not consistent with the results in your ChemRxiv paper (it looks more like the "Uni-Mol random" result).
Is there anything wrong with my inference and docking pipeline?
Thanks for your attention; I look forward to your reply.
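
For readers comparing numbers like these against a paper, here is a minimal sketch of how threshold-based success rates of this kind are typically computed from per-complex RMSD values. It is an illustration, not the code in `docking.py`: the function names `rmsd` and `summarize` are hypothetical, and the RMSD here is a plain coordinate RMSD with no alignment or symmetry correction, which may differ from what the repository does.

```python
import math


def rmsd(pred, ref):
    """Plain RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumption: atoms are already in matching order; no alignment or
    symmetry handling is performed.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(pred, ref)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared / len(pred))


def summarize(rmsds, thresholds=(1.0, 1.5, 2.0, 3.0, 5.0)):
    """Fraction of complexes below each RMSD threshold, plus the mean RMSD."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    n = len(rmsds)
    summary = {f"RMSD < {t}": sum(r < t for r in rmsds) / n for t in thresholds}
    summary["avg RMSD"] = sum(rmsds) / n
    return summary


if __name__ == "__main__":
    # Made-up RMSD values, only to show the output format.
    for name, value in summarize([0.5, 1.2, 2.5, 6.0]).items():
        print(name, ":", value)
```

Each reported value is a fraction of the test complexes, so a change in which complexes are in the training or test set can shift every row at once.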

guolinke commented 1 year ago

@QizhiPei you can check the latest version of our paper. The results were updated because we applied stricter filtering to the training set. In the first version, we removed only the exact same proteins and ligands from the training data. In the updated version, we also removed similar proteins and ligands from the training data.
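
To make the difference between the two filtering rules concrete, here is a rough sketch under loud assumptions. Each entry is represented by a hypothetical set of features, similarity is measured with Jaccard overlap, and the threshold is arbitrary. None of these choices come from the paper, which should be consulted for the actual similarity criteria.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets (assumed representation)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_exact(train, test):
    """Drop training entries that are identical to some test entry."""
    test_keys = {frozenset(t) for t in test}
    return [x for x in train if frozenset(x) not in test_keys]


def filter_similar(train, test, threshold=0.5):
    """Drop training entries whose similarity to any test entry reaches the threshold.

    The threshold value is a placeholder, not a value from the paper.
    Exact duplicates have similarity 1.0, so they are removed as well.
    """
    return [x for x in train if all(jaccard(x, t) < threshold for t in test)]


if __name__ == "__main__":
    train = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}]
    test = [{"a", "b", "c"}]
    print(len(filter_exact(train, test)))    # exact duplicates removed
    print(len(filter_similar(train, test)))  # near-duplicates removed too
```

The stricter rule leaves fewer near-duplicates of test complexes in the training set, which is one way a released checkpoint can score differently from numbers in an earlier version of the paper.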

QizhiPei commented 1 year ago

Thanks for your quick reply. Have a good day!