deepmodeling / Uni-Mol

Official Repository for the Uni-Mol Series Methods
MIT License

Unimol Docking V2 strange result on posebuster set #261

Open simmed00 opened 2 months ago

simmed00 commented 2 months ago

I followed the posebuster.ipynb provided in your repo, and downloaded the weight file and the eval_set zip files. Since the eval_set does not contain the pdb files, I downloaded them from the PoseBusters repo and moved them into the eval_set folder. Everything ran smoothly, and the notebook ended up giving me an RMSD < 2 Å passing rate of roughly 0.8.

However, when I export the predicted poses and run the PoseBusters quality check on them, the RMSD passing rate drops significantly to below 0.6, and the passing rate that also requires PB-valid drops below 0.2. I checked some of the predictions: in some parts of the molecule, atoms clash with each other, which is quite surprising.

Is there any difference between the passing criteria in the PoseBusters repo and the passing criteria used in your ipynb? And are there any details I missed when running the ipynb provided in your GitHub repo?
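
To make the "atoms clash together" observation concrete, a rough check can list non-bonded atom pairs that sit implausibly close to each other in a predicted pose. The sketch below is a simplified illustration only, not the check that PoseBusters itself performs; the function name `close_contacts`, the 1.0 Å cutoff, and the toy coordinates are all assumptions made for this example.

```python
import math
from itertools import combinations


def close_contacts(coords, bonded, min_dist=1.0):
    """Return (i, j, distance) for non-bonded atom pairs closer than min_dist (in Å).

    coords: list of (x, y, z) tuples, one per atom.
    bonded: set of (i, j) index pairs that are covalently bonded.
    min_dist: hypothetical cutoff chosen for illustration.
    """
    hits = []
    for i, j in combinations(range(len(coords)), 2):
        # Bonded atoms are expected to be close, so skip them.
        if (i, j) in bonded or (j, i) in bonded:
            continue
        d = math.dist(coords[i], coords[j])
        if d < min_dist:
            hits.append((i, j, d))
    return hits


# Toy 4-atom chain in which the last atom folds back onto the first one.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (0.2, 0.1, 0.0)]
bonded = {(0, 1), (1, 2), (2, 3)}
contacts = close_contacts(coords, bonded)
for i, j, d in contacts:
    print(f"atoms {i} and {j} are {d:.2f} Å apart")
```

A pose can have a low RMSD to the reference and still contain contacts like this, which is one way an RMSD-only passing rate and a validity-aware passing rate can diverge.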

hypnopump commented 2 months ago

Hi @simmed00, thanks for your interest in UniMol Docking! Could you provide examples of the clashing atoms you mention? And could you describe in more detail the process you followed to calculate the RMSD? It would be great if your issue could be reproduced, so that we can better understand the discrepancy.

simmed00 commented 2 months ago

Basically, I followed the PoseBusters procedure shown below. Here true_file is the ground-truth pose, cond_file is the protein, and test_file is the prediction from UniMol. I attached one of the strange outputs: 5SAK_predict.zip

    from posebusters import PoseBusters

    # root, subject, unidock_root and subject_short are defined earlier in my script
    true_file = root + '/' + subject + '/' + subject + '_ligand.sdf'
    cond_file = root + '/' + subject + '/' + subject + '_protein.pdb'
    test_file = unidock_root + '/' + subject_short + '_predict.sdf'

    buster = PoseBusters(config="redock")
    try:
        df = buster.bust([test_file], true_file, cond_file, full_report=True)
        print(df)
        df.to_csv(root + '/' + subject + '/unidock' + '.csv')
    except Exception as e:
        print(subject, e)

simmed00 commented 2 months ago

Thank you for your prompt reply. I ran the posebuster_demo notebook, and the final results are:

    results length: 428
    RMSD < 0.5 : 0.08878504672897196
    RMSD < 1.0 : 0.4696261682242991
    RMSD < 1.5 : 0.6985981308411215
    RMSD < 2.0 : 0.7780373831775701
    RMSD < 3.0 : 0.866822429906542
    RMSD < 5.0 : 0.9299065420560748
    avg RMSD : 1.716739050074667

    results length: 428
    RMSD < 0.5 : 0.08878504672897196
    RMSD < 1.0 : 0.4766355140186916
    RMSD < 1.5 : 0.7032710280373832
    RMSD < 2.0 : 0.7850467289719626
    RMSD < 3.0 : 0.8738317757009346
    RMSD < 5.0 : 0.9369158878504673
    avg RMSD : 1.6454183516713314

So I assume I ran it correctly, since this is not far from the results in your paper.
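
For readers who want to reproduce a summary in the same shape as the output above from their own per-complex RMSD values, a minimal sketch follows. It is not the notebook's actual code; the function name `rmsd_summary` and the example values are made up for illustration, and the strict `<` comparison is assumed from the printed labels.

```python
def rmsd_summary(rmsds, thresholds=(0.5, 1.0, 1.5, 2.0, 3.0, 5.0)):
    """Return the count, mean RMSD, and fraction of values below each threshold."""
    n = len(rmsds)
    summary = {"n": n, "avg": sum(rmsds) / n}
    for t in thresholds:
        # Strict '<' is an assumption taken from the "RMSD < t" labels.
        summary[t] = sum(r < t for r in rmsds) / n
    return summary


# Made-up RMSD values (in Å), one per docked complex.
example = [0.4, 0.9, 1.2, 1.9, 2.5, 6.0]
summary = rmsd_summary(example)

print("results length:", summary["n"])
for t in (0.5, 1.0, 1.5, 2.0, 3.0, 5.0):
    print(f"RMSD < {t} : {summary[t]}")
print("avg RMSD :", summary["avg"])
```

Note that a summary of this kind looks only at RMSD, so it says nothing about whether a pose is chemically or physically valid.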

hypnopump commented 2 months ago

Hi again @simmed00, your issue was reproduced. The output you shared did indeed contain some steric clashes, whereas the expected output should not.

This has been fixed in the main branch (please update your code to use it!). Please feel free to close the issue.

simmed00 commented 2 months ago

Thanks for the fix. I added the -steric-clash-fix option back to the inference call, and the notebook results changed somewhat. They now look a bit lower than those in the paper, as shown below:

    results length: 428
    RMSD < 0.5 : 0.05841121495327103
    RMSD < 1.0 : 0.28738317757009346
    RMSD < 1.5 : 0.5560747663551402
    RMSD < 2.0 : 0.7126168224299065
    RMSD < 3.0 : 0.8387850467289719
    RMSD < 5.0 : 0.9065420560747663
    avg RMSD : 2.116286272244654

    results length: 428
    RMSD < 0.5 : 0.08878504672897196
    RMSD < 1.0 : 0.3621495327102804
    RMSD < 1.5 : 0.6261682242990654
    RMSD < 2.0 : 0.7476635514018691
    RMSD < 3.0 : 0.8457943925233645
    RMSD < 5.0 : 0.9112149532710281
    avg RMSD : 1.976056631357393

I then downloaded the predicted poses and checked them. The earlier 5SAK example is now fixed. I ran the same PoseBusters code on the new predictions. This time the warning "Can't kekulize mol" appears quite often (100+ times). The fraction passing the RMSD < 2 Å criterion is 194/428 = 45%, and the fraction passing both RMSD < 2 Å and PB-valid is only 160/428 = 37%. There is still a gap relative to the reported results. Are there any other details I missed when running the docking?
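
The two passing rates quoted above differ only in whether validity is required in addition to RMSD. A small sketch of that bookkeeping is given here, assuming each complex has been reduced to one record with a hypothetical `rmsd` value and a hypothetical `pb_valid` flag (these key names are illustrative and are not PoseBusters column names). The last lines simply redo the arithmetic from the counts reported in the comment.

```python
def pass_rates(records, cutoff=2.0):
    """Return (RMSD-only rate, RMSD-and-valid rate) over all records."""
    n = len(records)
    rmsd_ok = sum(r["rmsd"] < cutoff for r in records)
    both_ok = sum(r["rmsd"] < cutoff and r["pb_valid"] for r in records)
    return rmsd_ok / n, both_ok / n


# Made-up records for illustration only.
records = [
    {"rmsd": 1.2, "pb_valid": True},
    {"rmsd": 1.8, "pb_valid": False},  # close to the reference, but invalid
    {"rmsd": 3.5, "pb_valid": True},
    {"rmsd": 0.7, "pb_valid": True},
]
rmsd_rate, combined_rate = pass_rates(records)
print(rmsd_rate, combined_rate)

# Arithmetic check on the counts reported in the comment.
total = 428
reported_rmsd = 194 / total
reported_combined = 160 / total
print(f"{reported_rmsd:.1%} {reported_combined:.1%}")
```

Poses that fail to load or sanitize (for example, those triggering a kekulization warning) would have to be counted as failures in a tally like this, which may be worth checking when comparing against a notebook that only measures RMSD.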