PaddlePaddle / PaddleHelix

Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning (PaddleHelix bio-computing toolkit)
Apache License 2.0
801 stars 189 forks

Posebuster dataset demo in HelixDock #292

Open ivandon15 opened 1 month ago

ivandon15 commented 1 month ago

Hi PaddleHelix,

While reproducing the results, I found that the JSON files for the PoseBusters dataset are actually the same as the ones for PDBbind. Is that right? I was also wondering how to use the score function mentioned in the paper to select the good generated ligands. Please tell me if I missed something.

Thank you for your work and patience!

Noisyntrain commented 1 month ago

Hi ivandon, the PoseBusters dataset config has now been fixed.

As for the score function: for the RTMScore part, you may refer to the RTMScore repo and use the predicted ligand and the protein as input to get the corresponding score. For the PoseBusters score, you may install the PoseBusters tool from https://github.com/maabuu/posebusters/tree/main and run the command bust ligand_pred.sdf -l mol_true.sdf -p protein.pdb --outfmt csv to get the number of checks that the complex passes (note that the last check, RMSD < 2 Å, needs to be excluded from the count). You can then combine the two scores (with weights of 0.9 and 0.1, respectively) to get the final score.

Please also note that for multi-time sampling, you need to remove line 164 in evaluate.py: https://github.com/PaddlePaddle/PaddleHelix/blob/3e8baaa909ab59894ddb1058c38e7a59adb33b6d/apps/molecular_docking/helixdock/evalute.py#L164
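For anyone following along, here is a minimal Python sketch of how the combination described above could look. It is not code from the HelixDock repo. It assumes you already have an RTMScore value per pose and a pass/fail result per PoseBusters check (for example, parsed from the CSV output yourself). The function names, the check names, and the reading of "0.9 and 0.1" as a plain weighted sum (0.9 for RTMScore, 0.1 for the number of passed checks, without normalization) are all assumptions made for illustration.

```python
from typing import Mapping, Sequence, Tuple

# All names in this sketch are hypothetical; nothing here is part of
# HelixDock, RTMScore, or PoseBusters.


def count_passed_checks(checks: Mapping[str, bool],
                        excluded: Sequence[str] = ()) -> int:
    """Count the checks that passed, skipping any check named in `excluded`.

    `excluded` is where the RMSD check would go, since the reply says it
    must not be counted.
    """
    return sum(1 for name, passed in checks.items()
               if name not in excluded and passed)


def combined_score(rtmscore: float, n_passed: int,
                   w_rtm: float = 0.9, w_pb: float = 0.1) -> float:
    """Weighted sum of the two scores (assumed form, see the lead-in)."""
    return w_rtm * rtmscore + w_pb * n_passed


def select_best_pose(
    candidates: Sequence[Tuple[str, float, Mapping[str, bool]]],
    excluded: Sequence[str] = (),
) -> str:
    """Return the id of the candidate with the highest combined score.

    Each candidate is (pose_id, rtmscore, checks).
    """
    best = max(
        candidates,
        key=lambda c: combined_score(c[1], count_passed_checks(c[2], excluded)),
    )
    return best[0]


if __name__ == "__main__":
    # Placeholder check names, not the real PoseBusters column names.
    poses = [
        ("pose_0", 50.0, {"check_a": True, "check_b": False, "rmsd_check": True}),
        ("pose_1", 49.5, {"check_a": True, "check_b": True, "rmsd_check": False}),
    ]
    print(select_best_pose(poses, excluded=("rmsd_check",)))
```

Whether the two scores should be rescaled before weighting is not stated in the reply, so treat the weighted sum here as one possible reading and check it against the paper.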