Full test scripts to reproduce the metrics results in the paper.

ljpadam commented 2 years ago

Thank you for open-sourcing this great work!

I am new to this topic, and I notice that the paper uses many different metrics, such as accuracy, DCA, DCC, DVO, success rate of Top-N and Top-(N+2), and ratio.

Could you please provide the test scripts that calculate these metrics on the four datasets, so that the results in your paper can be reproduced?

It would be a great help when citing and comparing against your paper. Thanks in advance.

RishalAggarwal commented 2 years ago

Hey, unfortunately I do not have those scripts with me, as the server with those files went down :( Reproducing the metrics should be simple enough, though, once you've generated the predictions across the dataset. After that, it's a matter of manipulating the generated text files using Python. To load molecule files, one can use Open Babel.
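As a rough illustration of the molecule-loading step, here is a minimal sketch that assumes the Open Babel 3.x Python bindings (`pybel`) are installed. The function names `load_ligand_coords` and `centroid` are hypothetical helpers, not part of the DeepPocket code, and the default `sdf` format is an assumption about how the ligand files are stored. The centroid is shown only as one possible reference point; this thread does not say how the paper defines the pocket center.

```python
def load_ligand_coords(path, fmt="sdf"):
    """Return a list of (x, y, z) tuples for the first molecule in a file.

    Assumption: the Open Babel 3.x Python bindings are installed.
    The import is done lazily so the rest of this sketch runs without them.
    """
    from openbabel import pybel

    mol = next(pybel.readfile(fmt, path))
    return [tuple(atom.coords) for atom in mol.atoms]


def centroid(coords):
    """Return the geometric center of a non-empty list of 3D points."""
    n = len(coords)
    return tuple(sum(point[i] for point in coords) / n for i in range(3))
```

For example, `centroid(load_ligand_coords("ligand.sdf"))` would give one candidate reference point for a ligand, where `ligand.sdf` is a placeholder file name.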

ljpadam commented 2 years ago

Thanks for your reply. I will try to implement these scripts by myself.

I have some more questions about the details of the evaluation.

1. When you calculate the Top-N success rate, do you first calculate the success rate of each protein and then average them? Or do you pool the predictions of all proteins and divide by the total number of ground-truth pockets across all proteins?
2. Which model(s) are used to obtain the metrics on COACH420 and HOLO4k? One of the 10-fold models trained on scPDB, all of them, or a new model retrained on COACH420?
3. I found that there are training and testing types files for COACH420 and HOLO4k. Is only the testing types file used in evaluation? What is the purpose of the training types files?
4. In the "Data Sets and Preprocessing" section of your paper, you first mention that "there are 291 protein structures and 359 ligands, 3413 protein structures, and 4288 ligands for COACH420 and HOLO4k", and then mention "207 out of 291 proteins (71.13%) and 2752 out of 3413 proteins (80.63%) for the COACH420 and HOLO4k data sets". Do you mean that the numbers of proteins in COACH420 and HOLO4k are 291 and 3413 for classification, and 207 and 2752 for segmentation?

Sorry for asking so many questions. I am very interested in your work, and I sincerely thank you for your help.

RishalAggarwal commented 2 years ago

1. Top-N is calculated for each protein individually: take the top N predictions for each protein, where N is the number of annotated pockets for that protein, and calculate the metric. You also need to be careful about subpockets, as fpocket sometimes gives multiple pocket centers for the same pocket (essentially predicting the same pocket again). You can cross-check that using the proximity to the corresponding ligand.
2. We have separately trained models for COACH420 and HOLO4k.
3. The training types files contain datapoints from the scPDB dataset after removing proteins that are similar to datapoints in the corresponding test set.
4. Yes, that is correct.

RishalAggarwal commented 2 years ago

My bad, for the first point, success rate is calculated by putting all (Top-N unique) predictions of all proteins together, and dividing it by the number of ground truth pockets.
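A minimal sketch of that pooled calculation follows; it is not the original script. The function names, the input layout (ranked predicted centers plus one list of atom coordinates per annotated ligand), and the 4 Å DCA cutoff are all assumptions made for illustration. Counting each ground-truth ligand at most once is one way to keep repeated subpocket predictions from being counted twice.

```python
import math


def dca(center, ligand_coords):
    """Distance from a predicted center to the closest ligand atom."""
    return min(math.dist(center, atom) for atom in ligand_coords)


def top_n_success_rate(proteins, threshold=4.0, extra=0):
    """Pooled Top-(N + extra) success rate over all proteins.

    proteins: list of (ranked_centers, ligands) pairs, where ranked_centers
    is ordered best-first and ligands is a list of atom-coordinate lists,
    one per annotated pocket. threshold (in angstroms) is an assumed cutoff.
    """
    hits = 0
    total = 0
    for ranked_centers, ligands in proteins:
        n = len(ligands)  # N = number of annotated pockets for this protein
        top = ranked_centers[: n + extra]
        # Each ground-truth ligand counts once, however many centers hit it.
        hits += sum(
            1 for lig in ligands if any(dca(c, lig) <= threshold for c in top)
        )
        total += n
    return hits / total if total else 0.0
```

Calling `top_n_success_rate(proteins)` gives Top-N, and `top_n_success_rate(proteins, extra=2)` gives Top-(N+2) under the same assumptions.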

fses91 commented 10 months ago

> Top-N is calculated for each protein individually, take the top N predictions for each protein where 'N' is the number of annotated pockets for that protein and calculate the metric. You also need to be careful about subpockets as fpocket sometimes gives multiple pocket centers for the same pocket (essentially predicting the same pocket again). You can cross-check that with the proximity to the corresponding ligand.
>
> We have separately trained models for COACH420 and HOLO4k
>
> The training types files contain datapoints from the scPDB dataset after removing protein that are similar to datapoints in the corresponding test set.
>
> Yes that is correct

Hi, I read your paper and also saw this comment saying that you have separately trained models for COACH420 and HOLO4k. How can that be? How do you choose which model to use for COACH420 and which for HOLO4k? You also report DCA and DCC results for COACH420, HOLO4k, and SC6K. How can DCC be higher than DCA for COACH420 and SC6K? In my opinion, DCA should always be higher than DCC, because it is easier to be near any ligand atom than near the pocket center.

Best regards

RishalAggarwal commented 9 months ago

Models are trained separately for COACH420 and HOLO4K so that there is no data leakage when evaluating the model. The DCC is only reported for pockets that the classifier has predicted correctly according to the DCA criterion.
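To see why a conditional DCC figure can exceed the DCA figure, here is a small sketch under stated assumptions: the function names, the input tuples, and the 4 Å cutoff are hypothetical, and the reference pocket center is simply passed in rather than derived the way the paper derives it. The point is only that the two rates use different denominators.

```python
import math


def dca(center, ligand_coords):
    """Distance from a predicted center to the closest ligand atom."""
    return min(math.dist(center, atom) for atom in ligand_coords)


def dca_and_conditional_dcc(pairs, threshold=4.0):
    """Return (DCA rate, DCC rate among DCA-correct pockets).

    pairs: list of (predicted_center, ligand_coords, true_pocket_center).
    threshold (in angstroms) is an assumed cutoff for both criteria.
    """
    if not pairs:
        return 0.0, 0.0
    # Keep only pockets that pass the DCA criterion.
    dca_ok = [p for p in pairs if dca(p[0], p[1]) <= threshold]
    dca_rate = len(dca_ok) / len(pairs)
    if not dca_ok:
        return dca_rate, 0.0
    # DCC is evaluated on the DCA-correct subset only, so its denominator
    # is smaller than the one used for the DCA rate.
    dcc_hits = sum(1 for p in dca_ok if math.dist(p[0], p[2]) <= threshold)
    return dca_rate, dcc_hits / len(dca_ok)
```

With one pocket that passes both criteria and one that fails DCA, this returns a DCA rate of 0.5 and a conditional DCC rate of 1.0.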

devalab / DeepPocket

Full test scripts to reproduce the metrics results in the paper. #9