bio-ontology-research-group / deepgo2

BSD 3-Clause "New" or "Revised" License

How to understand the output prediction results? #3

Closed Leo-T-Zang closed 4 months ago

Leo-T-Zang commented 4 months ago

Hi DeepGO2 Team,

Thanks for open-sourcing this useful tool!

I have a question about how to interpret the output predictions in the TSV files, for example in example_preds_bp:

Protein | GO term | Score
-- | -- | --
DEFEU_ASPAM | GO:0009987 | 0.534
DEFEU_ASPAM | GO:0065007 | 0.201
DEFEU_ASPAM | GO:0050789 | 0.113
DEFEU_ASPAM | GO:0050896 | 0.585
DEFEU_ASPAM | GO:0006950 | 0.569
DEFEU_ASPAM | GO:0051179 | 0.266
DEFEU_ASPAM | GO:0051234 | 0.325
DEFEU_ASPAM | GO:0006810 | 0.284
DEFEU_ASPAM | GO:0009605 | 0.266
DEFEU_ASPAM | GO:0044419 | 0.668
DEFEU_ASPAM | GO:0043207 | 0.246
DEFEU_ASPAM | GO:0051707 | 0.246
DEFEU_ASPAM | GO:0009607 | 0.256
DEFEU_ASPAM | GO:0042742 | 0.133
DEFEU_ASPAM | GO:0006952 | 0.47
DEFEU_ASPAM | GO:0009617 | 0.152
DEFEU_ASPAM | GO:0098542 | 0.232
DEFEU_ASPAM | GO:0044092 | 0.351
DEFEU_ASPAM | GO:0065009 | 0.286
DEFEU_ASPAM | GO:0006812 | 0.509
DEFEU_ASPAM | GO:0098662 | 0.562
DEFEU_ASPAM | GO:0098655 | 0.543
DEFEU_ASPAM | GO:0006813 | 0.311
DEFEU_ASPAM | GO:0098660 | 0.566
DEFEU_ASPAM | GO:0030001 | 0.476
DEFEU_ASPAM | GO:0034220 | 0.512
DEFEU_ASPAM | GO:0055085 | 0.462
DEFEU_ASPAM | GO:0071805 | 0.353
DEFEU_ASPAM | GO:0006811 | 0.476
DEFEU_ASPAM | GO:0006814 | 0.17
DEFEU_ASPAM | GO:0035725 | 0.176
DEFEU_ASPAM | GO:0009620 | 0.124
DEFEU_ASPAM | GO:0035821 | 0.389
DEFEU_ASPAM | GO:0044359 | 0.333
DEFEU_ASPAM | GO:0035737 | 0.16
DEFEU_ASPAM | GO:0035738 | 0.163
DEFEU_ASPAM | GO:0031640 | 0.124
DEFEU_ASPAM | GO:0044488 | 0.149
DEFEU_ASPAM | GO:0044561 | 0.347
DEFEU_ASPAM | GO:0044363 | 0.113
DEFEU_ASPAM | GO:0044361 | 0.106
DEFEU_ASPAM | GO:0044362 | 0.329
DEFEU_ASPAM | GO:0044360 | 0.108
DEFEU_ASPAM | GO:0044489 | 0.149
DEFEU_ASPAM | GO:0044560 | 0.191
DEFEU_ASPAM | GO:0044493 | 0.178
DEFEU_ASPAM | GO:0044492 | 0.186

How should we interpret the values in the last column, and why does each sequence have a different set of output GO terms?

Thanks a lot!

coolmaksat commented 4 months ago

Hi Leo, the last column is the prediction score of the model. The best Fmax score is usually achieved with a threshold between 0.2 and 0.3. However, if you would like to see more specific annotations, you may want to lower it. The predicted GO terms differ depending on the protein sequence.
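For anyone else reading this thread, applying such a threshold is a one-liner. This is only a minimal sketch, assuming triples in the (protein, GO term, score) layout shown in the table above; the function name `filter_predictions` is made up, not part of DeepGO2:

```python
# Hypothetical helper: keep only predictions whose score meets a threshold.
# The (protein, go_term, score) layout follows the example TSV above.
def filter_predictions(rows, threshold=0.3):
    """Return the (protein, go_term, score) triples with score >= threshold."""
    return [(p, go, s) for p, go, s in rows if s >= threshold]

# A few rows copied from the example output above.
rows = [
    ("DEFEU_ASPAM", "GO:0009987", 0.534),
    ("DEFEU_ASPAM", "GO:0065007", 0.201),
    ("DEFEU_ASPAM", "GO:0042742", 0.133),
]

kept = filter_predictions(rows, threshold=0.3)
print(kept)  # at 0.3, only the GO:0009987 row survives
```

Lowering `threshold` keeps more (often more specific, lower-confidence) terms, which mirrors the trade-off described above.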

Leo-T-Zang commented 4 months ago

Hi @coolmaksat,

Thanks a lot for your reply. If I understand correctly, does a higher score indicate a more general GO term (a general function annotation), and a lower score a very specific function?

coolmaksat commented 4 months ago

Not necessarily; some specific classes also have high scores. The specificity of the classes mostly depends on the number of annotations. These scores represent the prediction model's confidence.

Leo-T-Zang commented 4 months ago

Oh, I see. So you are saying that a higher score means a more confident prediction, but setting a threshold of 0.2-0.3 is good enough for the best Fmax. Sorry if I misunderstood anything.

I guess my question is more from a user perspective: if I now have some GO terms predicted by your model, should I select the top 5 or top 10 for the final annotation, or use a score threshold of 0.3 to filter the predictions?

Thanks a lot !!

coolmaksat commented 4 months ago

It is better to set a threshold, for example 0.3, but I would make this decision based on some indirect evaluation.
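One way to do such an indirect evaluation, if you have even a small set of proteins with known annotations, is to sweep the threshold and pick the one that maximizes F1. This is only an illustrative single-protein sketch (real Fmax averages precision and recall over many proteins); the data below is made up, and `f1_at_threshold` is not a DeepGO2 function:

```python
# Hedged sketch: pick a threshold by sweeping it against known annotations.
def f1_at_threshold(preds, truth, t):
    """F1 of the GO terms predicted at threshold t against a truth set."""
    predicted = {go for go, s in preds if s >= t}
    if not predicted:
        return 0.0
    tp = len(predicted & truth)
    precision = tp / len(predicted)
    recall = tp / len(truth)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up predictions and ground truth for one protein.
preds = [("GO:0009987", 0.534), ("GO:0006950", 0.569),
         ("GO:0042742", 0.133), ("GO:0065007", 0.201)]
truth = {"GO:0009987", "GO:0006950"}

# Sweep thresholds 0.01..0.99 and keep the best (F1, threshold) pair.
best = max((f1_at_threshold(preds, truth, t / 100), t / 100)
           for t in range(1, 100))
print(best)
```

Without ground truth you are limited to heuristics like the 0.2-0.3 default suggested above; a fixed top-k cut ignores that confidence varies a lot between proteins.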

Leo-T-Zang commented 4 months ago

Thanks!