IdoSpringer / ERGO-II

ERGO-II, an updated version of ERGO including more features for TCR-peptide binding prediction
MIT License

ERGO-II output Score #4

Open Elisa89m opened 3 years ago

Elisa89m commented 3 years ago

Hi,

Could you explain the meaning of the score reported in the ERGO-II result table? In particular, should I read it like a p-value, where the lowest score is the best prediction, or as a binding probability, where the higher the score, the better the prediction?

Thank you,

Elisa

IdoSpringer commented 3 years ago

Ideally, the ERGO-II outputs should estimate the probability that a given TCR binds a given peptide (similar to logistic regression), so the higher the score, the stronger the predicted binding. In practice, however, deep-learning models such as ERGO-II involve sigmoid functions and are trained with cross-entropy loss, which pushes the outputs extremely close to 0 or 1. Moreover, the AUC score (which is the ERGO-II objective) does not depend on the output values themselves, only on their relative order in the evaluation set. I would therefore not interpret the outputs as plain probabilities, although the relative score is meaningful. Having said that, previous experiments in our group suggest that an ERGO score above 0.95-0.98 is a reliable indicator of positive binding in most cases (this has not been properly tested in vitro, however).

Sincerely,
Ido Springer
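To illustrate the point about AUC depending only on rank order, here is a minimal sketch in plain Python. The scores and labels are made-up toy values, not ERGO-II outputs, and the `auc` helper is a hypothetical name written for this example rather than anything from the ERGO-II code.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs where the
    positive scores higher; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (assumed for illustration only).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.99, 0.97, 0.96, 0.60, 0.10, 0.02]

# A strictly increasing transform changes every value but not the ordering.
squashed = [s ** 8 for s in scores]

auc_raw = auc(scores, labels)
auc_squashed = auc(squashed, labels)
print(auc_raw, auc_squashed)  # the two values are identical
```

Because the transform preserves the ordering, the AUC is unchanged even though the individual values are very different, which is why a good AUC alone does not make the raw outputs calibrated probabilities.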

Elisa89m commented 3 years ago

Thank you for the reply. I have another question. I am testing ERGO-II on six TCR-MHC-peptide complexes taken from the PDB. For each one I built a "false positive" dataset by pairing the same TCR-MHC complex with many other peptides of the same length taken from IEDB. Four of the real PDB complexes (true positives) are ranked at the top by ERGO-II, as expected. The other two, which share the same HLA (HLA-B*08:01), fall in the middle of the score distribution. Could the HLA be introducing a bias?
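The comparison described above can be summarized per complex as the rank of the true peptide among its decoys. This is a minimal sketch with invented scores; `rank_of_true` is a hypothetical helper, and the numbers are placeholders rather than actual ERGO-II results.

```python
def rank_of_true(true_score, decoy_scores):
    """Return (rank, fraction_below) for the true peptide.

    rank is 1 when no decoy scores higher than the true peptide;
    fraction_below is the share of decoys scoring strictly lower.
    """
    higher = sum(d > true_score for d in decoy_scores)
    below = sum(d < true_score for d in decoy_scores)
    return higher + 1, below / len(decoy_scores)

# Placeholder scores for one TCR-MHC complex (assumed values).
decoys = [0.91, 0.15, 0.40, 0.72, 0.05, 0.66, 0.33, 0.58]
rank, fraction_below = rank_of_true(0.60, decoys)
print(rank, fraction_below)
```

A true peptide that lands near the middle, as in this toy case, would show a rank well below the top and a fraction close to 0.5.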

I counted the number of pairs per HLA in the training dataset and found that the counts are very unbalanced: 8 of the 56 HLAs have at least 5000 pairs each. The HLA of the two PDB complexes that rank in the middle of the false positive dataset is among those top 8.
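A count like this can be reproduced with `collections.Counter`. The sketch below assumes the training pairs are available as dictionaries with an `mhc` field; that field name, the placeholder rows, and the toy threshold are all assumptions for illustration and do not reflect the actual ERGO-II data format.

```python
from collections import Counter

# Placeholder training pairs (not real data); "mhc" is an assumed field name.
pairs = [
    {"tcr": "TCR1", "peptide": "PEP1", "mhc": "HLA-A*02:01"},
    {"tcr": "TCR2", "peptide": "PEP1", "mhc": "HLA-A*02:01"},
    {"tcr": "TCR3", "peptide": "PEP2", "mhc": "HLA-A*02:01"},
    {"tcr": "TCR4", "peptide": "PEP3", "mhc": "HLA-B*08:01"},
    {"tcr": "TCR5", "peptide": "PEP3", "mhc": "HLA-B*08:01"},
    {"tcr": "TCR6", "peptide": "PEP4", "mhc": "HLA-B*27:05"},
]

# Number of TCR-peptide pairs per HLA allele.
counts = Counter(row["mhc"] for row in pairs)

# Alleles at or above a threshold (5000 in the issue; 2 for this toy data),
# listed from most to least frequent.
threshold = 2
frequent = [hla for hla, n in counts.most_common() if n >= threshold]
print(counts)
print(frequent)
```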

Thank you.

Best regards,

Elisa