Elisa89m opened 3 years ago
Ideally, the ERGO-II model outputs should estimate the probability that a given TCR binds to a given peptide (similar to logistic regression): the higher the score, the better the prediction. In practice, however, deep-learning-based models such as ERGO-II use sigmoid functions and are trained with a cross-entropy loss, which pushes the model outputs to be extremely close to 0 or 1. Moreover, the AUC (which is the ERGO-II objective) does not depend on the output values themselves, only on their relative order within the evaluation set. I would therefore not interpret the outputs as plain probability scores, although the relative score is meaningful. That said, previous experiments in our group show that an ERGO score above 0.95 to 0.98 reliably indicates positive binding in most cases (however, this has not been properly tested in vitro).
Sincerely,
Ido Springer
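To illustrate why only the relative order of the scores matters for AUC, here is a small Python sketch (standard library only). It computes ROC AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, then recomputes it after a strictly increasing transform (the logit). The scores are made-up placeholders, not real ERGO-II outputs.

```python
import math

def auc(pos_scores, neg_scores):
    """ROC AUC as P(random positive outranks random negative); ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def logit(s, eps=1e-12):
    """Strictly increasing map from (0, 1) to the real line; clamps to avoid log(0)."""
    s = min(max(s, eps), 1 - eps)
    return math.log(s / (1 - s))

# Hypothetical scores for binding (positive) and non-binding (negative) pairs
pos = [0.999, 0.97, 0.90]
neg = [0.95, 0.40, 0.02, 0.001]

print(auc(pos, neg))                                        # 11/12, about 0.917
print(auc([logit(s) for s in pos], [logit(s) for s in neg]))  # same value
```

Any strictly increasing rescaling of the scores leaves the AUC unchanged, which is why a high AUC on its own does not make the raw outputs calibrated probabilities.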
Thank you for the reply. I have another question. I'm trying to run a test with ERGO II. In particular, I took six TCR-MHC-peptide complexes from the PDB database and built a false-positive dataset by pairing the same TCR-MHC complexes with many other peptides of the same length extracted from IEDB. I noticed that four of the real PDB complexes (true positives) are ranked highest by ERGO II, as expected, whereas the two PDB complexes that share the same HLA (HLA-B*08:01) fall in the middle of the distribution. I'm wondering whether the HLA introduces a bias.
I counted the number of pairs per HLA in the training dataset and noticed that the counts are highly unbalanced: 8 of the 56 HLAs have at least 5000 pairs each, and the HLA of the PDB complexes that are ranked in the middle of the false-positive distribution is among these top 8.
Thank you.
Best regards,
Elisa
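The two checks described in this message can be sketched in Python (standard library only). The sketch assumes that the ERGO-II scores for each true PDB pair and its decoy pairs have already been collected (for example, from the results table) and that the training pairs have been loaded as `(tcr, peptide, hla)` tuples; the training-file format is not given in this thread, so loading is left out. All complex names, sequences, and scores are hypothetical placeholders, and ranking each true pair within its own decoys is just one reasonable way to summarize the test, not an official ERGO-II evaluation.

```python
from collections import Counter

def rank_among_decoys(true_score, decoy_scores):
    """Rank of the true pair among its decoys (1 = best) and the
    fraction of decoys it scores strictly higher than."""
    # Decoys that tie the true score count neither above nor below it
    higher = sum(1 for d in decoy_scores if d > true_score)
    beaten = sum(1 for d in decoy_scores if d < true_score)
    return higher + 1, beaten / len(decoy_scores)

def hla_frequency_rank(pairs, hla):
    """pairs: iterable of (tcr, peptide, hla) tuples from the training set.
    Returns the per-HLA Counter and the 1-based frequency rank of `hla`."""
    counts = Counter(h for _tcr, _pep, h in pairs)
    ranked = [h for h, _n in counts.most_common()]
    return counts, ranked.index(hla) + 1

# Hypothetical scores: (true-pair score, [decoy-pair scores]) per PDB complex
results = {
    "complex_A": (0.998, [0.2, 0.5, 0.9, 0.1]),
    "complex_B": (0.60, [0.2, 0.7, 0.9, 0.1]),
}
for name, (true_score, decoys) in results.items():
    rank, frac = rank_among_decoys(true_score, decoys)
    print(f"{name}: rank {rank}/{len(decoys) + 1}, beats {frac:.0%} of decoys")

# Hypothetical toy training set; in practice, load the real training pairs
train = (
    [("CASS1", "AAA", "HLA-A*02:01")] * 3
    + [("CASS2", "BBB", "HLA-B*08:01")] * 2
    + [("CASS3", "CCC", "HLA-B*07:02")]
)
counts, freq_rank = hla_frequency_rank(train, "HLA-B*08:01")
print(counts["HLA-B*08:01"], freq_rank)
```

Because the comparison is rank-based, it only relies on the relative order of the ERGO-II scores for each complex, not on their absolute values.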
Hi,
Could you give me some information about the meaning of the score in the ERGO-II results table? In particular, I'm wondering whether I should treat it like a p-value, so that the lowest score is the best prediction, or as a binding probability, where the higher the score, the better the prediction.
Thank you,
Elisa