Outputting probabilities from the model

barthelemymp / TULIP-TCR

GNU General Public License v3.0

Outputting probabilities from the model #9

Closed (alienmist325 closed this issue 2 months ago)

alienmist325 commented 4 months ago

By default, the model returns ranks (for fixed peptides) and scores (which are presumably comparable between peptides). I would like to recover the probability that a peptide binds a particular TCR. From inspecting the code (and from the paper), it looks as though the model produces probabilities, their logarithms are taken, and these log values are summed to produce the score. Can the exponential of this score be taken and interpreted as the binding probability, or is a different interpretation or approach needed?
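To make the quantity in question concrete, here is a minimal sketch of a score built as a sum of log-probabilities. The helper `sequence_log_score` and the per-token probabilities are hypothetical and chosen for illustration; they are not TULIP's API or output. The only point is arithmetic: the exponential of such a score equals the product of the per-token probabilities, which is a probability of the sequence under the model.

```python
import math

def sequence_log_score(token_probs):
    """Sum of per-token log-probabilities (hypothetical helper, not TULIP's API)."""
    return sum(math.log(p) for p in token_probs)

# Made-up per-token probabilities for one sequence (illustrative only).
token_probs = [0.5, 0.25, 0.5]

score = sequence_log_score(token_probs)  # log of the product of the probabilities
seq_prob = math.exp(score)               # the product itself: a sequence probability

print(score, seq_prob)
```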

barthelemymp commented 4 months ago

Thank you for your interest. Unfortunately, TULIP cannot do that directly; this is the drawback of being unsupervised. It defines a sequence probability, which is NOT a binding probability. We show in the paper how the two can be related, but the relation only holds for a fixed peptide: the two probabilities are proportional, and the proportionality constant depends on the peptide. Usually we compare different TCRs for one specific peptide. If you want to compare scores between different peptides, you could try looking at the rank within a healthy repertoire.
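The rank-based comparison suggested here could look roughly like the sketch below. Everything in it is an assumption for illustration: `percentile_rank` is a hypothetical helper, the peptide names and scores are made up, and the background lists stand in for scores of healthy-repertoire TCRs computed against each peptide. It is not part of TULIP.

```python
from bisect import bisect_right

def percentile_rank(score, background_scores):
    """Fraction of background scores that are <= score (hypothetical helper)."""
    if not background_scores:
        raise ValueError("background_scores must not be empty")
    ordered = sorted(background_scores)
    return bisect_right(ordered, score) / len(ordered)

# Made-up scores of healthy-repertoire TCRs against two peptides.
# The raw scales differ because the proportionality constant is peptide-dependent.
background = {
    "peptide_A": [-12.0, -10.5, -9.0, -8.0],
    "peptide_B": [-30.0, -28.0, -27.5, -25.0],
}

# The same TCR scored against each peptide (also made up).
rank_a = percentile_rank(-8.5, background["peptide_A"])
rank_b = percentile_rank(-26.0, background["peptide_B"])

print(rank_a, rank_b)
```

In this toy data the raw scores (-8.5 and -26.0) are far apart, yet both sit at the same position relative to their own background, which is the sense in which a rank is comparable across peptides where a raw score is not.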

alienmist325 commented 2 months ago

Thank you for clarifying this; indeed, there is a relationship that has been detailed in the paper for healthy repertoires, but certainly, the constant is peptide-dependent. Thank you for the alternative suggestions and apologies for not closing this issue earlier.