Strange Pattern Running tcr-esm Against Certain Epitopes

JingqiZhang1102 commented 2 months ago

Thank you very much for your amazing project! We have a few questions from our attempts to run this model.

[1] We observed some interesting patterns when running tcr-esm. More specifically, we generated TCR sequences with OLGA and with our own simpler random generator, which produces fake TCR amino acid sequences following the VDJdb TCR length distribution, and sampled 1000 TCRs from each group. We tested these TCR sequences against three epitopes, GILGFVFTL, KLNVGDYFV, and NNILIATCV, which bind 4620, 169, and 1 TCR(s), respectively, in the IEDB dataset; we denote them the 'high', 'medium', and 'low' epitopes. We then plotted the individual probabilities predicted by tcr-esm. For GILGFVFTL with the VDJdb-trained model, there is a peak around 0.55 and no value larger than this peak. We observed the same peak for KLNVGDYFV, but with several values above it. The McPAS-trained model showed the same pattern, with the peak shifted to roughly 0.65. Because GILGFVFTL binds many more TCRs in IEDB than KLNVGDYFV does, we expected the tcr-esm predictions to reflect that difference. Instead, we observed this pattern, and it also appeared when we tested another epitope, KLGGALQAK. It seems that ~0.55 and ~0.65 are common predicted binding probabilities for the VDJdb- and McPAS-trained models, respectively, on the epitopes we have tried (GILGFVFTL, KLNVGDYFV, KLGGALQAK).

Have you had similar experiences? If so, do you have any guess as to why this might be the case? Can you see anything we could be doing wrong?
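The length-matched random generator mentioned above could look roughly like the sketch below. The issue does not describe the original generator's internals, so everything here is an assumption: lengths are drawn from the empirical length distribution of a reference CDR3 list (for example, VDJdb CDR3s), and residues are drawn uniformly from the 20 standard amino acids. `length_distribution` and `random_cdr3s` are hypothetical names.

```python
import random
from collections import Counter

# The 20 standard amino acids; uniform sampling over them is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def length_distribution(reference_cdr3s):
    """Empirical length -> probability map from reference sequences (e.g. VDJdb CDR3s)."""
    counts = Counter(len(s) for s in reference_cdr3s)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def random_cdr3s(reference_cdr3s, n, seed=0):
    """Draw n random sequences whose lengths follow the reference length distribution.

    Residues are sampled uniformly and independently, so positional
    preferences of real CDR3s are not reproduced (hypothetical generator).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    dist = length_distribution(reference_cdr3s)
    lengths = rng.choices(list(dist), weights=list(dist.values()), k=n)
    return ["".join(rng.choices(AMINO_ACIDS, k=length)) for length in lengths]
```

A call such as `random_cdr3s(vdjdb_cdr3s, 1000)` would give a 1000-sequence sample comparable in size to the groups described above, where `vdjdb_cdr3s` is whatever list of reference CDR3 strings you load.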

[2] We tested on the AB paired data from VDJdb with Score = 3, which should be high-confidence binding records. We took the TCRA, TCRB, and epitope columns, converted them to .fasta files and then to .npy embeddings, and uploaded them to the web interface. We then visualized the individual predicted probabilities with histograms. We tried two input combinations: (1) TCRA, TCRB, and epitope; (2) TCRB and epitope. The probabilities predicted by the VDJdb-trained tcr-esm model are not as high as we expected, given that these pairs are recorded in VDJdb with high confidence. Do you know why we get such scores for high-confidence VDJdb entries? It seems strange that more predictions are close to 0 than to 1.
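A minimal sketch of the filtering step described above, assuming a tab-separated VDJdb export whose header contains columns named `TCRA`, `TCRB`, `Epitope`, and `Score`. These names follow the wording in this issue and may not match the real export header, so adjust them as needed. `load_confident_pairs` is a hypothetical helper: it keeps only rows with an integer Score at or above a threshold and with both chains and the epitope present.

```python
import csv


def load_confident_pairs(tsv_path, min_score=3):
    """Return (TCRA, TCRB, Epitope) triples whose Score is >= min_score.

    Column names are assumptions; rename them to match your export header.
    """
    triples = []
    with open(tsv_path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            score = (row.get("Score") or "").strip()
            if not score.isdigit() or int(score) < min_score:
                continue  # skip unscored or low-confidence rows
            alpha = (row.get("TCRA") or "").strip()
            beta = (row.get("TCRB") or "").strip()
            epitope = (row.get("Epitope") or "").strip()
            if alpha and beta and epitope:  # keep only complete alpha-beta pairs
                triples.append((alpha, beta, epitope))
    return triples
```

The three columns of the returned triples can then be written to separate .fasta files for embedding.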

Our workflow to create the tcr-esm input files is described below.

(1) Save the amino acid sequences as .fasta. For instance,

    >Sequence_1
    CAAAAAF
    >Sequence_2
    CAAAAAAAAAAF
    …

(2) Compute the TCR embeddings with extract.py from the esm project, using the command

    python3 extract.py esm1v_t33_650M_UR90S_1 fasta_path.fasta pt_path --repr_layers 33 --include mean

(3) Read all .pt files, concatenate them into one large matrix, and save it as .npy.

(4) Save the epitope amino acid sequences as .fasta. For instance,

    >Sequence_1
    GILGFVFTL
    ...

(5) Compute the epitope embeddings with extract.py and concatenate them into a .npy file.

(6) Upload tcr.npy and pep.npy to the web interface, run tcr-esm, and download the results as .csv.
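Steps (1) to (3) can be sketched in Python as follows. This is a hedged sketch that assumes extract.py was run with `--include mean`, so that each output file is named `<label>.pt` and holds a dict whose `mean_representations` entry maps the layer number (33 here) to a tensor. `write_fasta` and `stack_embeddings` are hypothetical helpers. One detail worth checking is row order: building the matrix by iterating over the FASTA labels, rather than over whatever order the directory listing returns, keeps row i of the .npy file aligned with the i-th input sequence.

```python
import os

import numpy as np


def write_fasta(seqs, path, prefix="Sequence_"):
    """Write sequences as >Sequence_1, >Sequence_2, ... and return the labels in order."""
    labels = []
    with open(path, "w") as fh:
        for i, seq in enumerate(seqs, start=1):
            label = f"{prefix}{i}"
            fh.write(f">{label}\n{seq}\n")
            labels.append(label)
    return labels


def stack_embeddings(labels, pt_dir, layer=33, load_fn=None):
    """Concatenate per-sequence mean embeddings into one (N, D) matrix.

    Rows follow the order of `labels` (the FASTA order), not the order
    in which the filesystem happens to list the .pt files.
    """
    if load_fn is None:
        import torch  # only needed when reading real extract.py output
        load_fn = torch.load
    rows = []
    for label in labels:
        record = load_fn(os.path.join(pt_dir, f"{label}.pt"))
        rows.append(np.asarray(record["mean_representations"][layer], dtype=np.float32))
    return np.stack(rows)
```

For example, call `labels = write_fasta(tcrs, "tcr.fasta")`, run the extract.py command from step (2) on `tcr.fasta` with output directory `pt_path`, then `np.save("tcr.npy", stack_embeddings(labels, "pt_path"))`. The same two calls would produce pep.npy from the epitopes.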

Thank you very much in advance for any suggestions!

xinformatics commented 4 weeks ago

Hello @JingqiZhang1102, apologies for the delayed response.

We have not observed this trend before; it is new to us. We did not generate random TCRs, since the objective of our work was different. The models on the server might not be perfectly calibrated, meaning that the predicted probabilities do not accurately reflect the true likelihood of binding. This could be one reason for the recurring 0.55 / 0.65 probabilities. As an alternative, you could try retraining the models and calibrating their output probabilities.
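The suggestion to calibrate the probabilities does not name a method. As one illustration (a swapped-in technique, not necessarily what the tcr-esm authors would use), the sketch below bins predictions into a reliability table, computes the expected calibration error (ECE), and fits a simple histogram-binning calibrator. It assumes a held-out set with known 0/1 labels, including negatives; all function names are hypothetical. Platt scaling and isotonic regression are common alternatives.

```python
def reliability_bins(probs, labels, n_bins=10):
    """Split [0, 1] into n_bins equal-width bins.

    Returns one (count, mean_predicted_prob, observed_positive_rate) tuple
    per bin; empty bins are reported as (0, None, None).
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    members = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        members[idx].append((p, y))
    table = []
    for group in members:
        if not group:
            table.append((0, None, None))
            continue
        n = len(group)
        table.append((n, sum(p for p, _ in group) / n, sum(y for _, y in group) / n))
    return table


def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted mean gap between predicted probability and observed positive rate."""
    total = len(probs)
    ece = 0.0
    for n, mean_p, pos_rate in reliability_bins(probs, labels, n_bins):
        if n:
            ece += (n / total) * abs(mean_p - pos_rate)
    return ece


def fit_histogram_binning(probs, labels, n_bins=10):
    """Return a function mapping a raw probability to its bin's observed positive rate."""
    table = reliability_bins(probs, labels, n_bins)

    def calibrate(p):
        n, _, pos_rate = table[min(int(p * n_bins), n_bins - 1)]
        return pos_rate if n else p  # fall back to the raw score for empty bins

    return calibrate
```

If many predictions pile up in the 0.5 to 0.6 bin while only a small fraction of those pairs are true binders, that bin contributes a large gap to the ECE, and the calibrator would map those scores down to the observed positive rate.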

For the second question, could you please clarify which dataset you are referring to? We used the dataset provided with the ERGO-II model, which does not have a "Score" column.

Please let me know if you have any other questions.

Thanks

JingqiZhang1102 commented 3 weeks ago

Hello @xinformatics, thank you for your suggestions!

For the second question, we were testing with VDJdb (https://vdjdb.cdr3.net/search), using the Human AB paired entries. If I understand correctly, the last column, 'Score', reflects how confident the corresponding entry is. ERGO-II provided VDJdb-trained and McPAS-trained models, so perhaps the dataset your team used contains VDJdb entries?

Since we tested Score = 3 TCR-epitope entries, we expected predicted probabilities closer to 1.0 than to 0.0. Instead, the histogram showed more predictions near 0, so we opened this issue to ask whether you have made similar observations.

xinformatics commented 3 weeks ago

The data we downloaded from ERGO-II's GitHub repo contained only the final label (0/1), not the score. It is possible that the data we used for training is older than the current version of VDJdb. I expect the model's predictions to be heavily influenced by the training data.
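One way to probe whether the training data predates the VDJdb entries being tested is to measure how many test pairs, and which test epitopes, appear in the training set at all. The sketch assumes both sets are available as (CDR3β, epitope) tuples, with only positive (label 1) pairs passed as `train_pairs`; `overlap_report` is a hypothetical helper, not part of tcr-esm or ERGO-II.

```python
def overlap_report(train_pairs, test_pairs):
    """Compare (CDR3b, epitope) pairs between a training set and a test set.

    Duplicates are collapsed before counting.
    """
    train = set(train_pairs)
    test = set(test_pairs)
    train_epitopes = {epitope for _, epitope in train}
    return {
        "test_pairs": len(test),
        "pairs_seen_in_training": len(test & train),
        # epitopes the model never saw as positives during training
        "test_epitopes_unseen": sorted({epitope for _, epitope in test} - train_epitopes),
    }
```

A low `pairs_seen_in_training` count, or many unseen epitopes, would be consistent with the test entries having been added to VDJdb after the training snapshot, though it would not by itself explain the score distribution.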

dhanjal-lab / tcr-esm, issue #1