dhanjal-lab / tcr-esm


How to embed MHC #2

Open JingqiZhang1102 opened 1 month ago

JingqiZhang1102 commented 1 month ago

Hello, I have a question regarding the MHC embeddings. I tried the following command:

python3 esm/scripts/extract.py esm1v_t33_650M_UR90S_1 /path/to/fasta_file.fasta /path/to/pt_files --repr_layers 33 --include mean

and got KeyError: '*01'.

I assume that the model esm1v_t33_650M_UR90S_1 cannot handle characters such as *. However, according to 4_VDJDB_trainESMmodel.ipynb, mhclist contains MHC A information, and you were able to compute the MHC embeddings. Could you elaborate on which model(s) you used? Thank you in advance!

JingqiZhang1102 commented 1 month ago

We found a website that provides protein sequences for MHC alleles. Is this how you obtained the MHC sequences to embed? https://www.ebi.ac.uk/ipd/imgt/hla/alleles/

xinformatics commented 1 month ago

Hi @JingqiZhang1102, you are correct. We used the MHC sequences from EBI and prepared a cleaned-up version of the fasta files with correct HLA nomenclature. For embeddings the ESM1v model was used.
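For anyone hitting the same KeyError: '*01', it most likely means that allele names (rather than amino-acid sequences) ended up on the sequence lines of the fasta file, although that is an assumption based on the error message alone. Below is a minimal sketch of one way to build a clean fasta file from an allele-to-sequence mapping. It is not necessarily the procedure used for this repository. The helper names `sanitize_header` and `to_fasta` are hypothetical, and the sequence in the usage note is a placeholder, not a real HLA sequence. Headers are sanitized because extract.py uses the fasta label as the output file name, and characters such as * and : can be problematic in file names.

```python
import re

# Strict check against the 20 standard amino acids (an assumption for this
# sketch; the ESM alphabet also accepts a few additional residue codes).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def sanitize_header(allele):
    """Replace characters such as '*' and ':' so the label is file-name safe."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", allele)


def to_fasta(allele_to_seq):
    """Return fasta text for a dict mapping allele name -> protein sequence.

    Raises ValueError if a sequence contains anything other than the
    20 standard amino-acid letters (for example an allele name like '*01').
    """
    records = []
    for allele, seq in allele_to_seq.items():
        seq = seq.strip().upper()
        bad = set(seq) - VALID_RESIDUES
        if bad:
            raise ValueError(
                f"{allele}: non-amino-acid characters {sorted(bad)}"
            )
        records.append(f">{sanitize_header(allele)}\n{seq}")
    return "\n".join(records) + "\n"
```

As a usage example, `to_fasta({"HLA-A*01:01": "ACDEFGHIK"})` (placeholder sequence) yields a record with the header `>HLA-A_01_01`, and the resulting text can be written to the fasta file passed to extract.py.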

Hope it helps.