epiTCR is a highly sensitive predictor for TCR-peptide binding. It takes TCR CDR3b sequences and peptide sequences as input; users can optionally also provide the full-length MHC. The output is the predicted binding probability.
This repository contains the code and the data used to train the epiTCR model.
python >= 3.0.0
numpy 1.22.4
scikit-learn 1.1.2
For other requirements, please see the env_requirements.txt file.
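Before running the scripts, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, assuming the listed versions are meant as exact pins (the README does not say so explicitly); the function name version_mismatches is hypothetical and not part of epiTCR.

```python
from importlib import metadata

# Versions listed in this README; treating them as exact pins is an assumption.
PINNED = {"numpy": "1.22.4", "scikit-learn": "1.1.2"}

def version_mismatches(pinned, get_version=metadata.version):
    """Return {package: installed_version_or_None} for packages that differ from the pin."""
    problems = {}
    for package, wanted in pinned.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            problems[package] = found
    return problems

if __name__ == "__main__":
    for package, found in version_mismatches(PINNED).items():
        print(f"{package}: expected {PINNED[package]}, found {found or 'not installed'}")
```

An empty result means both packages match; env_requirements.txt remains the authoritative list for everything else.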
Users can run epiTCR in two modes: (i) train a new model and make predictions using the newly trained model, or (ii) make predictions using our pre-trained model.
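Both modes are command-line invocations, so they can be scripted. The sketch below assembles the argument list for either script using only options documented in the sections that follow (--trainfile, --testfile, --modelfile, --chain). The helper build_command is hypothetical and not part of epiTCR, and it assumes the scripts are run from the repository root.

```python
import sys

def build_command(mode, testfile, chain="ce", trainfile=None, modelfile=None):
    """Assemble the argument list for epiTCR.py (train) or predict.py (predict)."""
    if chain not in ("ce", "cem"):
        raise ValueError(f"chain must be 'ce' or 'cem', got {chain!r}")
    if mode == "train":
        if trainfile is None:
            raise ValueError("train mode needs a training file")
        cmd = [sys.executable, "epiTCR.py", "--trainfile", trainfile]
    elif mode == "predict":
        cmd = [sys.executable, "predict.py"]
        if modelfile is not None:
            cmd += ["--modelfile", modelfile]
    else:
        raise ValueError(f"mode must be 'train' or 'predict', got {mode!r}")
    cmd += ["--testfile", testfile, "--chain", chain]
    return cmd

cmd = build_command(
    "predict",
    "data/splitData/withMHC/test/test01.csv",
    chain="cem",
    modelfile="models/rdforestWithMHCModel.pickle",
)
print(" ".join(cmd[1:]))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Using sys.executable instead of a literal python3 keeps the call inside the interpreter (and environment) that runs the wrapper.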
Train a new model and make predictions
The main module of epiTCR is epiTCR.py. Users can train the epiTCR model (with or without MHC) and make predictions on their own data by running:
python3 epiTCR.py --trainfile data/splitData/withMHC/train/train.csv --testfile data/splitData/withMHC/test/test01.csv --chain cem
given that:
--trainfile is a comma-separated values (CSV) file containing columns for TCR, epitope, binder, and (optionally) full-length MHC (as reported by IMGT). See example training data.
--testfile is a CSV file containing columns for TCR, epitope, and (optionally) full-length MHC (as reported by IMGT). See example test file.
--chain specifies the chain(s) to use. Available options are ce (cdr3b+epitope) and cem (cdr3b+epitope+mhc). The default is ce.
The prediction output is printed to standard output (stdout) or written to a file specified with the option --outfile. For more information, see the section Prediction output below.
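A quick header check can catch a malformed input file before training starts. The sketch below is not part of epiTCR: the column names CDR3b, epitope, MHC and binder are assumptions standing in for the TCR, epitope, MHC and binder columns described above, so check the example files for the real headers. The sample rows are illustrative and not taken from the repository data.

```python
import csv
import io

# Hypothetical column names; the real headers are in the example CSV files.
COLUMNS = {
    "ce": ["CDR3b", "epitope"],
    "cem": ["CDR3b", "epitope", "MHC"],
}

def missing_columns(csv_text, chain="ce", training=False):
    """Return the required columns that are absent from the CSV header."""
    if chain not in COLUMNS:
        raise ValueError(f"unknown chain option: {chain!r}")
    required = list(COLUMNS[chain])
    if training:
        required.append("binder")  # training files also carry the label column
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]

# Illustrative rows only.
sample = "CDR3b,epitope,binder\nCASSLGQAYEQYF,GILGFVFTL,1\n"
print(missing_columns(sample, chain="ce", training=True))   # []
print(missing_columns(sample, chain="cem", training=True))  # ['MHC']
```

For a file on disk, pass the result of open(path, newline="").read() as csv_text.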
Run prediction using the pre-trained model
Users can also apply our pre-trained model to make predictions directly on their data using the module predict.py. TCR-epitope or TCR-pMHC binding prediction can be run with:
python3 predict.py --testfile data/splitData/withMHC/test/test01.csv --modelfile models/rdforestWithMHCModel.pickle --chain cem
given that:
--testfile is a CSV file containing columns for TCR, epitope, and (optionally) full-length MHC (as reported by IMGT). See example input file.
--modelfile specifies the full path to the trained model file, which should be a pickle file. The default model is models/rdforestWithMHCModel.pickle.
--chain specifies the chain(s) to use. Available options are ce (cdr3b+epitope) and cem (cdr3b+epitope+mhc). The default is ce.
Prediction output
The epiTCR prediction output is a table with up to four columns: the CDR3b sequences, the epitope sequences, the full-length MHC (when provided), and the binding probability for the corresponding complexes. See the example output file.
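Because the output is a table with one probability per complex, downstream filtering is straightforward. The sketch below keeps rows at or above a probability cutoff. It is not part of epiTCR: the column name predict_proba and the 0.5 cutoff are assumptions, and the sequences and probabilities in the example are invented for illustration, not real epiTCR output.

```python
import csv
import io

def predicted_binders(csv_text, prob_column="predict_proba", threshold=0.5):
    """Return the rows whose binding probability is at or above the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row[prob_column]) >= threshold]

# Invented example in the shape described above (no MHC column, i.e. ce mode).
example = (
    "CDR3b,epitope,predict_proba\n"
    "CASSLGQAYEQYF,GILGFVFTL,0.91\n"
    "CASSIRSSYEQYF,NLVPMVATV,0.12\n"
)
for row in predicted_binders(example):
    print(row["CDR3b"], row["epitope"], row["predict_proba"])
```

The cutoff is a parameter so that it can be tuned to the sensitivity or specificity a given analysis needs.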
For questions or feedback, please post an Issue.
Please cite this paper if it helps your research:
@article{10.1093/bioinformatics/btad284,
author = {Pham, My-Diem Nguyen and Nguyen, Thanh-Nhan and Tran, Le Son and Nguyen, Que-Tran Bui and Nguyen, Thien-Phuc Hoang and Pham, Thi Mong Quynh and Nguyen, Hoai-Nghia and Giang, Hoa and Phan, Minh-Duy and Nguyen, Vy},
title = "{epiTCR: a highly sensitive predictor for TCR–peptide binding}",
journal = {Bioinformatics},
volume = {39},
number = {5},
pages = {btad284},
year = {2023},
month = {04},
issn = {1367-4811},
doi = {10.1093/bioinformatics/btad284},
url = {https://doi.org/10.1093/bioinformatics/btad284},
eprint = {https://academic.oup.com/bioinformatics/article-pdf/39/5/btad284/50204900/btad284.pdf},
}