PanPep is a three-level framework for predicting peptide-TCR binding recognition. We provide the trained meta learner and external memory, and users can choose among the few-shot, zero-shot and majority settings according to the data available to them.
Requirements:
python == 3.9.7
pytorch == 1.10.2
numpy == 1.21.2
pandas == 1.4.1
scipy == 1.7.3
joblib, matplotlib, tensorboardX and scikit-learn are also needed
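Before running PanPep, it can help to confirm that the installed packages match the pinned versions above. The following is a minimal sketch, not part of PanPep itself: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes PyTorch is installed under its usual distribution name `torch`.

```python
from importlib import metadata

# Versions pinned in the requirements list above.
# Assumption: "pytorch" corresponds to the pip distribution named "torch".
PINNED = {
    "torch": "1.10.2",
    "numpy": "1.21.2",
    "pandas": "1.4.1",
    "scipy": "1.7.3",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

A mismatch does not necessarily mean PanPep will fail; the check only reports where the environment differs from the versions listed here.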
Usage: PanPep.py [options]

Required:
  --learning_setting STRING: the learning setting to use: few-shot, zero-shot or majority
  --input STRING: the path to the input data file (.csv)
  --output STRING: the path to the output data file (.csv)

Optional:
  --update_lr FLOAT: task-level inner update learning rate (default: 0.01)
  --update_step_test INT: update steps for fine-tuning (default: 3)
  --C INT: number of bases (default: 3)
  --R INT: peptide index matrix vector length (default: 3)
  --L INT: peptide embedding length (default: 75)

We provide three examples, one for each learning setting, to show how to use PanPep to predict peptide-TCR recognition.
Command:
python PanPep.py --learning_setting few-shot --input ./Data/Example_few-shot.csv --output ./Output/Example_few-shot_output.csv
input.csv: the input *.csv file contains three columns: Peptide, CDR3 and Label, which represent the peptide sequence, the TCR CDR3 sequence and their binding specificity. The Label column takes one of three values: 1 (binding), 0 (non-binding) or unknown. Known peptide-CDR3 pairs (Label 1 or 0) are used to construct the TCR support set for fine-tuning the basic meta learner, and unknown peptide-CDR3 pairs are used to construct the TCR query set to be predicted.
output.csv: the output *.csv file contains three columns: Peptide, CDR3 and Score, which represent the peptide sequence, the TCR CDR3 sequence and their predicted binding score. The peptide-CDR3 pairs listed are exactly the unknown pairs from the input file.
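To make the support/query split concrete, here is a minimal sketch of how a few-shot input file divides into the two sets. It is an illustration only, not PanPep's own loading code: the function name `split_support_query` is hypothetical, the CDR3 sequences are made-up placeholders rather than rows from the shipped example file, and any Label other than 0 or 1 is assumed to mean unknown.

```python
import csv
import io


def split_support_query(rows):
    """Split rows into labelled support pairs and unlabelled query pairs."""
    support, query = [], []
    for row in rows:
        label = row["Label"].strip()
        if label in {"0", "1"}:
            # Known pair: goes to the support set with its binary label.
            support.append((row["Peptide"], row["CDR3"], int(label)))
        else:
            # Assumption: every other value marks a pair to be predicted.
            query.append((row["Peptide"], row["CDR3"]))
    return support, query


# Illustrative content in the documented three-column layout.
example_csv = """Peptide,CDR3,Label
GILGFVFTL,CASSIRSSYEQYF,1
GILGFVFTL,CASSLAPGATNEKLFF,0
GILGFVFTL,CASSPGQGNYGYTF,unknown
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
support, query = split_support_query(rows)
print(f"{len(support)} support pairs, {len(query)} query pairs")
```

Only the query pairs would be expected to appear in the output file, each with a Score in place of the Label.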
Command:
python PanPep.py --learning_setting zero-shot --input ./Data/Example_zero-shot.csv --output ./Output/Example_zero-shot_output.csv
input.csv: the input *.csv file contains two columns: Peptide and CDR3, which represent the peptide sequence and the TCR CDR3 sequence.
output.csv: the output *.csv file contains three columns: Peptide, CDR3 and Score, which represent the peptide sequence, the TCR CDR3 sequence and their predicted binding score. The peptide-CDR3 pairs listed are exactly the pairs from the input file.
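Once an output file has been written, a common next step is to rank the pairs by Score. The sketch below reads the documented three-column layout and returns the highest-scoring pairs. The function name `top_pairs`, the sequences and the scores are all illustrative placeholders, and it assumes that a higher Score indicates a more likely binding pair.

```python
import csv
import io


def top_pairs(csv_text, k=2):
    """Return the k highest-scoring (Peptide, CDR3, Score) rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Assumption: a larger Score means stronger predicted binding.
    rows.sort(key=lambda row: float(row["Score"]), reverse=True)
    return [(r["Peptide"], r["CDR3"], float(r["Score"])) for r in rows[:k]]


# Made-up output content; real scores come from running PanPep.py.
example_output = """Peptide,CDR3,Score
GILGFVFTL,CASSPGQGNYGYTF,0.31
GILGFVFTL,CASSIRSSYEQYF,0.87
GILGFVFTL,CASSLAPGATNEKLFF,0.12
"""

for peptide, cdr3, score in top_pairs(example_output):
    print(peptide, cdr3, score)
```

To use it on a real result, read the file given to `--output` and pass its text to `top_pairs`.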
Command:
python PanPep.py --learning_setting majority --update_step_test 1000 --input ./Data/Example_majority.csv --output ./Output/Example_majority_output.csv
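When running several settings in a batch, the commands above can be assembled programmatically. This is a minimal sketch that uses only the flags documented in the usage section. The helper `panpep_command` is hypothetical, and the sketch only builds the command; running it is left to the caller, for example via `subprocess.run(cmd, check=True)`.

```python
import shlex

SETTINGS = {"few-shot", "zero-shot", "majority"}


def panpep_command(setting, input_csv, output_csv, update_step_test=None):
    """Build the argument list for one PanPep.py run."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown learning setting: {setting!r}")
    argv = ["python", "PanPep.py", "--learning_setting", setting]
    if update_step_test is not None:
        # Only passed when overriding the documented default of 3.
        argv += ["--update_step_test", str(update_step_test)]
    argv += ["--input", input_csv, "--output", output_csv]
    return argv


cmd = panpep_command(
    "majority",
    "./Data/Example_majority.csv",
    "./Output/Example_majority_output.csv",
    update_step_test=1000,
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.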
Yicheng Gao, Yuli Gao, Qi Liu et al. Pan-Peptide Meta Learning for T-Cell Receptor-Antigen Binding Recognition, Nature Machine Intelligence, 2023.
bm2-lab@tongji.edu.cn