Updated on 2024.07.24
This repository provides the official implementation of Prime (Protein language model for Intelligent Masked pretraining and Environment (temperature) prediction).
Key feature:
Pro-Prime is a protein language model trained with temperature-guided language modeling. It predicts the Optimal Growth Temperature (OGT) and enables zero-shot prediction of protein thermostability and activity.
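To make "zero-shot prediction" concrete, here is a minimal sketch of one common way masked language models are used to score mutants without any fine-tuning: sum the log-likelihood ratio between the mutant and wild-type residue at each mutated position. This is an illustration under stated assumptions, not necessarily the scoring used in this repository. The helpers `toy_log_probs`, `parse_mutation`, and `score_mutant` are hypothetical, and the toy table stands in for the per-position log-probabilities a real model would produce.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_log_probs(sequence, wt_prob=0.5):
    """Hypothetical stand-in for model output: one {residue: log-prob} dict per position.

    The wild-type residue gets probability `wt_prob`; the other 19 residues
    share the remainder equally. A real model would supply these values.
    """
    other = (1.0 - wt_prob) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: math.log(wt_prob if aa == wt else other) for aa in AMINO_ACIDS}
        for wt in sequence
    ]


def parse_mutation(mutation):
    """Split a mutation such as 'K2R' into (wild type, 0-based position, mutant)."""
    return mutation[0], int(mutation[1:-1]) - 1, mutation[-1]


def score_mutant(sequence, mutations, log_probs):
    """Sum log p(mutant) - log p(wild type) over all mutated positions."""
    score = 0.0
    for mutation in mutations:
        wt, pos, mt = parse_mutation(mutation)
        if sequence[pos] != wt:
            raise ValueError(f"{mutation}: expected {wt} at position {pos + 1}, found {sequence[pos]}")
        score += log_probs[pos][mt] - log_probs[pos][wt]
    return score


if __name__ == "__main__":
    seq = "MKV"
    table = toy_log_probs(seq)
    print(score_mutant(seq, ["K2R"], table))
    print(score_mutant(seq, ["K2R", "V3A"], table))
```

Under this convention a higher (less negative) score means the model finds the mutant more plausible than the wild type at those positions. Summing over positions treats the mutations as independent, which is a simplifying assumption.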
Main Requirements
biopython==1.81
torch==2.0.1
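As a quick sanity check, the pinned versions above can be compared against the current environment. The sketch below uses only the standard library; `PINNED`, `installed_version`, and `check_pins` are illustrative names, not part of this repository.

```python
from importlib import metadata

# Pins copied from the "Main Requirements" list above.
PINNED = {"biopython": "1.81", "torch": "2.0.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # torch wheels often carry a local build tag such as '2.0.1+cu118',
        # so only the part before '+' is compared.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for package, (expected, found) in problems.items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned requirements are satisfied.")
```

The `lookup` parameter exists only so the check can be exercised without touching the real environment.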
Installation
pip install -r requirements.txt
https://drive.google.com/file/d/1AEpK3TmgFNszZXJQWwRPkHUugrdHrTgk/view?usp=sharing
Run the ProteinGym benchmark: see this notebook.
OGT prediction: see this notebook.
Tm prediction: see this notebook.
Topt prediction: see this notebook.
T7 prediction: see this notebook.
TGO_D4K prediction: see this notebook.
VHH prediction: see this notebook.
Creatinase prediction: see this notebook.
Argonaute prediction: see this notebook.
This project is under the MIT license. See LICENSE for details.
A lot of code is modified from 🤗 transformers and esm.
If you find this repository useful, please consider citing this paper:
@misc{tan2023,
title={Engineering Enhanced Stability and Activity in Proteins through a Novel Temperature-Guided Language Modeling.},
author={Pan Tan and Mingchen Li and Liang Zhang and Zhiqiang Hu and Liang Hong},
year={2023},
eprint={2304.03780},
archivePrefix={arXiv},
primaryClass={q-bio.QM}
}