SizheQiu / Seq2Topt

A predictor of enzyme optimal temperature
GNU General Public License v3.0
6 stars 1 forks source link

Seq2Topt

A deep learning model of enzyme optimal temperature.

Datasets

sequence_ogt_topt.csv obtained from https://github.com/jafetgado/tomer.
pH opt data obtained from EpHod: https://zenodo.org/records/8011249.
Tm data obtained from https://github.com/liimy1/DeepTM/tree/master/Data.

Accuracy:

  1. Seq2Topt: RMSE = 13.3℃ and R2=0.48
  2. Seq2Tm: RMSE=7.57℃ and R2=0.64
  3. Seq2pHopt: RMSE=0.92 and R2=0.37

    How to use:

  4. Prepare the input file: a CSV file containing a column "sequence" for protein sequences.
  5. Enter /code directory and run prediction:
    python seq2topt.py --input [input.csv] --output [output file name]

    The same for seq2tm.py and seq2pHopt.py in /code.

  6. Hyperparameters and model pth files:
    • Seq2Topt: /data/hyparams/best_topt_param.pkl and /data/model_pth/model_topt_r2test=0.5002.pth
    • Seq2Tm: /data/hyparams/default.pkl and ../data/model_pth/model_Tm_r2=0.682152.pth
    • Seq2pHopt: /data/hyparams/default.pkl and ../data/model_pth/model_pHopt_rmse=0.064849.pth
  7. Feel free to use /code/model.py to develop other predictive models for proteins.

    Workflow:

  8. Model evaluation: /code/Model_evaluation.ipynb
  9. Selection of thermophilic enzymes: /code/CaseStudy_thermophile.ipynb
  10. Analysis of residue attention weights: /code/AnalysisResidueAttention.ipynb
  11. Prediction of optimal temperature shifts: /code/CaseStudy_mutations.ipynb

    Dependency:

    1.Pytorch: https://pytorch.org/
    2.ESM: https://github.com/facebookresearch/esm
    3.Scikit-learn: https://scikit-learn.org/
    4.Seaborn statistical data visualization:https://seaborn.pydata.org/index.html

    Citation

    Qiu, S., Hu, B., Zhao, J., Xu, W., Yang, A. (2024). Seq2Topt: A Sequence-Based Deep Learning Predictor of Enzyme Optimal Temperature. doi:10.1101/2024.08.12.607600