http://kcdukkalab.org/LMOGlcNAcSite/
Suresh Pokharel1, Pawel Pratyush1, Hamid D. Ismail1, Junfeng Ma2, Dukka B KC1
1Department of Computer Science, Michigan Technological University, Houghton, MI, USA
2Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Georgetown University, Washington, DC 20057, USA
Corresponding Author: dbkc@mtu.edu
If Git is installed on your system, clone the repository by running the following command in your terminal:
git clone git@github.com:KCLabMTU/LM-OGlcNAc-Site.git
If you do not have Git or prefer to download directly, download the repository from GitHub as a zip file.
Python version: 3.10.0
To install the required libraries, run the following command:
pip install -r requirements.txt
Required libraries and versions:
ankh==1.10.0
Bio==1.7.0
biopython==1.83
datasets==2.19.0
fair_esm==2.0.0
keras==2.8.0
numpy==1.26.4
pandas==2.2.2
protobuf==3.20.*
scikit_learn==1.4.2
scipy==1.13.0
tensorflow==2.8.0
torch==2.3.0
tqdm==4.66.2
transformers==4.40.1
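Because several of these pins are strict (for example tensorflow and keras at 2.8.0), it can help to confirm what is actually installed before running the model. The following is a minimal sketch using only the standard library. The names `PINNED` and `check_pins` are hypothetical and not part of the repository, and the dictionary copies only a few of the pins listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# A few pins copied from the list above; extend with the rest as needed.
PINNED = {
    "numpy": "1.26.4",
    "pandas": "2.2.2",
    "tensorflow": "2.8.0",
    "torch": "2.3.0",
    "transformers": "4.40.1",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package matches its pin. Any line printed points to a package to reinstall with pip.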
Running the LM-OGlcNAc-Site model on your own sequences
To predict O-GlcNAc sites in your own sequences, place them in the default input file:
input/sequence.fasta
Then run the prediction script:
python predict.py
The results are written to the output folder in a CSV file named results.csv.
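Before running the prediction, it may be worth checking that the input file is well-formed FASTA. The sketch below parses FASTA text and reports empty sequences and residues outside the 20 standard amino acids. It assumes conventional FASTA with `>` header lines. How predict.py itself handles non-standard residues is not documented here, so the check is advisory only. `parse_fasta_text` and `find_problems` are hypothetical helper names, not functions from the repository.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta_text(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_problems(records):
    """Describe empty sequences and residues outside the 20 standard amino acids."""
    problems = []
    for header, seq in records:
        if not seq:
            problems.append(f"{header}: empty sequence")
        unknown = sorted(set(seq) - STANDARD_AA)
        if unknown:
            problems.append(f"{header}: non-standard residues {''.join(unknown)}")
    return problems
```

To check the default input, read the file first, for example `find_problems(parse_fasta_text(open("input/sequence.fasta").read()))`. An empty list means no problems were found.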
Use the following command to specify the input and output files:
python predict.py --input [input_path] --output [output_path]
or, in short form:
python predict.py -i [input_path] -o [output_path]
Replace:
[input_path] with the path of the input file to run the model on (must be a .fasta file)
[output_path] with the path of the file the results should be written to (must be a .csv file)
Example:
python predict.py -i input.fasta -o output.csv
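When the prediction is called from another Python script, the same command can be assembled and run with the standard `subprocess` module. The sketch below is one possible wrapper. `build_predict_command` and `run_predict` are hypothetical names. It uses only the `--input` and `--output` flags documented above and checks the file extensions up front. Treating the extensions as case-insensitive is an assumption.

```python
import subprocess
import sys
from pathlib import Path


def build_predict_command(input_path, output_path, script="predict.py"):
    """Build the argument list for predict.py after checking file extensions."""
    input_path, output_path = Path(input_path), Path(output_path)
    if input_path.suffix.lower() != ".fasta":
        raise ValueError(f"input must be a .fasta file, got {input_path.name}")
    if output_path.suffix.lower() != ".csv":
        raise ValueError(f"output must be a .csv file, got {output_path.name}")
    # sys.executable runs the script with the interpreter that runs this code.
    return [sys.executable, script,
            "--input", str(input_path), "--output", str(output_path)]


def run_predict(input_path, output_path):
    """Run predict.py; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_predict_command(input_path, output_path), check=True)
```

Calling `run_predict("input.fasta", "output.csv")` from the repository root corresponds to the example command above.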
Note:
- You can always use the '-h' or '--help' flag to get detailed information about the available command-line arguments.
- You may also use the web server at http://kcdukkalab.org/LMOGlcNAcSite/
Citation
Pokharel, S.; Pratyush, P.; Ismail, H.D.; Ma, J.; KC, D.B. Integrating Embeddings from Multiple Protein Language Models to Improve Protein O-GlcNAc Site Prediction. Int. J. Mol. Sci. 2023, 24, 16000. https://doi.org/10.3390/ijms242116000
Paper Link: https://www.mdpi.com/1422-0067/24/21/16000
Please send an email to sureshp@mtu.edu (CC: dbkc@mtu.edu, ppratyus@mtu.edu) for any queries or discussions.