EMNGly: predicting N-linked glycosylation sites using the language models for feature extraction

Xiaoyang Hou, Yu Wang, Dongbo Bu, Yaojun Wang and Shiwei Sun
Paper link on Bioinformatics

Environments

git clone git@github.com:StellaHxy/EMNgly.git
conda create -n EMNgly python=3.7
conda activate EMNgly
pip install -r requirements.txt
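
The conda command above pins Python 3.7. As a small optional sanity check (a sketch, not part of the repository; the helper name `python_matches` is hypothetical), the active interpreter can be compared against that pin before installing or running anything:

```python
import sys
from typing import Sequence, Tuple

# Version pinned by `conda create -n EMNgly python=3.7` above.
EXPECTED_PYTHON: Tuple[int, int] = (3, 7)


def python_matches(version: Sequence, expected: Tuple[int, int] = EXPECTED_PYTHON) -> bool:
    """Return True when the major.minor part of `version` equals `expected`."""
    return tuple(version[:2]) == expected


# Warn (without aborting) if the active interpreter differs from the pin.
if not python_matches(sys.version_info):
    print("Warning: running Python {}.{}, but the environment above pins {}.{}".format(
        sys.version_info[0], sys.version_info[1], *EXPECTED_PYTHON))
```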

Datasets

Usage

Quickly predict N-linked glycosylation sites on the N-GlycositeAltas dataset:

Download the N-GlyAltas_classifier checkpoint and place it in the checkpoints/ folder:

├── checkpoints
│       └── N-GlyAltas_classifier.pkl
├── data
├── log
├── model
├── scripts
├── main.py
├── predict.py

Then run the prediction:

python predict.py --mode=test_features --data_path=./data/N-GlycositeAltas --ckpt_path=./checkpoints/N-GlyAltas_classifier.pkl
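
The quick-prediction command fails if the checkpoint or data folder is not where the layout above expects it. The following is a minimal sketch (not part of the repository) that checks those paths and then runs the same command. The helper names `missing_paths`, `quick_predict_command`, and `run_quick_predict` are hypothetical; the paths and flags are copied from this README.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Paths taken from the layout above, relative to the repository root.
CKPT_PATH = "./checkpoints/N-GlyAltas_classifier.pkl"
DATA_PATH = "./data/N-GlycositeAltas"


def missing_paths(repo_root: Path) -> List[str]:
    """Return the expected files/folders that are absent under repo_root."""
    expected = [CKPT_PATH, DATA_PATH, "predict.py"]
    return [p for p in expected if not (repo_root / p).exists()]


def quick_predict_command() -> List[str]:
    """Build the quick-prediction command exactly as given in this README."""
    return [
        "python", "predict.py",
        "--mode=test_features",
        "--data_path={}".format(DATA_PATH),
        "--ckpt_path={}".format(CKPT_PATH),
    ]


def run_quick_predict(repo_root: Path) -> None:
    """Check the layout, then run predict.py from the repository root."""
    missing = missing_paths(repo_root)
    if missing:
        sys.exit("Missing before prediction: " + ", ".join(missing))
    subprocess.run(quick_predict_command(), cwd=str(repo_root), check=True)
```

Calling `run_quick_predict(Path.cwd())` from the repository root either reports what is missing or launches the prediction.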

Train the EMNgly classifier on the N-GlycositeAltas dataset:

bash scripts/get_N-GlycositeAltas_train_features.sh
python main.py --mode=train --data_path=./data/N-GlycositeAltas --output_path=./checkpoints/N-GlyAltas_classifier.pkl

Predict N-linked glycosylation sites on the N-GlycositeAltas dataset:

bash scripts/get_N-GlycositeAltas_test_features.sh
python predict.py --mode=test --data_path=./data/N-GlycositeAltas --ckpt_path=./checkpoints/N-GlyAltas_classifier.pkl
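
The training and prediction steps above run in a fixed order: extract training features, train, extract test features, predict. The sketch below (not part of the repository) chains them so that a failing step stops the run. The function names `pipeline_steps` and `run_pipeline` are hypothetical; the commands are copied from this README, and the feature-extraction scripts are assumed to live in the `scripts/` folder shown in the directory layout.

```python
import subprocess
from typing import List

DATA_PATH = "./data/N-GlycositeAltas"
CKPT_PATH = "./checkpoints/N-GlyAltas_classifier.pkl"


def pipeline_steps() -> List[List[str]]:
    """Commands from this README, in the order they are meant to run."""
    return [
        # Assumes the scripts folder is `scripts/`, as in the directory layout.
        ["bash", "scripts/get_N-GlycositeAltas_train_features.sh"],
        ["python", "main.py", "--mode=train",
         "--data_path={}".format(DATA_PATH),
         "--output_path={}".format(CKPT_PATH)],
        ["bash", "scripts/get_N-GlycositeAltas_test_features.sh"],
        ["python", "predict.py", "--mode=test",
         "--data_path={}".format(DATA_PATH),
         "--ckpt_path={}".format(CKPT_PATH)],
    ]


def run_pipeline(dry_run: bool = True) -> None:
    """Print each step; execute it too when dry_run is False."""
    for cmd in pipeline_steps():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failure.
            subprocess.run(cmd, check=True)
```

`run_pipeline()` only prints the commands; `run_pipeline(dry_run=False)`, called from the repository root, executes them.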