This repository contains code and pre-trained models for ERNIE-RNA, an RNA language model for feature extraction and secondary structure prediction.
ERNIE-RNA outperforms the tested RNA feature extraction models (including RNA-FM) on the feature extraction task, and it performs better than RNAfold, UNI-RNA, and others on the secondary structure prediction task.
You can find more details about ERNIE-RNA in our paper, "ERNIE-RNA: An RNA Language Model with Structure-enhanced Representations".
First, download the repository and create the environment.
git clone https://github.com/Bruce-ywj/ERNIE-RNA.git
cd ./ERNIE-RNA
conda env create -f environment.yml
Then, activate the "ERNIE-RNA" environment.
conda activate ERNIE-RNA
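The commands in this README pass `--device='cuda:0'`. Before running them, it can help to check whether PyTorch (which fairseq is built on) actually sees a GPU. The following is a minimal sketch, not part of the repository: `pick_device` is a hypothetical helper, and whether the scripts accept `--device='cpu'` as a fallback is an assumption that you should verify against the scripts themselves.

```python
def pick_device(preferred: str = "cuda:0") -> str:
    """Return `preferred` unless it is a CUDA device and CUDA is unusable.

    Falls back to "cpu" when PyTorch is not installed or when
    torch.cuda.is_available() reports False. It does not check that the
    specific device index exists.
    """
    try:
        import torch  # expected to be provided by the conda environment
    except ImportError:
        return "cpu"
    if preferred.startswith("cuda") and not torch.cuda.is_available():
        return "cpu"
    return preferred


# Print the value you might pass to --device (hypothetical usage).
print(pick_device())
```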
The model folder contains two subfolders, each with a download link. Download each model from its link into the same subfolder. Alternatively, you can download both models from our drive.
python extract_embedding.py --seqs_path='./data/test_seqs.txt' --device='cuda:0'
The model path parameters are set by default and do not need to be changed.
The corresponding feature extraction code is inside this file, and the sequences in the file can be modified as needed.
This file uses ERNIE-RNA (twod_mlm) for feature extraction.
The extracted features include cls, tokens, and atten_map.
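If you want to run feature extraction on your own sequences, one option is to generate a sequence file and point `--seqs_path` at it. The sketch below assumes the input is a plain-text file with one RNA sequence per line; that layout is a guess based on the `.txt` extension, so compare it with `./data/test_seqs.txt` first. `write_seq_file` is a hypothetical helper, and the restriction to A, C, G, and U is a conservative assumption of this sketch, not a documented requirement of the model.

```python
from pathlib import Path


def write_seq_file(seqs, path):
    """Write RNA sequences to a text file, one per line (assumed layout).

    Sequences are stripped and upper-cased, empty entries are skipped, and
    any character outside A/C/G/U raises ValueError before anything is
    written. Returns the list of cleaned sequences.
    """
    cleaned = []
    for seq in seqs:
        seq = seq.strip().upper()
        if not seq:
            continue
        unexpected = set(seq) - set("ACGU")
        if unexpected:
            raise ValueError(
                f"unexpected characters {sorted(unexpected)} in {seq!r}"
            )
        cleaned.append(seq)
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned
```

With a file written this way (for example to a hypothetical `./my_seqs.txt`), the extraction command would become `python extract_embedding.py --seqs_path='./my_seqs.txt' --device='cuda:0'`.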
python predict_ss_rna.py --seqs_path='./data/test_seqs.fasta' --device='cuda:0'
The model path parameters are set by default and do not need to be changed.
The corresponding feature extraction code is inside this file, and the sequences in the file can be modified as needed.
This file outputs the RNA secondary structures predicted by two models: the fine-tuned model and the pre-trained model.
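The prediction script reads a FASTA file, so predicting structures for your own sequences means supplying one. As a rough sketch, the hypothetical helper `write_fasta` below writes standard FASTA records (a `>name` header line followed by the sequence). Keeping each sequence on a single line is a choice made here for simplicity; whether `predict_ss_rna.py` also accepts wrapped sequences is not something this README states.

```python
def write_fasta(records, path):
    """Write a {name: sequence} mapping to `path` in FASTA format.

    Each record becomes a '>name' header line followed by the upper-cased
    sequence on a single line. Returns the number of records written.
    """
    lines = []
    for name, seq in records.items():
        name = name.strip()
        if not name or "\n" in name:
            raise ValueError(f"invalid record name: {name!r}")
        lines.append(f">{name}")
        lines.append(seq.strip().upper())
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(records)
```

A file produced this way (for example a hypothetical `./my_seqs.fasta`) can then be passed through the existing flag: `python predict_ss_rna.py --seqs_path='./my_seqs.fasta' --device='cuda:0'`.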
If you find the models useful in your research, please cite our work:
ERNIE-RNA: An RNA Language Model with Structure-enhanced Representations
Yin W, Zhang Z, He L, et al. ERNIE-RNA: An RNA Language Model with Structure-enhanced Representations[J]. bioRxiv, 2024: 2024.03.17.585376.
We use the fairseq sequence modeling framework to train our RNA language model. We greatly appreciate this excellent work!
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.