
RibonanzaNet

Training code for RibonanzaNet.

Example notebooks

You may not want to retrain RibonanzaNet from scratch and would rather just use the pretrained checkpoints, so we have created example notebooks:

Finetune: https://www.kaggle.com/code/shujun717/ribonanzanet-2d-structure-finetune
Secondary structure inference: https://www.kaggle.com/code/shujun717/ribonanzanet-2d-structure-inference
Chemical mapping inference: https://www.kaggle.com/code/shujun717/ribonanzanet-inference
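If you prefer to inspect a pretrained checkpoint locally instead of in the Kaggle notebooks, the weights are a regular PyTorch checkpoint. A minimal sketch, assuming the file holds a plain state dict; "ribonanzanet.pt" is a placeholder for whichever checkpoint you downloaded:

```python
# Inspect a downloaded checkpoint; "ribonanzanet.pt" is a placeholder path and the
# file is assumed to hold a plain state dict of weight tensors.
import torch

state_dict = torch.load("ribonanzanet.pt", map_location="cpu")
print(f"{len(state_dict)} entries in checkpoint")
for name, value in list(state_dict.items())[:5]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(f"  {name}: {shape}")
```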

Data Download

You just need train_data.csv, test_sequences.csv, and sample_submission.csv from https://www.kaggle.com/competitions/stanford-ribonanza-rna-folding/data
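After downloading, a quick sanity check with pandas confirms the files are readable and shows their columns (only a few rows are read, since sample_submission.csv is large):

```python
# Peek at the three Kaggle CSVs without loading them fully into memory.
import pandas as pd

for name in ["train_data.csv", "test_sequences.csv", "sample_submission.csv"]:
    head = pd.read_csv(name, nrows=5)
    print(f"{name}: columns = {list(head.columns)}")
```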

Environment

Create the environment from the environment file env.yml

conda env create -f env.yml

Install ranger optimizer

conda activate torch

git clone https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
cd Ranger-Deep-Learning-Optimizer
pip install -e .
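A quick way to confirm the environment is usable before training; the `ranger` import name reflects how that repo typically exposes its optimizer, so adjust it if your install differs:

```python
# Quick check inside the `torch` conda env that PyTorch and Ranger are importable.
import torch
from ranger import Ranger  # Ranger optimizer installed above via `pip install -e .`

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("Ranger optimizer:", Ranger)
```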

How to run

First, activate the environment: conda activate torch

Set up accelerate with accelerate config in the terminal or with the --config_path option

For an example of an accelerate config file, see accelerate_config.yaml
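If you have not used accelerate before: the config mainly tells `accelerate launch` how many processes/GPUs to use and whether to use mixed precision, and the launched script wraps its model, optimizer, and dataloaders with an `Accelerator`. A minimal sketch of that pattern, not the actual run.py, with placeholder model, data, and loss:

```python
# Minimal sketch of the accelerate training pattern used by `accelerate launch`;
# run.py's actual code may differ. The model, data, and loss here are placeholders.
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the settings produced by `accelerate config`

model = torch.nn.Linear(16, 1)  # stand-in for RibonanzaNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
dataloader = torch.utils.data.DataLoader(torch.randn(32, 16), batch_size=8)

# prepare() moves everything to the right device(s) and wires up distributed training.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()  # dummy loss
    accelerator.backward(loss)         # replaces loss.backward()
    optimizer.step()
```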

Training

accelerate launch run.py --config_path configs/pairwise.yaml

Inference

accelerate launch inference.py --config_path configs/pairwise.yaml

Process raw predictions into a submission file for Ribonanza

python make_submission.py --config_path configs/pairwise.yaml
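For orientation, the submission is keyed off sample_submission.csv from the competition. A hedged sketch of what this step amounts to; the prediction file name is a placeholder and the reactivity column names are assumptions about the competition's format, not a description of make_submission.py's internals:

```python
# Hedged sketch of assembling a submission by hand; make_submission.py may differ.
# "predictions.npy" is a placeholder, and the reactivity column names are assumptions
# about the competition's sample_submission.csv format.
import numpy as np
import pandas as pd

sub = pd.read_csv("sample_submission.csv")
preds = np.load("predictions.npy")  # expected shape: (len(sub), 2)

sub[["reactivity_DMS_MaP", "reactivity_2A3_MaP"]] = preds
sub.to_csv("submission.csv", index=False)
```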

Configuration File

This section explains the various parameters and settings in the configuration file for RibonanzaNet (e.g. configs/pairwise.yaml).
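All of the scripts above take this file through --config_path. As a quick orientation, a minimal sketch of loading such a YAML config; the repo's actual loader may wrap the resulting dict differently:

```python
# Minimal sketch of consuming --config_path; the repo's actual config loader may differ.
import argparse
import yaml

parser = argparse.ArgumentParser()
parser.add_argument("--config_path", type=str, default="configs/pairwise.yaml")
args = parser.parse_args()

with open(args.config_path) as f:
    config = yaml.safe_load(f)  # plain dict of the keys described in the subsections below

print(config)
```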

Model Hyperparameters

Data Scaling

Other Configurations


File structure

logs contains the CSV log file with train/val loss, models contains the model weights and optimizer states, and oofs contains the validation predictions
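Because the log is a plain CSV, training progress can be inspected directly from it. A small sketch where the log filename and the epoch/train_loss/val_loss column names are assumptions:

```python
# Plot train/val loss from the CSV log in logs/; filename and column names are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

log = pd.read_csv("logs/log.csv")   # placeholder filename: check logs/ for the real one
print(log.columns.tolist())         # verify the actual column names first

log.plot(x="epoch", y=["train_loss", "val_loss"])
plt.savefig("loss_curve.png")
```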