Westlake-AI / OpenBioSeq

This repo focuses on supervised and self-supervised bio-sequence representation learning
https://openbioseq.readthedocs.io/en/latest/
Apache License 2.0

OpenBioSeq

News

Introduction

The main branch works with PyTorch 1.8 (required by some self-supervised methods) or higher (we recommend PyTorch 1.12). You can still use PyTorch 1.6 for most methods.
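As a rough illustration of the version requirement above, the following sketch compares a PyTorch version string against the 1.8 minimum. The helper name `version_ge` is hypothetical and not part of OpenBioSeq, and the comparison assumes a `sort` that supports `-V` (as in GNU coreutils).

```shell
#!/usr/bin/env bash
# Sketch: check a PyTorch version string against the 1.8 minimum stated above.
# `version_ge` is a hypothetical helper; it assumes `sort -V` is available.

version_ge() {
  # Succeeds (exit status 0) when version $1 is >= version $2.
  local have="${1%%+*}"   # drop a local build tag such as "+cu113"
  local need="$2"
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Query the installed version only if PyTorch can be imported.
if installed="$(python -c 'import torch; print(torch.__version__)' 2>/dev/null)"; then
  if version_ge "$installed" "1.8"; then
    echo "PyTorch $installed satisfies the 1.8 minimum"
  else
    echo "PyTorch $installed is older than 1.8; some self-supervised methods may not run"
  fi
else
  echo "PyTorch was not found in the active environment"
fi
```

A version sort is used rather than a plain string comparison because, compared as strings, `1.11` would wrongly rank below `1.8`.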

OpenBioSeq is an open-source supervised and self-supervised bio-sequence representation learning toolbox based on PyTorch. OpenBioSeq supports popular backbones, pre-training methods, and various features.

What does this repo do?

Efficiently learning useful bio-sequence representations facilitates various downstream tasks in biology and chemistry. This repo focuses on supervised and self-supervised bio-sequence representation learning and is named OpenBioSeq.

Major features

This repo will continue to be updated in 2022. Please watch it for the latest updates!

Change Log

Please refer to CHANGELOG.md for details and release history.

[2022-06-09] OpenBioSeq v0.1.1 is released.

[2022-05-24] OpenBioSeq v0.1.0 is initialized.

Installation

Here are quick installation steps for development:

conda create -n openbioseq python=3.8 -y
conda activate openbioseq
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113 # as an example
pip install openmim
mim install mmcv-full
git clone https://github.com/Westlake-AI/OpenBioSeq.git
cd OpenBioSeq
python setup.py develop

Please refer to INSTALL.md for detailed installation instructions and dataset preparation.
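After the steps above, a quick sanity check could look like the following sketch. The helper `has_module` is hypothetical, and the module name `openbioseq` is an assumption based on the repository name; consult INSTALL.md for the authoritative verification procedure.

```shell
#!/usr/bin/env bash
# Sketch: confirm the packages from the installation steps can be imported.
# `has_module` is a hypothetical helper; the module name `openbioseq`
# is an assumption based on the repository name.

PYTHON="${PYTHON:-python}"

has_module() {
  # Succeeds when the interpreter in $PYTHON can import module $1.
  "$PYTHON" -c "import $1" >/dev/null 2>&1
}

for module in torch mmcv openbioseq; do
  if has_module "$module"; then
    echo "ok       $module"
  else
    echo "missing  $module"
  fi
done
```

Run it inside the activated `openbioseq` conda environment so that `python` resolves to the interpreter the packages were installed into.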

Get Started

Please see Getting Started for the basic usage of OpenBioSeq (based on OpenMixup and MMSelfSup). As an example, you can start multi-GPU training with a given CONFIG_FILE using the following script:

bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS} [optional arguments]

Then, please see the tutorials for more technical details (based on MMClassification).
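To make the launch command above concrete, here is a minimal sketch that validates its two required arguments and prints the resulting command as a dry run. The function `train_cmd` and the config path are hypothetical placeholders; only the `bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS} [optional arguments]` form comes from this README.

```shell
#!/usr/bin/env bash
# Sketch: validate arguments, then print the documented launch command.
# `train_cmd` and the config path used below are hypothetical.

train_cmd() {
  # Usage: train_cmd CONFIG_FILE GPUS [optional arguments]
  if [ "$#" -lt 2 ]; then
    echo "usage: train_cmd CONFIG_FILE GPUS [optional arguments]" >&2
    return 1
  fi
  local config="$1"
  local gpus="$2"
  shift 2
  case "$gpus" in
    ''|*[!0-9]*|0)
      echo "GPUS must be a positive integer, got '$gpus'" >&2
      return 1
      ;;
  esac
  local cmd="bash tools/dist_train.sh $config $gpus"
  if [ "$#" -gt 0 ]; then
    cmd="$cmd $*"   # pass any optional arguments through unchanged
  fi
  echo "$cmd"
}

# Dry run with a placeholder config path; prints the command instead of running it.
train_cmd "configs/your_task/your_config.py" 4
```

Printing the command first makes it easy to confirm the config path and GPU count before starting a long training job.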

License

This project is released under the Apache 2.0 license.

Acknowledgement

Citation

If you find this project useful in your research, please consider citing:

@misc{2022openbioseq,
    title={{OpenBioSeq}: Open Toolbox and Benchmark for Bio-sequence Representation Learning},
    author={Li, Siyuan and Liu, Zicheng and Wu, Di and Li, Stan Z.},
    howpublished = {\url{https://github.com/Westlake-AI/openbioseq}},
    year={2022}
}

Contributors

For now, the direct contributors are Siyuan Li (@Lupin1998) and Zicheng Liu (@pone7). We thank the contributors of OpenMixup, MMSelfSup, and MMClassification.

Contact

This repo is currently maintained by Siyuan Li (lisiyuan@westlake.edu.cn) and Zicheng Liu (liuzicheng@westlake.edu.cn).