Genentech / gReLU

gReLU is a Python library to train, interpret, and apply deep learning models to DNA sequences.
https://genentech.github.io/gReLU/
MIT License

gReLU

gReLU is a Python library to train, interpret, and apply deep learning models to DNA sequences. Code documentation is available at https://genentech.github.io/gReLU/.

Installation

To install from source:

git clone https://github.com/Genentech/gReLU.git
cd gReLU
pip install .

To install using pip:

pip install gReLU

Typical installation time including all dependencies is under 10 minutes.
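
After either installation route, a quick check from Python can confirm that the package is visible in the active environment. The following is a minimal sketch, not part of gReLU itself: the helper names `is_installed` and `installed_version` are hypothetical, and the import name `grelu` is taken from the function paths mentioned elsewhere in this README.

```python
import importlib.util
from importlib import metadata


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name: str):
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical helpers; "grelu" is the import name, "gReLU" the pip distribution name.
if is_installed("grelu"):
    print("gReLU found, version:", installed_version("gReLU"))
else:
    print("gReLU is not importable in this environment")
```

If the second message is printed, the most likely cause is that pip installed into a different environment than the one running the script.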

To train or use transformer models containing flash attention layers, flash-attn needs to be installed first:

conda install -c conda-forge cudatoolkit-dev -y
pip install torch ninja
pip install flash-attn --no-build-isolation
pip install gReLU
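
Before training a model with flash attention layers, it may help to confirm that the prerequisites above are in place. This is a rough sketch under stated assumptions: `flash_attention_status` is a hypothetical helper, `torch` and `flash_attn` are the import names of the torch and flash-attn packages, and a visible CUDA device is assumed to be required.

```python
import importlib.util


def flash_attention_status() -> dict:
    """Report whether the pieces needed for flash attention appear to be present."""
    status = {
        "torch": importlib.util.find_spec("torch") is not None,
        "flash_attn": importlib.util.find_spec("flash_attn") is not None,
        "cuda": False,
    }
    if status["torch"]:
        # Imported lazily so the check still runs when torch is absent.
        import torch

        status["cuda"] = bool(torch.cuda.is_available())
    return status


for name, ok in flash_attention_status().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```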

Contributing

This project uses pre-commit. Please make sure to install it before making any changes:

pip install pre-commit
cd gReLU
pre-commit install

It is a good idea to update the hooks to the latest version:

pre-commit autoupdate

Additional requirements

If you want to use genome annotation features through the function grelu.io.genome.read_gtf, you will need to install the following UCSC utilities: genePredToBed, genePredToGtf, bedToGenePred, gtfToGenePred, gff3ToGenePred.

If you want to create bigWig files through the function grelu.data.preprocess.make_insertion_bigwig, you will need to install the following UCSC utilities: bedGraphToBigWig.

UCSC utilities can be installed from http://hgdownload.cse.ucsc.edu/admin/exe/. For example, the following commands fetch the Linux x86_64 binaries (writing to /usr/bin/ typically requires root privileges):

rsync -aP rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/bedGraphToBigWig /usr/bin/
rsync -aP rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/genePredToBed /usr/bin/
rsync -aP rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/genePredToGtf /usr/bin/
rsync -aP rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/bedToGenePred /usr/bin/
rsync -aP rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/gtfToGenePred /usr/bin/
rsync -aP rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/gff3ToGenePred /usr/bin/

or via bioconda:

conda install -y \
bioconda::ucsc-bedgraphtobigwig \
bioconda::ucsc-genepredtobed    \
bioconda::ucsc-genepredtogtf    \
bioconda::ucsc-bedtogenepred    \
bioconda::ucsc-gtftogenepred    \
bioconda::ucsc-gff3togenepred
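
Whichever route is used, the utilities need to be discoverable on the `PATH`. The sketch below checks for them with the standard library only. The tool names come from the lists above; the helper `missing_tools` and the two constants are hypothetical names introduced for illustration.

```python
import shutil

# Utilities listed above as required by grelu.io.genome.read_gtf
GTF_TOOLS = [
    "genePredToBed",
    "genePredToGtf",
    "bedToGenePred",
    "gtfToGenePred",
    "gff3ToGenePred",
]
# Utility listed above as required by grelu.data.preprocess.make_insertion_bigwig
BIGWIG_TOOLS = ["bedGraphToBigWig"]


def missing_tools(tools):
    """Return the names of executables that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


for label, tools in [("read_gtf", GTF_TOOLS), ("make_insertion_bigwig", BIGWIG_TOOLS)]:
    missing = missing_tools(tools)
    if missing:
        print(f"{label}: missing {', '.join(missing)}")
    else:
        print(f"{label}: all utilities found")
```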

Citation

Please cite our preprint: https://www.biorxiv.org/content/10.1101/2024.09.18.613778v1