CDDLeiden / DrugEx

De Novo Drug Design with RNNs and Transformers
https://cddleiden.github.io/DrugEx/docs/
MIT License

# DrugEx

DrugEx is an open-source software library for de novo design of small molecules with deep learning generative models in a multi-objective reinforcement learning framework. The package contains multiple generator architectures and a variety of scoring tools and multi-objective optimisation methods. It has a flexible application programming interface and can readily be used via the command line interface [[1](https://pubs.acs.org/doi/10.1021/acs.jcim.3c00434)] (see [Quick Start](#quick-start) to get to work right away).

## History

This software is a continuation of the original and incremental work of Liu et al. on DrugEx [[2](https://doi.org/10.1186/s13321-019-0355-6),[3](https://doi.org/10.1186/s13321-021-00561-9),[4](https://doi.org/10.1186/s13321-023-00694-z)] and is currently developed by [Gerard van Westen's Computational Drug Discovery](https://twitter.com/cddleiden) group in Leiden, Netherlands. The first version of DrugEx [[2](https://doi.org/10.1186/s13321-019-0355-6)] consisted of a recurrent neural network (RNN) single-task agent built from gated recurrent units (GRU), which were replaced by long short-term memory (LSTM) units in the second version [[3](https://doi.org/10.1186/s13321-021-00561-9)]. The second version also introduced reinforcement learning (RL) based on multi-objective optimisation (MOO) and an updated exploitation-exploration strategy. The third version [[4](https://doi.org/10.1186/s13321-023-00694-z)] introduced generators based on a variant of the transformer and a novel graph-based encoding that allows sampling of molecules with specific substructures. This package builds on these works and provides a unified API that is easier to use and flexible enough for customization. New features are being added as well [[1](https://pubs.acs.org/doi/10.1021/acs.jcim.3c00434)].
Furthermore, the development and training of QSAR models, which are used to score molecules during reinforcement learning, has been moved to a separate package, [QSPRpred](https://github.com/CDDLeiden/QSPRPred), which became a useful library in its own right.

## Workflow

The DrugEx package provides classes to standardize, clean and encode molecules for the various deep learning algorithms provided in the package, as well as features to set up and monitor training and optimization. The resulting models can be used readily for generation of focused libraries and are easily transferable.

![Fig1](figures/TOC_figure.png)

# Quick Start

> A small step for exploring the drug space in need, a giant leap for exploiting a healthy state indeed.

## Installation

DrugEx can be installed with pip like so:

```bash
pip install git+https://github.com/CDDLeiden/DrugEx.git@master
```

### Optional Dependencies

**[QSPRPred](https://github.com/CDDLeiden/QSPRPred.git)** - Optional package to install if you want to use the command line interface of DrugEx, which requires the models to be serialized with this package. It is also used by some examples in the tutorial. Install DrugEx with the following command if you want these features:

```bash
pip install "drugex[qsprpred] @ git+https://github.com/CDDLeiden/DrugEx.git@master"
```

**[RAscore](https://github.com/reymond-group/RAscore)** - Optional package to install if you want to use the Retrosynthesis Accessibility Score in the desirability function.
- The installation of RAscore might downgrade the scikit-learn package. If this happens, scikit-learn should be upgraded again.

## Use

After installation, you will have access to various command line features, but you can also use the Python API directly. Documentation for the current version of both is available [here](https://cddleiden.github.io/DrugEx/docs/).
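
To confirm that the installation steps above worked, a small check like the following can help. It is only a sketch: the import names `drugex`, `qsprpred` and `RAscore` are assumptions derived from the package names used in this README, and the helper functions are hypothetical, not part of DrugEx.

```python
from importlib.util import find_spec

# Assumed top-level import names (taken from the package names in this README).
CORE = ["drugex"]
OPTIONAL = {
    "qsprpred": "command line interface and some tutorial examples",
    "RAscore": "Retrosynthesis Accessibility Score in the desirability function",
}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return find_spec(module_name) is not None


def report() -> dict:
    """Map every core and optional module name to whether it was found."""
    return {name: is_installed(name) for name in [*CORE, *OPTIONAL]}


if __name__ == "__main__":
    for name, found in report().items():
        purpose = OPTIONAL.get(name, "core package")
        state = "found" if found else "missing"
        print(f"{name:10s} {state:8s} ({purpose})")
```

A missing optional module only matters if you need the feature listed next to it.
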
For a quick start, you can also check out our [Jupyter notebook tutorial](./tutorial), which documents the use of the Python API to build different types of models, or take a look at the [CLI examples](https://cddleiden.github.io/DrugEx/docs/use.html#cli-example). The tutorials and the documentation are still a work in progress, and we welcome contributions wherever they are lacking.

This repository contains almost all models implemented throughout the history of DrugEx. We also make the following pretrained models available for use with this package. You can retrieve them from the following table (not all models are available at this moment, but we will keep adding them):

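In the column headers, the label in parentheses is the unit type for the RNN models and the fragmentation method for the transformer models.

| Model | RNN (GRU) | RNN (LSTM) | SMILES-Based Transformer (BRICS) | SMILES-Based Transformer (RECAP) | Graph-Based Transformer (BRICS) | Graph-Based Transformer (RECAP) |
|---|---|---|---|---|---|---|
| ChEMBL 27 | - | Zenodo | - | - | Zenodo | - |
| ChEMBL 31 | Zenodo | Zenodo | - | - | Zenodo | - |
| Papyrus 05.5 | Zenodo | Zenodo | Zenodo | Zenodo | Zenodo | Zenodo |
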
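
Before downloading a pretrained model, it can be useful to check whether the local GPU meets the 8 GB video memory guideline given under Hardware Requirements. The sketch below assumes PyTorch is installed and falls back to an empty result if it is not. The helper names are hypothetical, and the threshold uses decimal gigabytes, because drivers usually report slightly less than the nominal capacity.

```python
MIN_VRAM_GB = 8  # guideline from the Hardware Requirements section


def enough_memory(total_bytes: int, min_gb: float = MIN_VRAM_GB) -> bool:
    """Compare reported device memory against a threshold in decimal GB."""
    return total_bytes >= min_gb * 10**9


def list_gpus() -> list:
    """Return (name, total_bytes) per visible CUDA device, or [] if none."""
    try:
        import torch  # assumption: PyTorch is available in this environment
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    gpus = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        gpus.append((props.name, props.total_memory))
    return gpus


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No CUDA GPU detected (or PyTorch is not installed).")
    for name, total in gpus:
        verdict = "meets" if enough_memory(total) else "is below"
        print(f"{name}: {total / 10**9:.1f} GB, {verdict} the {MIN_VRAM_GB} GB guideline")
```

This only inspects memory size; it does not verify the CUDA version.
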
### Hardware Requirements

The DrugEx toolkit offers a variety of models of varying complexity, each with its own hardware requirements. To use the full suite of models, you need a GPU compatible with CUDA 9.2 and at least 8 GB of video memory. This ensures that the models can be transferred to the GPU along with sufficiently large training batches. Even on a less capable configuration, however, it should be possible to fine-tune and optimize the basic sequential RNN model with reinforcement learning if a pretrained model is used. For the two transformers, we recommend using multiple GPUs to increase throughput via parallelization, which the DrugEx package automates. This technique divides the model's workload across multiple GPUs, so the system can process larger volumes of data faster than with a single GPU.

# License

The software is licensed under the standard MIT license, which means it is free to use, including in commercial applications, as long as the copyright terms of the license are preserved. You can view the [LICENSE](./LICENSE) file for the full terms. If you have questions about the license or the use of the software in your organization, please contact [Gerard J.P. van Westen](mailto:gerard@lacdr.leidenuniv.nl) (gerard@lacdr.leidenuniv.nl).

# Current Development Team

- [M. Sicho](https://github.com/martin-sicho)
- [S. Luukkonen](https://github.com/sohviluukkonen)
- [H. van den Maagdenberg](https://github.com/HellevdM)
- [L. Schoenmaker](https://github.com/LindeSchoenmaker)
- [O. Béquignon](https://github.com/OlivierBeq)
- [R.L. van den Broek](https://github.com/rlvandenbroek)

# Contributions

If you find that there is something missing, have a question, or you just want to contribute a new model or feature, please feel free to open an issue to initiate a discussion.
We are more than happy to improve the package with your contributions, bug reports and ideas. After the feature is discussed in its designated issue, the best way to contribute is to fork the repository, make your changes and then create a pull request. We will then review your changes and merge them into the main repository. Alternatively, you can contact us directly via [email](mailto:cdd@lacdr.leidenuniv.nl).

# Acknowledgements

We would like to thank the following people for significant contributions:

- [Xuhan Liu](https://github.com/XuhanLiu) - author of the original idea to develop the DrugEx models and code. We are grateful for his continued support of the project.

We also thank the authors of the following Git repositories, which gave Xuhan a lot of inspiration:

1. [REINVENT](https://github.com/MarcusOlivecrona/REINVENT)
2. [ORGAN](https://github.com/gablg1/ORGAN)
3. [SeqGAN](https://github.com/LantaoYu/SeqGAN)

# References

[1] [Sicho M., Luukkonen S., van den Maagdenberg H.W., Schoenmaker L., Béquignon O.J.M., van Westen G.J.P. DrugEx: Deep Learning Models and Tools for Exploration of Drug-like Chemical Space. *J. Chem. Inf. Model.*, **2023**, 63, 12.](https://pubs.acs.org/doi/10.1021/acs.jcim.3c00434)

[2] [Liu X., Ye K., van Vlijmen H.W.T., IJzerman A.P., van Westen G.J.P. An Exploration Strategy Improves the Diversity of de novo Ligands Using Deep Reinforcement Learning: a case for the adenosine A2A receptor. *J Cheminform.*, **2019**, 11, 35.](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0355-6)

[3] [Liu X., Ye K., van Vlijmen H.W.T., Emmerich M.T.M., IJzerman A.P., van Westen G.J.P. DrugEx v2: De Novo Design of Drug Molecule by Pareto-based Multi-Objective Reinforcement Learning in Polypharmacology. *J Cheminform.*, **2021**, 13, 85.](https://doi.org/10.1186/s13321-021-00561-9)

[4] [Liu X., Ye K., van Vlijmen H.W.T., IJzerman A.P., van Westen G.J.P. DrugEx v3: Scaffold-Constrained Drug Design with Graph Transformer-based Reinforcement Learning. *J Cheminform.*, **2023**, 15, 24.](https://doi.org/10.1186/s13321-023-00694-z)