👏 A Survey of Artificial Intelligence in Drug Discovery

💡 Artificial intelligence has been widely applied in drug discovery over the past decade and is still gaining popularity. This repository compiles a collection of works in related areas, based on the manuscript Artificial Intelligence in Drug Discovery: Applications and Techniques by Jianyuan Deng et al. The preprint version is available on ResearchGate. We hope you find it useful for your research (the citation is provided below).

🔔 This repository is updated regularly.

@article{deng2022artificial,
  title={Artificial intelligence in drug discovery: applications and techniques},
  author={Deng, Jianyuan and Yang, Zhibo and Ojima, Iwao and Samaras, Dimitris and Wang, Fusheng},
  journal={Briefings in Bioinformatics},
  volume={23},
  number={1},
  pages={bbab430},
  year={2022},
  publisher={Oxford University Press}
}

Reviews and Perspectives
Data, Representation and Benchmarks
- Large-Scale Databases
  - PubChem
  - ChEMBL
  - ZINC
  - Others
- Small Molecule Representations
- Benchmark Platforms
  - MoleculeNet
  - MolMapNet
  - ChemProp
  - REINVENT
  - GraphINVENT
  - GuacaMol
  - MOSES
  - ATOM3D
Model Architectures
Learning Paradigms
Addressing Existing Challenges

1. Reviews and Perspectives

1.1 General Drug Discovery

Integration of virtual and high-throughput screening (Nat Rev Drug Discov 2002) [Paper]
Chemical space and biology (Nature 2004) [Paper]
Computer-based de novo design of drug-like molecules (Nat Rev Drug Discov 2005) [Paper]
On Outliers and Activity Cliffs-Why QSAR Often Disappoints (J Chem Inf Model 2006) [Paper]
Evaluating Virtual Screening Methods: Good and Bad Metrics for the “Early Recognition” Problem (J Chem Inf Model 2007) [Paper]
Virtual screening: an endless staircase? (Nat Rev Drug Discov 2010) [Paper]
Privileged Scaffolds for Library Design and Drug Discovery (Curr Opin Chem Biol 2010) [Paper]
Principles of early drug discovery (Br J Pharmacol 2011) [Paper]
Recognizing Pitfalls in Virtual Screening: A Critical Review (J Chem Inf Model 2012) [Paper]
Multi-objective optimization methods in drug design (Drug Discov Today 2013) [Paper]
Finding the rules for successful drug optimisation (Drug Discov Today 2014) [Paper]
Recent Progress in Understanding Activity Cliffs and Their Utility in Medicinal Chemistry (J Med Chem 2014) [Paper]
Automating Drug Discovery (Nat Rev Drug Discov 2017) [Paper]
Interpretation of Quantitative Structure−Activity Relationship Models: Past, Present, and Future (J Chem Inf Model 2017) [Paper]
Advances and Challenges in Computational Target Prediction (J Chem Inf Model 2019) [Paper]
Duality of activity cliffs in drug discovery (Expert Opin Drug Discov 2019) [Paper]
QSAR without borders (Chem Soc Rev 2020) [Paper]
Designing small molecules for therapeutic success: A contemporary perspective (Drug Discov Today 2021) [Paper]
Phenotypic drug discovery: recent successes, lessons learned and new directions (Nat Rev Drug Discov 2022) [Paper]
Is the reductionist paradox an Achilles Heel of drug discovery? (J Comput Aided Mol Des 2022) [Paper]

1.2 Drug Discovery in the AI Era

Machine-learning approaches in drug discovery: methods and applications (Drug Discov Today 2015) [Paper]
The rise of deep learning in drug discovery (Drug Discov Today 2018) [Paper]
Applications of machine learning in drug discovery and development (Nat Rev Drug Discov 2019) [Paper]
Deep Learning in Chemistry (J Chem Inf Model 2019) [Paper]
Deep learning for molecular design—a review of the state of the art (Mol Syst Des Eng 2019) [Paper]
Efficient molecular encoders for virtual screening (Drug Discov Today Technol 2019) [Paper]
Artificial intelligence in chemistry and drug design (J Comput Aided Mol Des 2020) [Paper]
Graph convolutional networks for computational drug development and discovery (Brief Bioinform 2020) [Paper]
Transfer Learning for Drug Discovery (J Med Chem 2020) [Paper]
Learning Molecular Representations for Medicinal Chemistry (J Med Chem 2020) [Paper]
Exploring chemical space using natural language processing methodologies for drug discovery (Drug Discov Today 2020) [Paper]
Practical Notes on Building Molecular Graph Generative Models (Applied AI Letters 2020) [Paper]
A compact review of molecular property prediction with graph neural networks (Drug Discov Today 2020) [Paper]
Artificial intelligence in drug discovery: Recent advances and future perspectives (Expert Opin Drug Discov 2021) [Paper]
Artificial intelligence in drug discovery and development (Drug Discov Today 2021) [Paper]
Graph neural networks for automated de novo drug design (Drug Discov Today 2021) [Paper]
De novo molecular design and generative models (Drug Discov Today 2021) [Paper]
Artificial Intelligence for Drug Discovery (KDD 2021) [Paper] [Website] [TorchDrug]
Generative Deep Learning for Targeted Compound Design (J Chem Inf Model 2021) [Paper]
Explainable Machine Learning for Property Predictions in Compound Optimization (J Med Chem 2021) [Paper]
A decade of machine learning-based predictive models for human pharmacokinetics: Advances and challenges (Drug Discov Today 2021) [Paper]
Defining Levels of Automated Chemical Design (J Med Chem 2022) [Paper]
Evaluation guidelines for machine learning tools in the chemical sciences (Nat Rev Chem 2022) [Paper]
Combining DELs and machine learning for toxicology prediction (Drug Discov Today 2022) [Paper]

Side Notes: Successful Applications

Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Insilico Medicine)
A Deep Learning Approach to Antibiotic Discovery (Cell 2020) [Paper] [Code] (MIT CSAIL)
"BenevolentAI Announces First Patient Dosed In Its Atopic Dermatitis Clinical Trial" [Link] (BenevolentAI)
"Exscientia Announces First AI-Designed Immuno-Oncology Drug to Enter Clinical Trials" [Link] (Exscientia)
"Breaking Big Pharma's AI barrier: Insilico Medicine uncovers novel target, new drug for pulmonary fibrosis in 18 months" [Link] (Insilico Medicine)

1.3 AI-Driven Drug Discovery: Hope or Hype

Rethinking drug design in the artificial intelligence era (Nat Rev Drug Discov 2020) [Paper]
Towards reproducible computational drug discovery (J Cheminf 2020) [Paper]
Current Trends, Overlooked Issues, and Unmet Challenges in Virtual Screening (J Chem Inf Model 2020) [Paper]
Drug discovery with explainable artificial intelligence (Nat Mach Intell 2020) [Paper]
Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: Ways to make an impact, and why we are not there yet (Drug Discov Today 2021) [Paper]
Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data used for AI in drug discovery (Drug Discov Today 2021) [Paper]
Critical assessment of AI in drug discovery (Expert Opin Drug Discov 2021) [Paper]
An Insight into Artificial Intelligence in Drug Discovery: An Interview with Professor Gisbert Schneider (Expert Opin Drug Discov 2021) [Paper]

2. Data, Representation & Benchmarks

2.1 Large-Scale Databases

PubChem

PubChem in 2021: new data content and improved web interfaces (Nucleic Acids Res 2021) [Paper] [Website] [Download]

ChEMBL

The ChEMBL database in 2017 (Nucleic Acids Res 2017) [Paper] [Website] [Download] [WebAPI]

ZINC

ZINC 15 – Ligand Discovery for Everyone (J Chem Inf Model 2015) [Paper] [Website]

Others

Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases (Brief Bioinform 2019) [Paper]
DrugBank --- DrugBank 5.0: a major update to the DrugBank database for 2018 (Nucleic Acids Res 2018) [Paper] [Website] [Download]
KEGG --- KEGG as a reference resource for gene and protein annotation (Nucleic Acids Res 2016) [Paper] [Website] [Download]
PDBbind --- PDB-wide collection of binding data: current status of the PDBbind database (Bioinformatics 2015) [Paper] [Website] [Download]
BindingDB --- BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology (Nucleic Acids Res 2016) [Paper] [Website] [Download]
DUD --- Benchmarking Sets for Molecular Docking (J Med Chem 2006) [Paper] [Website]
DUD-E --- Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking (J Med Chem 2012) [Paper] [Website]
MUV --- Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening Based on PubChem Bioactivity Data (J Chem Inf Model 2009) [Paper] [Website]
STITCH --- STITCH: interaction networks of chemicals and proteins (Nucleic Acids Res 2008) [Paper] [Website]
GLL&GDD --- Ligand and Decoy Sets for Docking to G Protein-Coupled Receptors (J Chem Inf Model 2012) [Paper] [Website]
NRLiSt BDB --- NRLiSt BDB, the Manually Curated Nuclear Receptors Ligands and Structures Benchmarking Database (J Med Chem 2014) [Paper] [Website]
SIDER --- The SIDER database of drugs and side effects (Nucleic Acids Res 2016) [Paper] [Website]
Offsides&Twosides --- Data-driven prediction of drug effects and interactions (Sci Transl Med 2012) [Paper] [Website]
DILIrank --- DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans (Drug Discov Today 2016) [Paper] [Website]
UniProt --- UniProt: the universal protein knowledgebase in 2021 (Nucleic Acids Res 2021) [Paper] [Website]
PDB --- The Protein Data Bank (Nucleic Acids Res 2000) [Paper] [Website]

2.2 Small Molecule Representations

Molecular representations in AI‑driven drug discovery: a review and practical guide (J Cheminf 2020) [Paper]

2.3 Benchmark Platforms

MoleculeNet

MoleculeNet: a benchmark for molecular machine learning (Chem Sci 2018) [Paper] [Code] [Download]

MolMapNet

Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations (Nat Mach Intell 2021) [Paper] [Code]

ChemProp

Analyzing Learned Molecular Representations for Property Prediction (J Chem Inf Model 2019) [Paper] [Code] [Website]

REINVENT

Molecular De Novo design using Recurrent Neural Networks and Reinforcement Learning (J Cheminf 2017) [Paper] [Code]
REINVENT 2.0 – an AI Tool for De Novo Drug Design (J Chem Inf Model 2020) [Paper] [Code]

GraphINVENT

Graph Networks for Molecular Design (aka: GraphINVENT; Mach Learn: Sci Technol 2021) [Paper] [Code]
Practical Notes on Building Molecular Graph Generative Models (Applied AI Letters 2020) [Paper] [Code]

GuacaMol

GuacaMol: Benchmarking Models for de Novo Molecular Design (J Chem Inf Model 2019) [Paper] [Code]

MOSES

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models (Front Pharmacol 2020) [Paper] [Code]

ATOM3D

ATOM3D: Tasks On Molecules in Three Dimensions (NeurIPS 2021) [Paper] [Code] [Website]

3. Model Architectures

3.1 Convolutional Neural Networks

Task: Molecular Property Prediction; Representation: Images

Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction (J Chem Inf Model 2017) [Paper] (Techs - CNN + SVM)
Chemception: a deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR/QSPR models (aka: Chemception; arXiv 2017) [Paper]
Toxic Colors: The Use of Deep Learning for Predicting Toxicity of Compounds Merely from Their Graphic Images (aka: Toxic Colors; J Chem Inf Model 2018) [Paper]
KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images (J Cheminf 2019) [Paper] [Code] (Techs - CNN)
Learning Drug Functions from Chemical Structures with Convolutional Neural Networks and Random Forests (J Chem Inf Model 2019) [Paper] (Techs - CNN)
DEEPScreen: high performance drug–target interaction prediction with convolutional neural networks using 2-D structural compound representation (Chem Sci 2020) [Paper] [Code]

Task: Molecular Property Prediction; Representation: Fingerprints

Massively Multitask Networks for Drug Discovery (arXiv 2015) [Paper]
Convolutional Networks on Graphs for Learning Molecular Fingerprints (NeurIPS 2015) [Paper] [Code]

Side Note: Molecular Structure Extraction and Recognition

Molecular Structure Extraction from Documents Using Deep Learning (J Chem Inf Model 2019) [Paper]
DECIMER-Segmentation: Automated extraction of chemical structure depictions from scientific literature (J Cheminf 2021) [Paper]
DECIMER: towards deep learning for chemical image recognition (J Cheminf 2020) [Paper] [Code]
DECIMER 1.0: Deep Learning for Chemical Image Recognition using Transformers (ChemRxiv 2021) [Paper]
Img2Mol - Accurate SMILES Recognition from Molecular Graphical Depictions (Chem Sci 2021) [Paper] [Code]

3.2 Recurrent Neural Networks

Task: Molecular Property Prediction; Representation: SMILES Strings

SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties (aka: SMILES2Vec; arXiv 2017) [Paper]
Large-scale comparison of machine learning methods for drug target prediction on ChEMBL (aka: SmilesLSTM; Chem Sci 2018) [Paper] [Code] (Techs - RNN + GNN + Multi-Task Learning)

Task: Molecule Generation; Representation: SMILES Strings

Molecular de‑novo design through deep reinforcement learning (aka: REINVENT; J Cheminf 2017) [Paper] (Techs - RNN: GRU + RL: Policy-gradient REINFORCE)
Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks (aka: CharRNN; ACS Cent Sci 2018) [Paper] (Techs - Transfer Learning)
Exploring Deep Recurrent Models with Reinforcement Learning for Molecule Design (ICLR 2018 Workshop) [Paper] (Techs - RL: Hybrid A2C, Policy-gradient PPO)
Deep Reinforcement Learning for de novo Drug Design (aka: ReLeaSE; Sci Adv 2018) [Paper] [Code] (Techs - RL: Policy-gradient REINFORCE)
Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design (J Chem Inf Model 2019) [Paper] [Code] (Techs - RNN: BiLSTM + RL: Hybrid Actor-Critic)
Scaffold-Constrained Molecular Generation (J Chem Inf Model 2020) [Paper] (Techs - RL: Policy-based Hill Climbing)

Task: Molecule Generation; Representation: Molecular Graphs

GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models (aka: GraphRNN; ICML 2018) [Paper] [Code]
Learning Deep Generative Models of Graphs (ICML 2018) [Paper] [Code] (Techs - RNN: LSTM)
MolecularRNN: Generating realistic molecular graphs with optimized properties (arXiv 2019) [Paper]
A Deep-Learning View of Chemical Space Designed to Facilitate Drug Discovery (aka: DESMILES; J Chem Inf Model 2020) [Paper]

3.3 Graph Neural Networks

Task: Molecular Property Prediction; Representation: Molecular Graphs

Molecular Graph Convolutions: Moving Beyond Fingerprints (aka: Weave; J Comput Aided Mol Des 2016) [Paper]
Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction (J Chem Inf Model 2017) [Paper]
Semi-supervised classification with graph convolutional networks (aka: GraphConv; ICLR 2017) [Paper] [Code]
Neural Message Passing for Quantum Chemistry (aka: MPNN; ICML 2017) [Paper] [Code]
SchNet: A continuous-filter convolutional neural network for modeling quantum interactions (aka: SchNet; NeurIPS 2017) [Paper] [Code]
Low Data Drug Discovery with One-Shot Learning (ACS Cent Sci 2017) [Paper] (Techs - LSTM: BiLSTM, attLSTM + GNN + Few-Shot Learning)
Large-scale comparison of machine learning methods for drug target prediction on ChEMBL (aka: SmilesLSTM; Chem Sci 2018) [Paper] [Code] (Techs - RNN + GNN + Multi-Task Learning)
PotentialNet for Molecular Property Prediction (aka: PotentialNet; ACS Cent Sci 2018) [Paper] (Techs - GNN: GCNN + Multi-Task Learning)
Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective (aka: MGCN; AAAI 2019) [Paper]
Deep Learning-Based Prediction of Drug-Induced Cardiotoxicity (J Chem Inf Model 2019) [Paper] [Code] (Techs - GCN + Multi-task Learning)
DeepChemStable: Chemical Stability Prediction with an Attention-Based Graph Convolution Network (J Chem Inf Model 2019) [Paper] (Techs - GCN + Attention)
Analyzing Learned Molecular Representations for Property Prediction (aka: Chemprop, D-MPNN; J Chem Inf Model 2019) [Paper] [Code]
Molecule Property Prediction Based on Spatial Graph Embedding (aka: C-SGEN; J Chem Inf Model 2019) [Paper] [Code]
Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism (aka: Attentive FP; J Med Chem 2019) [Paper] [Code]
Graph convolutional neural networks as "general-purpose" property predictors: the universality and limits of applicability (J Chem Inf Model 2020) [Paper]
N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules (aka: N-Gram Graph; NeurIPS 2019) [Paper]
Building attention and edge message passing neural networks for bioactivity and physical–chemical property prediction (J Cheminf 2020) [Paper] [Code] (Techs - MPNN + Multi-Task Learning)
A self‑attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility (J Cheminf 2020) [Paper] [Code] (Techs - MPNN + Self-Attention: Interpretability)
Chemically Interpretable Graph Interaction Network for Prediction of Pharmacokinetic Properties of Drug-Like Molecules (aka: CIGIN; AAAI 2020) [Paper] [Code]
Strategies for Pre-training Graph Neural Networks (ICLR 2020) [Paper] [Code] (Techs - Self-Supervised Learning)
Directional Message Passing for Molecular Graphs (aka: DimeNet; ICLR 2020) [Paper] [Code]
Drug–target affinity prediction using graph neural network and contact maps (RSC Advances 2020) [Paper] (Techs - GCN + GAT)
ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction (aka: ASGN; KDD 2020) [Paper] [Code] (Techs - Active Learning)
Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction (ICML 2020 Workshop) [Paper] [Code] (Techs - GGNN + Meta Learning: MAML, FO-MAML, ANIL)

Task: Molecule Generation; Representation: Molecular Graphs

Multi‑objective de novo drug design with conditional graph generative model (J Cheminf 2018) [Paper] [Code] (Techs - Conditional Graph Generative Model: MolMP, MolRNN)
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (aka: GCPN; NeurIPS 2018) [Paper] [Code] (Techs - GCN + RL: PPO)
Optimization of Molecules via Deep Reinforcement Learning (aka: MolDQN; Sci Rep 2019) [Paper] (Techs - RL: Q-learning)
Improving Molecular Design by Stochastic Iterative Target Augmentation (ICML 2020) [Paper] [Code] (Techs - VSeq2Seq/HierGNN + Semi-Supervised Learning)
DeepGraphMolGen, a multi‑objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach (J Cheminf 2020) [Paper] (Techs - GCN + RL: PPO)
Reinforced Molecular Optimization with Neighborhood-Controlled Grammars (aka: MNCE-RL; NeurIPS 2020) [Paper] [Code] (Techs - RL: PPO)
Graph Networks for Molecular Design (aka: GraphINVENT; Mach Learn: Sci Technol 2021) [Paper] [Code]
De novo drug design using reinforcement learning with graph-based deep generative models (aka: RL-GraphINVENT; ChemRxiv 2021) [Paper] [Code]

Side Note: Common GNN Models

Recurrent GNNs --- Gated graph sequence neural networks (aka: GGNN; ICLR 2016) [Paper] [Code]
Convolutional GNNs (Spectral-based) --- Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (aka: ChebNet; NeurIPS 2016) [Paper] [Code]
Convolutional GNNs (Spectral-based) --- Semi-supervised classification with graph convolutional networks (aka: GraphConv; ICLR 2017) [Paper] [Code]
Convolutional GNNs (Spatial-based) --- Neural message passing for quantum chemistry (aka: MPNN; ICML 2017) [Paper] [Code]
Convolutional GNNs (Spatial-based) --- Inductive Representation Learning on Large Graphs (aka: GraphSAGE; NeurIPS 2017) [Paper] [Code]
Convolutional GNNs (Spatial-based) --- Graph Attention Networks (aka: GAT; ICLR 2018) [Paper] [Code]
Convolutional GNNs (Spatial-based) --- How powerful are graph neural networks? (aka: GIN; ICLR 2019) [Paper] [Code]

3.4 Variational Autoencoders

Task: Molecule Generation; Representation: SMILES Strings

Automatic chemical design using a data-driven continuous representation of molecules (arXiv 2016; ACS Cent Sci 2018) [Paper] [Code] (Techs - VAE)
Grammar Variational Autoencoder (aka: GrammarVAE; ICML 2017) [Paper]
Application of Generative Autoencoder in De Novo Molecular Design (Mol Inform 2017) [Paper]
Syntax-Directed Variational Autoencoder for Structured Data (aka: SD-VAE; ICLR 2018) [Paper] [Code]
Conditional Molecular Design with Deep Generative Models (aka: Continuous SSVAE; J Chem Inf Model 2018) [Paper] [Code]
Molecular generative model based on conditional variational autoencoder for de novo molecular design (aka: CVAE; J Cheminf 2018) [Paper] [Code] (Techs - VAE)
Constrained Graph Variational Autoencoders for Molecule Design (aka: CGVAE; NeurIPS 2018) [Paper] [Code]
NEVAE: A Deep Generative Model for Molecular Graphs (aka: NeVAE; AAAI 2019) [Paper] [Code]
De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping (aka: GTMVAE; J Chem Inf Model 2019) [Paper] (Techs - Autoencoder + RNN)
Re-balancing Variational Autoencoder Loss for Molecule Sequence Generation (aka: re-balanced VAE; ACM BCB 2020) [Paper] [Code] (Techs - RNN: BiGRU + VAE)
CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models (aka: CogMol; NeurIPS 2020) [Paper] [Code]

VAE Variant: AAE

Application of Generative Autoencoder in De Novo Molecular Design (Mol Inform 2017) [Paper]
druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico (aka: druGAN; Mol Pharm 2017) [Paper]
Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery (aka: SAAE; Mol Pharm 2018) [Paper]

Task: Molecule Generation; Representation: Molecular Graphs

GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders (aka: GraphVAE; arXiv 2018) [Paper]
Junction Tree Variational Autoencoder for Molecular Graph Generation (aka: JT-VAE; ICML 2018) [Paper] [Code]
Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders (aka: Regularized VAE; NeurIPS 2018) [Paper]
Molecular Hypergraph Grammar with Its Application to Molecular Optimization (aka: MHG-VAE; ICML 2019) [Paper] [Code]
Efficient learning of non‑autoregressive graph variational autoencoders for molecular graph generation (J Cheminf 2019) [Paper] [Code] (Techs - Non-autoregressive VAE + RL)
Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Techs - VAE + RL: REINFORCE)
Scaffold-based molecular design using graph generative model (aka: ScaffoldVAE; arXiv 2019) [Paper]
Learning Multimodal Graph-to-Graph Translation for Molecule Optimization (aka: VJTNN; ICLR 2019) [Paper] [Code]
CORE: Automatic Molecule Optimization Using Copy & Refine Strategy (AAAI 2020) [Paper] [Code]
Hierarchical Generation of Molecular Graphs using Structural Motifs (aka: HierVAE; ICML 2020) [Paper] [Code] (Techs - Hierarchical VAE)
Compressed graph representation for scalable molecular graph generation (J Cheminf 2020) [Paper] [Code] (Techs - Non-autoregressive VAE)

Side Note: Reaction & Retrosynthesis Prediction; Representation: Molecular Graphs

Generating Molecules via Chemical Reactions (ICLR 2019 Workshop) [Paper]
Barking up the right tree: an approach to search over molecule synthesis DAG (NeurIPS 2020) [Paper] [Code]

3.5 Generative Adversarial Networks

Task: Molecule Generation; Representation: SMILES Strings

Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models (aka: ORGAN; arXiv 2017) [Paper] [Code] (Techs - GAN: G-RNN, D-CNN + RL: REINFORCE)
Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (aka: ORGANIC; ChemRxiv 2017) [Paper] [Code] (Techs - GAN + RL: REINFORCE)
Reinforced Adversarial Neural Computer for de Novo Molecular Design (aka: RANC; J Chem Inf Model 2018) [Paper] (Techs - GAN + RL)

Task: Molecule Generation; Representation: Molecular Graphs

MolGAN: An implicit generative model for small molecular graphs (aka: MolGAN; ICML 2018 Workshop) [Paper] [Code-Tensorflow] [Code-PyTorch] (Techs - GAN + RL: DDPG)

3.6 Normalizing Flow Models

Task: Molecule Generation; Representation: Molecular Graphs

GraphNVP: An Invertible Flow Model for Generating Molecular Graphs (aka: GraphNVP; arXiv 2019) [Paper] [Code]
Graph Residual Flow for Molecular Graph Generation (aka: GRF; arXiv 2019) [Paper]
GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation (aka: GraphAF; ICLR 2020) [Paper] [Code] (Techs - Flow + RL: PPO)
MoFlow: An Invertible Flow Model for Generating Molecular Graphs (aka: MoFlow; KDD 2020) [Paper] [Code]
GraphDF: A Discrete Flow Model for Molecular Graph Generation (aka: GraphDF; ICML 2021) [Paper]
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation (NeurIPS 2021) [Paper]

3.7 Transformers

Task: Molecular Property Prediction; Representation: SMILES Strings

SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction (aka: SMILES-BERT; ACM BCB 2019) [Paper] (Techs - BERT)
SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery (aka: SMILES Transformer; arXiv 2019) [Paper] [Code]
ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction (aka: ChemBERTa; arXiv 2020) [Paper] [Code]
Molecular representation learning with language models and domain-relevant auxiliary tasks (aka: MolBERT; NeurIPS 2020 Workshop) [Paper] [Code] (Techs - BERT + Self-Supervised Learning)
Algebraic graph-assisted bidirectional transformers for molecular property prediction (aka: AGBT; Nat Commun 2021) [Paper] [Code]
ChemBERTa-2: Towards Chemical Foundation Models (aka: ChemBERTa-2; arXiv 2022) [Paper]

Task: Molecular Property Prediction; Representation: Molecular Graphs

Self-Supervised Graph Transformer on Large-Scale Molecular Data (aka: GROVER; NeurIPS 2020) [Paper] [Code] (Techs - Graph Transformer + Self-Supervised Learning)

Task: Molecule Generation; Representation: SMILES Strings

Transformer-Based Generative Model Accelerating the Development of Novel BRAF Inhibitors (ACS Omega 2021) [Paper]
MolGPT: Molecular Generation Using a Transformer-Decoder Model (aka: MolGPT; J Chem Inf Model 2022) [Paper] [Code]

Task: Molecule Generation; Representation: Molecular Graphs

A Model to Search for Synthesizable Molecules (aka: Molecule Chef; NeurIPS 2019) [Paper] [Code]
Transformer neural network for protein-specific de novo drug generation as a machine translation problem (Sci Rep 2021) [Paper]

4. Learning Paradigms

4.1 Self-Supervised Learning in Molecular Property Prediction

Generative Learning

Strategies for Pre-training Graph Neural Networks (ICLR 2020) [Paper] [Code] (Techs - Self-Supervised Learning)
Molecular representation learning with language models and domain-relevant auxiliary tasks (aka: MolBERT; NeurIPS 2020 Workshop) [Paper] [Code] (Techs - BERT + Self-Supervised Learning)
Self-Supervised Graph Transformer on Large-Scale Molecular Data (aka: GROVER; NeurIPS 2020) [Paper] [Code] (Techs - Graph Transformer + Self-Supervised Learning)

Contrastive Learning

MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (arXiv 2021) [Paper] [Code]
Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast (J Chem Inf Model 2022) [Paper] [Code]

4.2 Reinforcement Learning in Molecule Generation

Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models (aka: ORGAN; arXiv 2017) [Paper] [Code] (Techs - GAN: G-RNN, D-CNN + RL: Policy-gradient REINFORCE)
Molecular de‑novo design through deep reinforcement learning (aka: REINVENT; J Cheminf 2017) [Paper] (Techs - RNN: GRU + RL: Policy-gradient REINFORCE)
Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (aka: ORGANIC; ChemRxiv 2017) [Paper] [Code] (Techs - GAN + RL: Policy-gradient REINFORCE)
Reinforced Adversarial Neural Computer for de Novo Molecular Design (aka: RANC; J Chem Inf Model 2018) [Paper] (Techs - GAN + RL)
Exploring Deep Recurrent Models with Reinforcement Learning for Molecule Design (ICLR 2018 Workshop) [Paper] (Techs - RL: Hybrid A2C, Policy-gradient PPO)
MolGAN: An implicit generative model for small molecular graphs (aka: MolGAN; ICML 2018 Workshop) [Paper] [Code-Tensorflow] [Code-PyTorch] (Techs - GAN + RL: Hybrid Actor-Critic DDPG)
Deep Reinforcement Learning for de novo Drug Design (aka: ReLeaSE; Sci Adv 2018) [Paper] [Code] (Techs - RL: Policy-gradient REINFORCE)
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (aka: GCPN; NeurIPS 2018) [Paper] [Code] (Techs - GCN + RL: Policy-gradient PPO)
Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Techs - VAE + RL: Policy-gradient REINFORCE)
Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design (aka: DeepFMPO; J Chem Inf Model 2019) [Paper] [Code] (Techs - RNN: BiLSTM + RL: Hybrid Actor-Critic)
Optimization of Molecules via Deep Reinforcement Learning (aka: MolDQN; Sci Rep 2019) [Paper] (Techs - RL: Value-based Double Q-learning)
Efficient learning of non‑autoregressive graph variational autoencoders for molecular graph generation (J Cheminf 2019) [Paper] [Code] (Techs - Non-autoregressive VAE + RL: Policy-gradient)
GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation (aka: GraphAF; ICLR 2020) [Paper] [Code] (Techs - Flow + RL: Policy-gradient PPO)
Reinforcement Learning for Molecular Design Guided by Quantum Mechanics (aka: MolGym; ICML 2020) [Paper] (Techs - RL: Policy-gradient PPO)
DeepGraphMolGen, a multi‑objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach (aka: DeepGraphMolGen; J Cheminf 2020) [Paper] (Techs - GCN + RL: Policy-gradient PPO)
Reinforced Molecular Optimization with Neighborhood-Controlled Grammars (aka: MNCE-RL; NeurIPS 2020) [Paper] [Code] (Techs - RL: Policy-gradient PPO)
Deep inverse reinforcement learning for structural evolution of small molecules (Brief Bioinform 2021) [Paper] [Code] (Techs - Inverse RL)

Side Note: Common RL Algorithms

Value-based --- Playing Atari with Deep Reinforcement Learning (aka: DQN; NeurIPS Workshop 2013) [Paper]
Value-based --- Human-level control through deep reinforcement learning (aka: DQN; Nature 2015) [Paper]
Value-based --- Deep Reinforcement Learning with Double Q-learning (aka: Double Q-learning; AAAI 2016) [Paper]
Value-based --- Prioritized Experience Replay (aka: PER; ICLR 2016) [Paper]
Value-based --- Dueling Network Architectures for Deep Reinforcement Learning (aka: Dueling Network; ICML 2016) [Paper]
Policy-gradient --- Simple statistical gradient-following algorithms for connectionist reinforcement learning (aka: REINFORCE; Mach Learn 1992) [Paper]
Policy-gradient --- Policy Gradient Methods for Reinforcement Learning with Function Approximation (aka: Stochastic Policy Gradient; NeurIPS 1999) [Paper]
Policy-gradient --- Deterministic Policy Gradient Algorithms (aka: DPG; ICML 2014) [Paper]
Policy-gradient --- Trust Region Policy Optimization (aka: TRPO; ICML 2015) [Paper]
Policy-gradient --- Proximal Policy Optimization Algorithms (aka: PPO; arXiv 2017) [Paper]
Hybrid --- Continuous control with deep reinforcement learning (aka: DDPG; ICLR 2016) [Paper]
Hybrid --- Asynchronous Methods for Deep Reinforcement Learning (aka: A3C; ICML 2016) [Paper]
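Many entries in this list name "RL: Policy-gradient REINFORCE" as their technique. To show roughly what that update does, here is a minimal sketch on a toy multi-armed bandit rather than a molecule generator. The softmax policy, the running-mean baseline, the learning rate, and the reward function are all illustrative assumptions, not taken from any listed paper. In a sequence (e.g. SMILES) generator, the analogous update sums the gradient over the log-probabilities of all sampled tokens, with the reward scored on the completed molecule.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(reward_fn: Callable[[int], float], n_actions: int,
                     steps: int = 2000, lr: float = 0.1,
                     seed: int = 0) -> List[float]:
    """Train a softmax policy pi(a) = softmax(theta)[a] with the REINFORCE
    update theta += lr * (R - b) * grad log pi(a), where b is a running-mean
    baseline (an assumption here; it reduces variance without biasing the
    expected gradient). Returns the final action probabilities."""
    rng = random.Random(seed)
    theta = [0.0] * n_actions
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy.
        a = rng.choices(range(n_actions), weights=probs)[0]
        r = reward_fn(a)
        advantage = r - baseline
        # d/d theta_k of log softmax(theta)[a] equals 1[k == a] - probs[k].
        for k in range(n_actions):
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log
        # Exponential moving average of observed rewards.
        baseline += 0.05 * (r - baseline)
    return softmax(theta)
```

Usage, with a hypothetical reward that only pays out for action 2: `reinforce_bandit(lambda a: 1.0 if a == 2 else 0.0, n_actions=4)` should return a distribution that puts most of its mass on index 2.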

Side Note: Pareto Optimality

De Novo Drug Design of Targeted Chemical Libraries Based on Artificial Intelligence and Pair-Based Multiobjective Optimization (J Chem Inf Model 2020) [Paper] [Code] (Techs - Pareto Optimality)
Multiobjective de novo drug design with recurrent neural networks and nondominated sorting (J Cheminf 2020) [Paper] (Techs - Pareto Optimality)
DrugEx v2: De Novo Design of Drug Molecule by Pareto-based Multi-Objective Reinforcement Learning in Polypharmacology (ChemRxiv) [Paper] (Techs - Pareto Optimality)
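
The Pareto-based entries above rank candidate molecules by dominance rather than by a single weighted sum of objectives. Below is a minimal Python sketch of Pareto dominance and naive non-dominated sorting, assuming every objective is to be maximized. The scores are made up (think of a predicted activity and a drug-likeness score per molecule), and tie-breaking inside a front (e.g. crowding distance) is omitted, so this is not a reproduction of any listed method.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b` (all objectives maximized):
    no worse in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated_sort(scores: List[Sequence[float]]) -> List[List[int]]:
    """Split candidate indices into successive Pareto fronts (naive version).

    Front 0 is the Pareto-optimal set; front 1 is what becomes non-dominated
    once front 0 is removed, and so on.
    """
    remaining = list(range(len(scores)))
    fronts: List[List[int]] = []
    while remaining:
        front = [
            i
            for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


if __name__ == "__main__":
    # Hypothetical (predicted activity, drug-likeness) scores for five molecules.
    scores = [(0.9, 0.3), (0.7, 0.7), (0.4, 0.9), (0.6, 0.6), (0.3, 0.2)]
    print(nondominated_sort(scores))  # → [[0, 1, 2], [3], [4]]
```

Molecules 0, 1 and 2 trade one objective against the other, so none of them beats another outright; molecule 3 is dominated by molecule 1 and therefore lands in the second front.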

Side Note: Reaction & Retrosynthesis Optimization

Optimizing chemical reactions with deep reinforcement learning (ACS Cent Sci 2017) [Paper]

4.4 Other Learning Paradigms

Metric Learning

Machine-guided representation for accurate graph-based molecular machine learning (Phys Chem Chem Phys 2020) [Paper]
Embedding of Molecular Structure Using Molecular Hypergraph Variational Autoencoder with Metric Learning (Mol Inform 2020) [Paper]

Few-Shot Learning

Low Data Drug Discovery with One-Shot Learning (ACS Cent Sci 2017) [Paper] (Techs - LSTM: BiLSTM, attLSTM + GNN + Few-Shot Learning)
Few-Shot Graph Learning for Molecular Property Prediction (WWW 2021) [Paper] [Code]

Meta Learning

Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction (ICML 2020 Workshop) [Paper] [Code] (Techs - Gated GNN + Meta Learning: MAML, FO-MAML, ANIL)

Active Learning

ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction (aka: ASGN; KDD 2020) [Paper] [Code] (Techs - Active Learning)
Evidential Deep Learning for Guided Molecular Property Prediction and Discovery (NeurIPS 2020 Workshop) [Talk]
Batched Bayesian Optimization for Drug Design in Noisy Environments (J Chem Inf Model 2022) [Paper] [Code]

5. Addressing Existing Challenges

Model Interpretation

Drug Discovery Maps, a Machine Learning Model That Visualizes and Predicts Kinome-Inhibitor Interaction Landscapes (J Chem Inf Model 2018) [Paper]
Using attribution to decode binding mechanism in neural network models for chemistry (PNAS 2019) [Paper]
Interpretation of QSAR Models by Coloring Atoms According to Changes in Predicted Activity: How Robust Is It? (J Chem Inf Model 2019) [Paper]
Building of Robust and Interpretable QSAR Classification Models by Means of the Rivality Index (J Chem Inf Model 2019) [Paper]

Dataset Concerns

In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening (J Chem Inf Model 2019) [Paper]
Deep Learning-Based Imbalanced Data Classification for Drug Discovery (J Chem Inf Model 2020) [Paper] [Code]

Uncertainty Estimation

General Approach to Estimate Error Bars for Quantitative Structure-Activity Relationship Predictions of Molecular Activity (J Chem Inf Model 2018) [Paper]
Assessment and Reproducibility of Quantitative Structure-Activity Relationship Models by the Nonexpert (J Chem Inf Model 2018) [Paper]
Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks (J Chem Inf Model 2018) [Paper]
Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout (J Chem Inf Model 2019) [Paper]
Minimal-uncertainty prediction of general drug-likeness based on Bayesian neural networks (Nat Mach Intell 2020) [Paper]
Assigning Confidence to Molecular Property Prediction (arXiv 2021) [Paper]
Gi and Pal Scores: Deep Neural Network Generalization Statistics (ICLR 2021 Workshop) [Paper]
Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty (J Cheminf 2021) [Paper]
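
Some of the works above (e.g. Deep Confidence) attach error bars to deep-model predictions via conformal prediction. Here is a minimal split-conformal sketch in Python, assuming a regression model whose predictions on a held-out calibration set are already available. It uses plain absolute residuals as the nonconformity score, whereas the listed papers use their own model-specific scores; the calibration values are hypothetical.

```python
import math
from typing import List, Tuple


def conformal_interval(
    cal_true: List[float],
    cal_pred: List[float],
    test_pred: float,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Split-conformal interval for a single regression prediction.

    Under exchangeability of calibration and test points, the interval covers
    the true value with probability at least 1 - alpha (marginally).
    """
    # Nonconformity score: absolute residual on a held-out calibration set.
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank of the quantile: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only an unbounded interval is valid.
        return (-math.inf, math.inf)
    q = scores[rank - 1]
    return (test_pred - q, test_pred + q)


if __name__ == "__main__":
    # Hypothetical calibration data (e.g. measured vs. predicted pIC50 values).
    cal_true = [5.0, 6.1, 7.2, 5.5, 6.8, 7.9, 6.0, 5.2, 7.0, 6.4]
    cal_pred = [5.3, 6.0, 6.8, 5.9, 6.5, 7.1, 6.4, 5.0, 7.3, 6.2]
    print(conformal_interval(cal_true, cal_pred, test_pred=6.5, alpha=0.2))
```

The guarantee is marginal (on average over test compounds), not per compound; normalizing residuals by a per-compound uncertainty estimate is one way to get intervals that widen for harder molecules.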

Representation Capacity

Ligand-Based Virtual Screening Using Graph Edit Distance as Molecular Similarity Measure (J Chem Inf Model 2019) [Paper]
Optimal Transport Graph Neural Networks (arXiv 2020) [Paper]

Out-of-Distribution Generalization

Dissecting Machine-Learning Prediction of Molecular Activity: Is an Applicability Domain Needed for Quantitative Structure-Activity Relationship Models Based on Deep Neural Networks? (J Chem Inf Model 2018) [Paper]
Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization (J Chem Inf Model 2018) [Paper]
Molecular Similarity-Based Domain Applicability Metric Efficiently Identifies Out-of-Domain Compounds (J Chem Inf Model 2018) [Paper]
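
A common, simple way to flag out-of-domain compounds is to check how similar a query is to its nearest neighbour in the training set. The Python sketch below is a generic nearest-neighbour Tanimoto variant of that idea, not the specific metric defined in the listed paper. Fingerprints are represented as hypothetical sets of on-bit indices (in practice they would come from a cheminformatics toolkit), and the 0.4 cut-off is an arbitrary illustrative value.

```python
from typing import FrozenSet, Iterable, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bits."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical by convention.
    return len(a & b) / union if union else 1.0


def applicability_check(
    query: FrozenSet[int],
    training: Iterable[FrozenSet[int]],
    threshold: float = 0.4,
) -> Tuple[bool, float]:
    """Flag `query` as in-domain if its nearest training neighbour is similar enough.

    Returns (in_domain, max_similarity). The 0.4 cut-off is a hypothetical default.
    """
    best = max((tanimoto(query, fp) for fp in training), default=0.0)
    return best >= threshold, best


if __name__ == "__main__":
    # Hypothetical on-bit sets standing in for real fingerprints.
    training = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5, 8})]
    print(applicability_check(frozenset({1, 2, 3, 9}), training))  # → (True, 0.6)
    print(applicability_check(frozenset({10, 11, 12}), training))  # → (False, 0.0)
```

Predictions for compounds flagged as out-of-domain can then be reported with lower confidence or withheld, rather than trusted at face value.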

Threshold Adjustment

GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning (J Chem Inf Model 2021) [Paper] [Code]
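
With imbalanced data, the default 0.5 cut-off on predicted probabilities often labels almost everything as the majority class. A simplified Python sketch of threshold adjustment: scan candidate thresholds and keep the one whose binarized predictions give the highest Cohen's kappa. Unlike the GHOST procedure, it uses a single set of predictions and does not aggregate over repeated random subsets of the training set; the labels, probabilities and candidate grid are hypothetical.

```python
from typing import Sequence


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's kappa for binary 0/1 labels: agreement corrected for chance."""
    n = len(y_true)
    agree = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    observed = agree / n
    pos_true = sum(y_true) / n
    pos_pred = sum(y_pred) / n
    # Chance agreement if predictions were independent of the labels.
    expected = pos_true * pos_pred + (1 - pos_true) * (1 - pos_pred)
    return 0.0 if expected == 1 else (observed - expected) / (1 - expected)


def best_threshold(
    y_true: Sequence[int], probs: Sequence[float], candidates: Sequence[float]
) -> float:
    """Return the candidate threshold whose binarized predictions maximize kappa."""

    def kappa_at(t: float) -> float:
        return cohen_kappa(y_true, [1 if p >= t else 0 for p in probs])

    return max(candidates, key=kappa_at)


if __name__ == "__main__":
    # Hypothetical imbalanced data: 8 inactives, 2 actives, scored by some classifier.
    y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    probs = [0.05, 0.08, 0.12, 0.15, 0.18, 0.22, 0.26, 0.31, 0.36, 0.45]
    print(best_threshold(y_true, probs, [0.1, 0.2, 0.3, 0.4, 0.5]))  # → 0.3
```

Here the 0.5 cut-off predicts no actives at all (kappa 0), while 0.3 recovers both actives at the cost of one false positive. Choosing the threshold on held-out or out-of-bag predictions, rather than on the test set, keeps the evaluation honest.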

Model Comparison

Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction (J Comput Aided Mol Des 2020) [Paper]
Comparing classification models-a practical tutorial (J Comput Aided Mol Des 2021) [Paper]

Model Adoption

A Turing Test for Molecular Generators (J Med Chem 2020) [Paper]

Molecular Docking

Docking and scoring in virtual screening for drug discovery: methods and applications (Nat Rev Drug Discov 2004) [Paper]
Benchmarking sets for molecular docking (J Med Chem 2006) [Paper]
Molecular Docking: A powerful approach for structure-based drug discovery (Curr Comput Aided Drug Des 2011) [Paper]
Software for Molecular Docking: a review (Biophys Rev 2017) [Paper]
Deep Docking: A Deep Learning Platform for Augmentation of Structure Based Drug Discovery (ACS Cent Sci 2020) [Paper]
A Deep-Learning Approach toward Rational Molecular Docking Protocol Selection (Molecules 2020) [Paper]
GNINA 1.0: molecular docking with deep learning (J Cheminf 2021) [Paper]

Molecular Fragmentation & Assembly

Molecular generation by Fast Assembly of (Deep)SMILES fragments (J Cheminf 2021) [Paper] [Code]
