This repository contains the code for generating/optimizing conformers, extracting interaction profiles, aligning interaction profiles, and differentiably scoring 3D similarity. It also contains modules to evaluate conformers generated with ShEPhERD1 and other generative models.
The formulation of the interaction profile representation, scoring, alignment, and evaluations are found in our preprint ShEPhERD: Diffusion shape, electrostatics, and pharmacophores for bioisosteric drug design. The diffusion model itself is found in a separate repository: https://github.com/coleygroup/shepherd.
1 ShEPhERD: Shape, Electrostatics, and Pharmacophores Explicit Representation Diffusion
.
├── shepherd_score/
│ ├── alignment_utils/ # Alignment and rigid transformations tools
│ │ ├── pca.py
│ │ └── se3.py
│ ├── evaluations/ # Evaluation suite
│ │ ├── pdbs/
│ │ ├── utils/
│ │ │ ├── convert_data.py
│ │ │ └── interactions.py
│ │ ├── docking.py # Docking evaluations
│ │ └── evaluate.py # Generated conformer evaluation pipelines
│ ├── pharm_utils/
│ │ ├── pharmacophore.py
│ │ ├── pharm_vec.py
│ │ └── smarts_featues.fdef # Pharmacophore definitions
│ ├── score/ # Scoring related functions and constants
│ │ ├── constants.py
│ │ ├── electrostatic_scoring.py
│ │ ├── gaussian_overlap.py
│ │ └── pharmacophore_scoring.py
│ ├── alignment.py
│ ├── conformer_generation.py # Conformer generation with rdkit and xtb
│ ├── container.py # Molecule and MoleculePair classes
│ ├── extract_profiles.py # Functions to extract interaction profiles
│ ├── generate_point_cloud.py
│ ├── objective.py # Objective function used for REINVENT
│ └── visualize.py # Visualization tools
├── scripts/ # Scripts for running evaluations
├── examples/ # Jupyter notebook tutorials/examples
├── tests/
├── environment.yml # Environment
└── README.md
An example environment can be found at environment.yml
, however, this package should generally work where PyTorch, Open3D, RDKit, and xTB can be installed in an environment with Python >=3.8.
python>=3.8
numpy>1.2,<2.0
pytorch>=1.12
mkl==2024.0 (use conda)
open3d>=0.18
rdkit>=2023.03 (newest available is recommended)
xtb>=6.6 (use conda)
scipy>=1.10
pandas>=2.0
Make sure that mkl is not 2024.1 since there is a known issue that prevents importing torch.
meeko
vina==1.2.5
You can pip install the python bindings for Autodock Vina for the python interface. However, this also requires an installation of the executable of Autodock Vina v1.2.5: https://vina.scripps.edu/downloads/ and the ADFR software suite: https://ccsb.scripps.edu/adfr/implementation/.
jax==0.4.26
jaxlib==0.4.26+cuda12.cudnn89
optax==0.2.2
py3dmol>=2.1.0
biopython>=1.84
prolif>=2.0.3
mdanalysis>=2.2.0
scikit-learn>=1.3
pip install -e .
for developer installThe package has base functions and convenience wrappers. Scoring can be done with either NumPy or Torch, but alignment requires Torch. There are also Jax implementations for both scoring and alignment of gaussian overlap and ESP similarity, but currently not for pharmacophores.
Useful conformer generation functions are found in the shepherd_score.conformer_generation
module.
Interaction profile | Function |
---|---|
shape | shepherd_score.extract_profiles.get_molecular_surface() |
electrostatics | shepherd_score.extract_profiles.get_electrostatic_potential() |
pharmacophores | shepherd_score.extract_profiles.get_pharmacophores() |
shepherd_score.score
contains the base scoring functions with seperate modules for those dependent on PyTorch (*.py
), NumPy (*_np.py
), and Jax (*_jax.py
).
Similarity | Function |
---|---|
shape | shepherd_score.score.gaussian_overlap.get_overlap() |
electrostatics | shepherd_score.score.electrostatic_scoring.get_overlap_esp() |
pharmacophores | shepherd_score.score.pharmacophore_scoring.get_overlap_pharm() |
Molecule
class
shepherd_score.container.Molecule
accepts an RDKit Mol
object (with an associated conformer) and generates user-specified interaction profilesMoleculePair
class
shepherd_score.container.MoleculePair
operates on Molecule
objects and prepares them for scoring and alignmentFull jupyter notebook tutorials/examples for extraction, scoring, and alignments are found in the examples
folder. Some minimal examples are below.
Extraction of interaction profiles.
from shepherd_score.conformer_generation import embed_conformer_from_smiles
from shepherd_score.conformer_generation import charges_from_single_point_conformer_with_xtb
from shepherd_score.extract_profiles import get_atomic_vdw_radii, get_molecular_surface
from shepherd_score.extract_profiles import get_pharmacophores, get_electrostatic_potential
from shepherd_score.extract_profiles import get_electrostatic_potential
# Embed conformer with RDKit and partial charges from xTB
ref_mol = embed_conformer_from_smiles('Oc1ccc(CC=C)cc1', MMFF_optimize=True)
partial_charges = charges_from_single_point_conformer_with_xtb(ref_mol)
# Radii are needed for surface extraction
radii = get_atomic_vdw_radii(ref_mol)
# `surface` is an np.array with shape (200, 3)
surface = get_molecular_surface(ref_mol.GetConformer().GetPositions(), radii, num_points=200)
# Get electrostatic potential at each point on the surface
# `esp`: np.array (200,)
esp = get_electrostatic_potential(ref_mol, partial_charges, surface)
# Pharmacophores as arrays with averaged vectors
# pharm_types: np.array (P,)
# pharm_{pos/vecs}: np.array (P,3)
pharm_types, pharm_pos, pharm_vecs = get_pharmacophores(ref_mol, multi_vector=False)
An example of scoring the similarity of two different molecules using 3D surface, ESP, and pharmacophore similarity metrics.
from shepherd_score.score.constants import ALPHA
from shepherd_score.conformer_generation import embed_conformer_from_smiles
from shepherd_score.conformer_generation import optimize_conformer_with_xtb
from shepherd_score.container import Molecule, MoleculePair
# Embed a random conformer with RDKit
ref_mol_rdkit = embed_conformer_from_smiles('Oc1ccc(CC=C)cc1', MMFF_optimize=True)
fit_mol_rdkit = embed_conformer_from_smiles('O=CCc1ccccc1', MMFF_optimize=True)
# Local relaxation with xTB
ref_mol, _, ref_charges = optimize_conformer_with_xtb(ref_mol_rdkit)
fit_mol, _, fit_charges = optimize_conformer_with_xtb(fit_mol_rdkit)
# Extract interaction profiles
ref_molec = Molecule(ref_mol,
num_surf_points=200,
partial_charges=ref_charges,
pharm_multi_vector=False)
fit_molec = Molecule(fit_mol,
num_surf_points=200,
partial_charges=fit_charges,
pharm_multi_vector=False)
# Centers the two molecules' COM's to the origin
mp = MoleculePair(ref_molec, fit_molec, num_surf_points=200, do_center=True)
# Compute the similarity score for each interaction profile
shape_score = mp.score_with_surf(ALPHA(mp.num_surf_points))
esp_score = mp.score_with_esp(ALPHA(mp.num_surf_points), lam=0.3)
pharm_score = mp.score_with_pharm()
Next we show alignment using the same MoleculePair class.
# Centers the two molecules' COM's to the origin
mp = MoleculePair(ref_molec, fit_molec, num_surf_points=200, do_center=True)
# Align fit_molec to ref_molec with your preferred objective function
# By default we use automatic differentiation via pytorch
surf_points_aligned = mp.align_with_surf(ALPHA(mp.num_surf_points),
num_repeats=50)
surf_points_esp_aligned = mp.align_with_esp(ALPHA(mp.num_surf_points),
lam=0.3,
num_repeats=50)
pharm_pos_aligned, pharm_vec_aligned = mp.align_with_pharm(num_repeats=50)
# Optimal scores and SE(3) transformation matrices are stored as attributes
mp.sim_aligned_{surf/esp/pharm}
mp.transform_{surf/esp/pharm}
# Get a copy of the optimally aligned fit Molecule object
transformed_fit_molec = mp.get_transformed_molecule(
se3_transform=mp.transform_{surf/esp/pharm}
)
We implement three evaluations of generated 3D conformers. Evaluations can be done on an individual basis or in a pipeline. Here we show the most basic use case in the unconditional setting.
ConfEval
ConsistencyEval
ConfEval
and evaluates the consistency of the molecule's jointly generated interaction profiles with the true interaction profiles using 3D similarity scoring functionsConditionalEval
ConfEval
and evaluates the 3D similarity between generated molecules and the target moleculeNote: Evaluations can be run from any molecule's atomic numbers and positions with explicit hydrogens (i.e., straight from an xyz file).
Full jupyter notebook tutorials/examples for evaluations are found in the examples
folder. Some minimal examples are below.
from shepherd_score.evaluations.evalutate import ConfEval
from shepherd_score.evaluations.evalutate import UnconditionalEvalPipeline
# ConfEval evaluates the validity of a given molecule, optimizes it with xTB,
# and also computes various 2D graph properties
# `atom_array` np.ndarray (N,) atomic numbers of the molecule (with explicit H)
# `position_array` np.ndarray (N,3) atom coordinates for the molecule
conf_eval = ConfEval(atoms=atom_array, positions=position_array)
# Alternatively, if you have a list of molecules you want to test:
uncond_pipe = UnconditionalEvalPipeline(
generated_mols = [(a, p) for a, p in zip(atom_arrays, position_arrays)]
)
uncond_pipe.evaluate()
# Properties are stored as attributes and can be converted into pandas df's
sample_df, global_series = uncond_pipe.to_pandas()
Scripts to evaluate ShEPhERD-generated samples can be found in the scripts
directory.
We provide the data used for model training, benchmarking, and all ShEPhERD-generated samples reported in the paper at this Dropbox link. There are comprehensive READMEs in the Dropbox describing the different folders.
This project is licensed under the MIT License -- see LICENSE file for details.
If you use or adapt shepherd_score
or ShEPhERD in your work, please cite us:
@misc{adamsShEPhERD2024,
title = {{{ShEPhERD}}: {{Diffusing}} Shape, Electrostatics, and Pharmacophores for Bioisosteric Drug Design},
author = {Adams, Keir and Abeywardane, Kento and Fromer, Jenna and Coley, Connor W.},
year = {2024},
number = {arXiv:2411.04130},
eprint = {2411.04130},
publisher = {arXiv},
doi = {10.48550/arXiv.2411.04130},
archiveprefix = {arXiv}
}