To learn how to evaluate your model, see the Getting Started notebook.
Paper: https://arxiv.org/abs/2006.16955.
Listed below are benchmark results from the paper for docking score optimization (lower is better). Each cell reports the mean docking score of the generated compounds, followed by their internal diversity in parentheses. For each protein, we sampled from ZINC a set of molecules equal in size to that protein's training set. As baselines, we also report results for the top 10% of molecules from the training set and from ZINC. Please see our paper for more details.
| | 5HT1B | 5HT2B | ACM2 | CYP2D6 |
|---|---|---|---|---|
| CVAE | -4.647 (0.907) | -4.188 (0.913) | -4.836 (0.905) | - |
| GVAE | -4.955 (0.901) | -4.641 (0.887) | -5.422 (0.898) | -7.672 (0.714) |
| REINVENT | -9.774 (0.506) | -8.657 (0.455) | -9.775 (0.467) | -8.759 (0.626) |
| Train (10%) | -10.837 (0.749) | -9.769 (0.831) | -8.976 (0.812) | -9.256 (0.869) |
| ZINC (10%) | -9.894 (0.862) | -9.228 (0.851) | -8.282 (0.860) | -8.787 (0.853) |
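The two numbers in each cell can be illustrated with a short sketch. It assumes molecular fingerprints are already available as sets of on-bit indices (for example, Morgan fingerprints computed with RDKit, not shown) and uses a common definition of internal diversity: one minus the mean pairwise Tanimoto similarity over distinct pairs. The paper's exact fingerprint type and averaging may differ, and the helper names and toy data below are hypothetical.

```python
from itertools import combinations
from statistics import mean


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical.
    return len(a & b) / union if union else 1.0


def internal_diversity(fps: list) -> float:
    """1 minus the mean Tanimoto similarity over all distinct pairs (assumed definition)."""
    if len(fps) < 2:
        raise ValueError("internal diversity needs at least two molecules")
    return 1.0 - mean(tanimoto(a, b) for a, b in combinations(fps, 2))


def top_fraction(scores: list, fraction: float = 0.1) -> list:
    """Best `fraction` of docking scores; lower is better, so keep the smallest."""
    k = max(1, round(len(scores) * fraction))
    return sorted(scores)[:k]


# Toy data: hypothetical scores and fingerprints, not values from the paper.
scores = [-9.1, -7.4, -10.2, -8.0, -6.5, -9.8, -7.7, -8.9, -10.5, -6.9]
fps = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({5, 6})]

# Print in the same "mean (diversity)" form as the table cells.
print(f"{mean(scores):.3f} ({internal_diversity(fps):.3f})")
print(top_fraction(scores))
```

The top_fraction helper mirrors the spirit of the "Train (10%)" and "ZINC (10%)" baselines: since lower docking scores are better, the best 10% are the smallest values.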
The recommended setup is a conda environment. Create a new environment and run the docking_benchmark/install_conda_env.sh script.
Running experiments or training models requires additional data. Download this zip, unpack it, and set the DOCKING_BENCHMARK_DATA environment variable to the unpacked directory.
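Before launching a long run, it can help to confirm that the variable points at the unpacked data. The sketch below is a hypothetical helper, not part of the repository; it only checks that DOCKING_BENCHMARK_DATA is set and names an existing directory, and makes no assumptions about the archive's contents.

```python
import os
from pathlib import Path


def benchmark_data_dir() -> Path:
    """Return the directory named by DOCKING_BENCHMARK_DATA, failing early if unusable."""
    value = os.environ.get("DOCKING_BENCHMARK_DATA")
    if not value:
        raise RuntimeError(
            "DOCKING_BENCHMARK_DATA is not set; point it at the unpacked data directory"
        )
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"DOCKING_BENCHMARK_DATA={value!r} is not a directory")
    return path
```

Failing with an explicit message up front is cheaper than discovering a missing data path partway through docking.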
To generate molecules, run the docking_baselines/scripts/generate_molecules.py script. Run it with the -h flag for information about its arguments.
Details about some of the arguments:
- protein: the protein that the ligand will be docked to; possible choices: 5ht1b, 5ht2b, acm2
- random_samples: Gaussian samples that will be docked (see paper)
- mode: minimize or maximize; whether the model should minimize or maximize the component
- dataset: the dataset used to fine-tune the model; the available datasets for a given protein are listed in the DOCKING_BENCHMARK_DATA/proteins_data/protein/metadata.json file. The dataset defines the component to be optimized through the score_column setting in the metadata file.
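To see which datasets are available for a protein, one option is to open its metadata file directly. The sketch below builds the path pattern given above, assuming the protein directory uses the same lowercase names accepted by the protein argument. How datasets and score_column are laid out inside metadata.json is not documented here, so the hypothetical load_metadata helper just returns the parsed JSON for inspection.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Protein choices documented above for generate_molecules.py.
PROTEINS = ("5ht1b", "5ht2b", "acm2")


def metadata_path(protein: str, data_dir: Optional[str] = None) -> Path:
    """Build DOCKING_BENCHMARK_DATA/proteins_data/<protein>/metadata.json."""
    if protein not in PROTEINS:
        raise ValueError(f"unknown protein {protein!r}; expected one of {PROTEINS}")
    # Fall back to the environment variable when no directory is passed explicitly.
    root = Path(data_dir or os.environ["DOCKING_BENCHMARK_DATA"])
    return root / "proteins_data" / protein / "metadata.json"


def load_metadata(protein: str, data_dir: Optional[str] = None):
    """Return the parsed metadata.json; its internal structure is not assumed here."""
    with metadata_path(protein, data_dir).open(encoding="utf-8") as f:
        return json.load(f)
```

For example, print(json.dumps(load_metadata("acm2"), indent=2)) dumps the file so you can look up the dataset names and their score_column values.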