This repository contains the source of pyscreener, both a library and software for conducting high-throughput virtual screening (HTVS) via python calls.
If docker is not installed already for your system then it can be installed from the official docker website.
The provided Dockerfile can be used to create pyscreener instances containing the required docking software and python dependencies / code. Any of the four vina docking softwares - vina, qvina2, smina, and psovina - can be specified for installation to the docker image.
All python dependencies and the pyscreener library are installed to a conda environment named pyscreener which must be activated once the docker image starts.
The below commands can be run in the directory containing the Dockerfile and environment.yml files to build the desired image:
docker build -t pyscreener:base --target base .
: Creates a docker image containing all python dependencies and pyscreener library but no docking software
docker build -t pyscreener:vina --target vina .
: Creates an image from pyscreener:base with vina installed
docker build -t pyscreener:qvina --target qvina .
: Creates an image from pyscreener:base with qvina installed
docker build -t pyscreener:smina --target smina .
: Creates an image from pyscreener:base with smina installed
docker build -t pyscreener:psovina --target psovina .
: Creates an image from pyscreener:base with psovina installed
As DOCK6 software requires a license, it is not possible to include its installation within the associated docker image. A compiled form of sphgen_cpp and the binary required for installation of chimera are both available within the dock6_utils directory of the associated dock6 image:
docker build -t pyscreener:dock6 --target base-dock6 .
: Creates an image from pyscreener:base containing utility software needed for DOCK6 to run once installed
Notes:
base will be activated by default. This contains all the required python dependencies so there is no need to manually activate an environment once inside the container
numpy, openbabel, openmm, pdbfixer, ray, rdkit, scikit-learn, scipy, and tqdm
conda env create -f environment.yml
conda activate pyscreener
pip install pyscreener
(or if installing from source, pip install .)
Before running pyscreener, be sure to first activate the environment: conda activate pyscreener (or whatever you've named your environment)
vina-type software
Install ADFR Suite and add prepare_receptor to your PATH. This can be done via either:
adding the entire bin directory to your path (you should see a command at the end of the installation process) or
adding only prepare_receptor in the bin directory to your PATH as detailed below
If this step was successful, the command which prepare_receptor should output path/to/prepare_receptor.
Install any of the following docking software: vina 1.1.2 (note: pyscreener does not work with vina 1.2), qvina2, smina, psovina and ensure the desired software executable is in a folder that is located on your path
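As a quick sanity check for the steps above, the following minimal sketch uses Python's standard shutil.which to report which executables are visible on your PATH. The helper find_executables is hypothetical (it is not part of pyscreener), and the executable names in the example call are assumptions; your installation may name the binaries differently.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_executables(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed names: prepare_receptor (ADFR Suite) plus one docking program.
    for name, path in find_executables(["prepare_receptor", "vina"]).items():
        print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

Any entry reported as not found points to a PATH problem that should be fixed before running a screen.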
The DOCK6 parent directory is the folder named dock6. (It is the folder that contains the bin, install, etc. subdirectories.)
wget http://dock.compbio.ucsf.edu/Contributed_Code/code/sphgen_cpp.1.2.tar.gz
tar -xzvf sphgen_cpp.1.2.tar.gz
cd sphgen_cpp.1.2
make
Place the compiled executable (sphgen_cpp) inside the bin subdirectory of the DOCK6 parent directory. If you've configured the environment variable already, (on linux) you can run: mv sphgen_cpp $DOCK6/bin
To add an executable to your PATH, you have three options:
create a symbolic link to the executable inside a directory that is already on your PATH: ln -s FILE -t DIR. Typically, ~/bin or ~/.local/bin are good target directories (i.e., DIR). To see what directories are currently on your path, type echo $PATH. There will typically be a lot of directories on your path, and it is best to avoid creating files in any directory above your home directory ($HOME on most *nix-based systems)
copy the executable to a directory that is already on your PATH: cp FILE DIR
append the directory containing the executable to your PATH: export PATH=$PATH:DIR, where DIR is the directory containing the file in question. As your PATH must be configured each time you run pyscreener, this command should also be placed inside your ~/.bashrc or ~/.bash_profile (if using a bash shell) to avoid needing to run the command every time you log in. Note: if using a non-bash shell, the specific file will be different.
To set the DOCK6 environment variable, run the following command: export DOCK6=path/to/dock6, where path/to/dock6 is the full path of the DOCK6 parent directory mentioned above. As this environment variable must always be set before running pyscreener, the command should be placed inside your ~/.bashrc or ~/.bash_profile (if using a bash shell) to avoid needing to run the command every time you log in. Note: if using a non-bash shell, the specific file will be different.
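To confirm that the DOCK6 variable points where you expect, a small check along the following lines may help. This is only a sketch under the assumptions stated in this section (a bin subdirectory containing sphgen_cpp); the function check_dock6_env is hypothetical and not part of pyscreener.

```python
import os
from pathlib import Path
from typing import List, Mapping


def check_dock6_env(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems found with the DOCK6 setup (empty if none)."""
    dock6 = environ.get("DOCK6")
    if not dock6:
        return ["the DOCK6 environment variable is not set"]

    problems = []
    bin_dir = Path(dock6) / "bin"
    if not bin_dir.is_dir():
        problems.append(f"{bin_dir} is not a directory")
    elif not (bin_dir / "sphgen_cpp").is_file():
        problems.append(f"sphgen_cpp was not found in {bin_dir}")
    return problems


if __name__ == "__main__":
    issues = check_dock6_env()
    print("DOCK6 looks OK" if not issues else "\n".join(issues))
```

Passing the environment as a mapping keeps the check easy to try with a plain dictionary before relying on your shell configuration.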
pyscreener uses ray as its parallel backend. If you plan to parallelize the software only across your local machine, you don't need to do anything. However, if you wish to either (a.) limit the number of cores pyscreener will be run over or (b.) run it over a distributed setup (e.g., an HPC with many distinct nodes), you must manually start a ray cluster before running pyscreener.
To do this, simply type ray start --head --num-cpus N before starting pyscreener (where N is the total number of cores you wish to allow pyscreener to utilize). Not performing this step will give pyscreener access to all of the cores on your local machine, potentially slowing down other applications.
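One way to pick N is to leave a few cores free for other applications. The sketch below builds the ray start command described above from the machine's core count; the helper ray_start_command and the default of two reserved cores are illustrative assumptions, not pyscreener or ray defaults.

```python
import os
import shlex
from typing import List, Optional


def ray_start_command(reserve: int = 2, total: Optional[int] = None) -> List[str]:
    """Build the `ray start` command, leaving `reserve` cores for other work."""
    total = total if total is not None else (os.cpu_count() or 1)
    n = max(1, total - reserve)  # always give ray at least one core
    return ["ray", "start", "--head", "--num-cpus", str(n)]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it in a shell.
    print(shlex.join(ray_start_command()))
```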
While the precise instructions for this will vary with HPC cluster architecture, the general idea is to establish a ray cluster between the nodes allocated to your job. We have provided a sample SLURM submission script (run_pyscreener_distributed_example.batch) to achieve this, but you may have to alter some commands depending on your system. For more information on this see here. To allow pyscreener to connect to your ray cluster, you must set the ip_head and redis_password environment variables appropriately, where ip_head is the address of the head of your ray cluster, i.e., IP:PORT where IP is the IP address of the head node and PORT is the port that is running ray.
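Before launching a distributed screen, it can be useful to verify that both variables are set and that ip_head has the IP:PORT shape described above. The following is a minimal sketch of such a check; read_ray_cluster_env is a hypothetical helper and says nothing about how pyscreener itself reads these variables.

```python
import os
from typing import Mapping, Tuple


def read_ray_cluster_env(environ: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Validate ip_head and redis_password, returning ip_head as (IP, PORT)."""
    value = environ.get("ip_head", "")
    # Split on the last colon so the port is always the final component.
    ip, sep, port = value.rpartition(":")
    if not sep or not ip or not port.isdigit():
        raise ValueError(f"ip_head must look like IP:PORT, got {value!r}")
    if "redis_password" not in environ:
        raise ValueError("redis_password is not set")
    return ip, int(port)
```

For example, with ip_head set to 10.0.0.1:6379 and redis_password set to any value, the function returns the pair ("10.0.0.1", 6379); a missing port or password raises a ValueError instead.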
pyscreener writes a lot of intermediate input and output files (due to the inherent specifications of the underlying docking software). Given that the primary endpoint of pyscreener is a list of ligands and associated scores (rather than the specific binding poses), these files are written to each node's temporary directory (determined by tempfile.gettempdir()) and discarded at the end. If you wish to collect these files, pass the --collect-all flag in the program arguments or run the collect_files() method of your VirtualScreen object when your screen is complete.
Note: the VirtualScreen.collect_files() method is slow because it may need to send many files over the network. This method should only be run once over the lifetime of a VirtualScreen object, as several intermediate calls will yield the same result as a single, final call.
Note: tempfile.gettempdir() returns a path that depends on the values of specific environment variables (see here). It is possible that the value returned on your system is not actually a valid path for you! In this case you will likely get file permission errors and must ask your system administrator where this value should point and set your environment variables accordingly before running pyscreener!
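A simple way to see what tempfile.gettempdir() resolves to, and whether files can actually be written there, is sketched below. It relies only on the standard library; the TMPDIR, TEMP, and TMP variables printed are the ones Python consults when choosing the directory, and the helper tempdir_is_usable is hypothetical.

```python
import os
import tempfile


def tempdir_is_usable() -> bool:
    """Return True if a file can be created and written in tempfile.gettempdir()."""
    try:
        with tempfile.NamedTemporaryFile(dir=tempfile.gettempdir()) as f:
            f.write(b"pyscreener tempdir check")
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Show the environment variables that influence the chosen directory.
    for var in ("TMPDIR", "TEMP", "TMP"):
        print(f"{var}={os.environ.get(var)}")
    print(tempfile.gettempdir(), "usable" if tempdir_is_usable() else "NOT usable")
```

On a multi-node setup this would need to be run on each node, since each node resolves its own temporary directory.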
!!please read the entire section before running pyscreener!!
pyscreener was designed to have a minimal interface under the principle that a high-throughput virtual screen is intended to be a broad-strokes technique to gauge ligand favorability. With that in mind, all one really needs to get going are the following:
the type of screen (screen-type) you would like to run: vina or dock for Vina-type or DOCK6 screens, respectively
the number of CPUs to use, typically between 2 and 8 depending on your compute setup. If you're docking molecule-by-molecule, e.g., reinforcement learning, then you will likely want this to be as many CPUs as are on your machine.
There are a variety of other options you can specify as well (including how to score a ligand given that multiple scored conformations are output, how to score against an ensemble of structures, etc.). To see all of these options and what they do, use the following command: pyscreener --help. All of these options may be specified on the command line or in a configuration file that accepts YAML, INI, and argparse syntaxes. Example configuration files are located in integration-tests/configs.
To check if everything is working and installed properly, first run pyscreener like so: pyscreener --config path/to/your/config --smoke-test
Vina-type and DOCK6 docking simulations have a number of options unique to their preparation and simulation pipeline, and these options are termed simulation "metadata" in pyscreener. At present, only a few of these options are supported for both families of docking software, but future updates will add support for more of these options. These options may be specified via a JSON struct to the --metadata-template argument. Below is a list of the supported options for both types of docking screen (default options provided in parentheses next to the parameter)
Vina-type
software (="vina"): which Vina-type docking software you would like to use. Currently supported values: "vina", "qvina", "smina", and "psovina"
extra (=""): all the extra command line options to pass to a Vina-type docking software. E.g. for a run of Smina, extra="--force_cap ARG" or for PSOVina, extra="-w ARG"
DOCK6
probe_radius (=1.4): the size of the probe to use for calculating the molecular surface (see here for more details)
steric_clash_dist (=0.0): prevent the generation of large spheres with close surface contacts with larger values
min_radius (=1.4): the minimum radius of sphere to use for sphere generation
max_radius (=4.0): the maximum radius of sphere to use for sphere generation
sphere_mode (="box"): the method by which to select spheres for docking box construction. Accepted values: "largest", select the largest cluster of spheres; "box", select all spheres within a predefined docking box; "ligand", use the coordinates of a previously docked/bound ligand to select spheres
docked_ligand_file (=""): a MOL2 file containing the coordinates of a previously docked/bound ligand
buffer (=10.0): the amount of extra space (in Angstroms) to be added around the ligand when selecting spheres
enclose_spheres (=True): whether to construct the docking box by enclosing all of the selected spheres or use only spheres within a predefined docking box
To test whether your environment is set up correctly with respect to pathing and environment variables, run pyscreener like so:
pyscreener --smoke-test --screen-type SCREEN_TYPE --metadata-template TEMPLATE
where SCREEN_TYPE and TEMPLATE are values as described above.
If the checks pass, then your environment is set up correctly.
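To make the metadata template concrete, the sketch below encodes a dictionary of the Vina-type options listed earlier as JSON and assembles the smoke-test command around it. Passing the JSON inline as the argument value is an assumption about how --metadata-template is consumed, and smoke_test_command is a hypothetical helper rather than part of pyscreener.

```python
import json
import shlex
from typing import Any, Dict, List


def smoke_test_command(screen_type: str, metadata: Dict[str, Any]) -> List[str]:
    """Build the smoke-test command with the metadata template encoded as JSON."""
    return [
        "pyscreener", "--smoke-test",
        "--screen-type", screen_type,
        # Assumption: the template is passed inline as a JSON string.
        "--metadata-template", json.dumps(metadata),
    ]


if __name__ == "__main__":
    vina_metadata = {"software": "qvina", "extra": ""}
    # shlex.join quotes the JSON so the printed command is safe to paste in a shell.
    print(shlex.join(smoke_test_command("vina", vina_metadata)))
```

Building the JSON with json.dumps rather than by hand avoids quoting mistakes when the template contains nested quotes.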
To check if pyscreener is set up properly, you can run the following:
>>> import pyscreener as ps
>>> software = "..."
>>> metadata = {...}
>>> ps.check_env(software, metadata)
...
where software is the name of the software you intend to use and metadata is a dictionary containing the metadata template. Please see the metadata templates section for details on possible key-value pairs.
The object model of pyscreener relies on four classes:
CalculationData: a simple object containing the broad-strokes specifications of a docking calculation common to all types of docking calculations (e.g., Vina, DOCK6, etc.): the SMILES string, the target receptor, the center/size of a docking box, the metadata, and the result.
CalculationMetadata: a nondescript object that contains software-specific fields. For example, a Vina-type calculation requires a software parameter, whereas a DOCK6 calculation requires a number of different parameters for receptor preparation. Most importantly, the metadata will always contain two fields of abstract type: prepared_ligand and prepared_receptor.
DockingRunner: a static object that defines an interface to prepare and run docking calculations. Each calculation type defines its own DockingRunner implementation.
DockingVirtualScreen: an object that organizes a virtual screen. At a high level, a virtual screen is a series of docking calculations with some template set of parameters performed for a collection of molecules and distributed over some set of computational resources. A DockingVirtualScreen takes as arguments a DockingRunner, a list of receptors (for possible ensemble docking) and a set of template values for a CalculationData template. It defines a __call__() method that takes an unzipped list of SMILES strings, builds the CalculationData objects for each molecule, and submits these objects for preparation and calculation to various resources in the ray cluster (see ray setup).
To perform docking calls inside your python code using pyscreener, you must first initialize a DockingVirtualScreen object either through the factory pyscreener.virtual_screen function or by manually initializing one. The following section will show an example of how to perform computational docking from inside a python interpreter.
The following code snippet will dock benzene (SMILES string "c1ccccc1") against the D4 dopamine receptor (PDB ID 5WIU) using a predefined docking box and Autodock Vina:
>>> import ray
>>> ray.init()
[...]
>>> import pyscreener as ps
>>> metadata = ps.build_metadata("vina")
>>> virtual_screen = ps.virtual_screen("vina", ["integration-tests/inputs/5WIU.pdb"], (-18.2, 14.4, -16.1), (15.4, 13.9, 14.5), metadata, ncpu=8)
{...}
>>> scores = virtual_screen("c1ccccc1")
>>> scores
array([-4.4])
A few notes from the above example:
A receptor may also be specified by its PDB ID (e.g., pdbids=["5WIU"]) but you must know the coordinates of the docking box for the corresponding PDB file. This usually means downloading the PDB file and manually inspecting it for more reliable results, but it's there if you want it.
The docking box can also be constructed from a previously docked/bound ligand: vs = ps.virtual_screen("vina", ["integration-tests/inputs/5WIU.pdb"], None, None, metadata, ncpu=8, docked_ligand_file="path/to/DOCKED_LIGAND.pdb")
You do not need to call ray.init() like we did above. This was only done to highlight the ability to initialize ray according to your own needs (i.e., a distributed setup).
To dock ligands read from a file, use the LigandSupply class and access the .ligands attribute, e.g.,
supply = ps.LigandSupply(['integration-tests/inputs/ligands.csv'])
virtual_screen(supply.ligands)
For more examples, check out the examples folder!
pip install pytest-cov
pytest
Copyright (c) 2021, david graff