czodrowskilab / VSFlow

MIT License

VSFlow - Virtual Screening Workflow

VSFlow is an open-source command-line tool built on top of the RDKit [1] for ligand-based virtual screening of large compound libraries (databases). It includes substructure-based, fingerprint-based and shape-based virtual screening tools, plus a tool to prepare databases for screening (molecule standardization, fingerprint and conformer generation). Screenings can be parallelized with Python's built-in multiprocessing package. VSFlow accepts a wide range of input file formats, and the screening results can be exported in various file formats, including Excel files. As an additional feature, VSFlow can visualize the screening results as a PDF file and/or a PyMOL file [2], allowing for a quick inspection of the results. VSFlow is written entirely in Python.
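To make the idea of fingerprint-based screening concrete, the following is a minimal toy sketch in plain Python. It is not VSFlow's implementation: fingerprints are modelled as sets of "on" bit positions, the Tanimoto coefficient is assumed as the similarity measure (a common choice; this README does not state which measures VSFlow offers), and the names `tanimoto`, `rank_by_similarity` and the molecule labels are hypothetical.

```python
# Toy illustration of fingerprint-based ranking; NOT VSFlow's implementation.
# Fingerprints are modelled as sets of "on" bit positions (an assumption).
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared on-bits divided by the number of distinct on-bits in either set."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(
    query: Fingerprint, database: Dict[str, Fingerprint], top_n: int = 10
) -> List[Tuple[str, float]]:
    """Score every database entry against the query and keep the best top_n."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical query and three-entry database.
query = frozenset({1, 4, 7, 9})
database = {
    "mol_a": frozenset({1, 4, 7, 9}),
    "mol_b": frozenset({1, 4, 8}),
    "mol_c": frozenset({2, 3}),
}
hits = rank_by_similarity(query, database, top_n=2)
print(hits)
```

In a real screening the fingerprints would come from a cheminformatics toolkit such as the RDKit, and the database would be far larger, which is where parallelization becomes worthwhile.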

Installation

The classic way

First of all, you need a working installation of Anaconda (https://www.anaconda.com/products/individual) or Miniconda (https://conda.io/en/latest/miniconda.html). Both are available for all major platforms.

Second, you need to clone the VSFlow GitHub repository to your system, or download the zip file and unpack it (referred to below as the repository folder).

All of the following instructions assume a bash shell.

Navigate into the repository folder.

Now, you can install the required dependencies with the provided environment.yml file within the repository folder as follows:

conda env create --quiet --force --file environment.yml
conda activate vsflow

Alternatively, you can also create a new conda environment and install the dependencies manually:

conda create -n vsflow python=3.9
conda activate vsflow
conda install -c conda-forge rdkit xlrd xlsxwriter pdfrw fpdf pymol-open-source molvs matplotlib 

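The Python dependencies are the packages named in the command above: rdkit, xlrd, xlsxwriter, pdfrw, fpdf, pymol-open-source, molvs and matplotlib.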

Now, you can install VSFlow as follows:

pip install .
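If the installation fails or VSFlow later reports an import error, it can help to check which dependencies are visible inside the activated environment. The sketch below is one way to do this with the standard library only. The import names in `ASSUMED_IMPORT_NAMES` are assumptions derived from the conda package names (for example, `pymol-open-source` is assumed to provide the module `pymol`), and `missing_modules` is a hypothetical helper, not part of VSFlow.

```python
# Check which dependencies can be found in the current Python environment.
# The module names below are ASSUMED from the conda package names.
import importlib.util
from typing import Iterable, List

ASSUMED_IMPORT_NAMES = [
    "rdkit",
    "xlrd",
    "xlsxwriter",
    "pdfrw",
    "fpdf",
    "pymol",  # assumed import name for the pymol-open-source package
    "molvs",
    "matplotlib",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(ASSUMED_IMPORT_NAMES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All assumed dependencies were found.")
```

Run it with the `vsflow` conda environment activated; a module reported as missing usually means the environment was not activated or the corresponding package was not installed.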

Using Docker (Linux only)

First, you need to clone the VSFlow GitHub repository to your system, or download the zip file and unpack it (referred to below as the repository folder). Assuming you have Docker installed and have changed into the repository folder, you can build the Docker image as follows:

docker build --tag vsflow .

The build process might take a while and only needs to be done once. After it is finished, you can run VSFlow as follows:

docker run --rm -it -v "$(pwd)":/data vsflow
cd /data
vsflow --help

Instead of using $(pwd) you can also use the absolute path to your desired working directory. Every file written to /data in the container, e.g. databases or output files generated by VSFlow, also appears in your working directory on the host system.

When you are finished, you can exit the container by typing exit or pressing Ctrl+D.
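If you start the container from scripts, the bind mount described above can be assembled programmatically so that the host path is always absolute. The following is a minimal sketch under that assumption. `docker_run_args` is a hypothetical helper, and the flags are the ones from the `docker run` command shown in this section.

```python
# Build the `docker run` argument list with an absolute host path.
# docker_run_args is a hypothetical helper, not part of VSFlow.
import shlex
from pathlib import Path
from typing import List


def docker_run_args(workdir: str = ".", image: str = "vsflow") -> List[str]:
    """Return the argument list that mounts workdir at /data in the container."""
    host_dir = Path(workdir).resolve()  # absolute path, as Docker expects
    return ["docker", "run", "--rm", "-it", "-v", f"{host_dir}:/data", image]


args = docker_run_args(".")
# shlex.join quotes the path if it contains spaces.
print(shlex.join(args))
```

On a system with Docker installed, the list can be passed directly to `subprocess.run(args)`. Passing a list rather than a single string avoids quoting problems with paths that contain spaces.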

General Usage

Always make sure the conda environment is activated. Now you can run VSFlow as follows:

vsflow {mode} {arguments}

For example, the following command will display all included modes (substructure, fpsim, shape, preparedb, managedb) and the general usage:

vsflow -h

To display all possible arguments for a particular mode, run:

vsflow {mode} -h

For example, the following command shows all arguments for the substructure mode:

vsflow substructure -h
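The `vsflow {mode} {arguments}` pattern can also be driven from Python, for instance in a batch script. The sketch below assumes only what is stated above: the five mode names and the `-h` flag. `build_command` and `show_help` are hypothetical helpers, not part of VSFlow.

```python
# Compose and run `vsflow {mode} {arguments}` from Python.
# build_command and show_help are hypothetical helpers.
import shutil
import subprocess
from typing import List, Sequence

# Mode names as listed in this README.
MODES = ("substructure", "fpsim", "shape", "preparedb", "managedb")


def build_command(mode: str, arguments: Sequence[str] = ()) -> List[str]:
    """Return the argument list for one VSFlow call, rejecting unknown modes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return ["vsflow", mode, *arguments]


def show_help(mode: str) -> str:
    """Run `vsflow {mode} -h` and return its standard output."""
    if shutil.which("vsflow") is None:
        raise RuntimeError("vsflow not found on PATH; is the conda environment activated?")
    result = subprocess.run(
        build_command(mode, ["-h"]), capture_output=True, text=True, check=True
    )
    return result.stdout


print(build_command("substructure", ["-h"]))
```

Calling `show_help("substructure")` returns the same text as the shell command above, provided the conda environment is activated.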

Example Usage

Detailed usage instructions for VSFlow, with many examples, are provided in the GitHub Wiki:
https://github.com/czodrowskilab/VSFlow/wiki

References

[1] RDKit: Open-source cheminformatics; http://www.rdkit.org.
[2] The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC.