choderalab / fah-xchem

Tools and infrastructure for automated compound discovery using Folding@home
MIT License

fah-xchem

Tools and infrastructure for automated compound discovery using Folding@home.

Installation

  1. Clone the repository and cd into repo root:

    git clone https://github.com/choderalab/fah-xchem.git
    cd fah-xchem
  2. Create a conda environment with the required dependencies:

    conda env create -f environment.yml

    If the above process is slow, we recommend using mamba to speed up installation:

    mamba env create -f environment.yml
  3. Install fah-xchem in the environment using pip:

    pip install .

Example usage

Download molecule and experimental data from CDD and generate an experimental data file for use in analysis:

export CDD_VAULT_NUM=<vault-num>
export CDD_VAULT_TOKEN=<vault-token>

FLUORESCENCE_IC50_PROTOCOL_ID=49439

# will take some time; pulls full data export from CDD
fah-xchem -l INFO cdd --data-dir cdd-data/ retrieve-protocol-data --molecules -i $FLUORESCENCE_IC50_PROTOCOL_ID

# next step REQUIRES OpenEye license
export OE_LICENSE=/path/to/oe_license.txt

# merges and transforms data elements pulled from CDD into usable form for downstream analysis
fah-xchem -l INFO cdd --data-dir cdd-data/ generate-experimental-compound-data -i $FLUORESCENCE_IC50_PROTOCOL_ID experimental_compound_data.json
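
The commands above depend on three environment variables, and OE_LICENSE is only needed for the second step. As a minimal sketch, a pre-flight check in Python might look like the following; the helper name `missing_env_vars` is hypothetical and is not part of fah-xchem.

```python
import os

# Variable names taken from the example above.
REQUIRED_ENV_VARS = ("CDD_VAULT_NUM", "CDD_VAULT_TOKEN", "OE_LICENSE")


def missing_env_vars(environ=None, required=REQUIRED_ENV_VARS):
    """Return the names in `required` that are unset or empty in `environ`.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]
```

An empty list means all three variables are set; otherwise the returned names can be reported before any CDD export is started.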

Run transformation and compound free energy analysis, producing analysis.json in the output directory:

fah-xchem --loglevel INFO \
        compound-series analyze \
        --experimental-data-file experimental_compound_data.json \
        --config-file config.json \
        --fah-projects-dir /path/to/projects/ \
        --fah-data-dir /path/to/data/SVR314342810/ \
        --loglevel INFO \
        --nprocs 8 \
        compound-series.json \
        /path/to/output-dir/analysis.json
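
When the analysis is launched from a script rather than typed by hand, building the argument list in Python avoids shell line-continuation mistakes. This is only a sketch: the function name `build_analyze_command` is hypothetical, and it uses only the options shown in the example above.

```python
def build_analyze_command(
    experimental_data_file,
    config_file,
    fah_projects_dir,
    fah_data_dir,
    compound_series_file,
    output_file,
    nprocs=8,
    loglevel="INFO",
):
    """Return an argv list mirroring the `compound-series analyze` example.

    Hypothetical helper: it only assembles the options shown in this README.
    """
    return [
        "fah-xchem", "--loglevel", loglevel,
        "compound-series", "analyze",
        "--experimental-data-file", experimental_data_file,
        "--config-file", config_file,
        "--fah-projects-dir", fah_projects_dir,
        "--fah-data-dir", fah_data_dir,
        "--nprocs", str(nprocs),
        # Positional arguments come last: input series, then output file.
        compound_series_file,
        output_file,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the standard library.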

Generate representative snapshots, plots, a PDF report, and static site HTML in the output directory:

fah-xchem --loglevel INFO \
        artifacts generate \
        --config-file config.json \
        --fragalysis-config fragalysis_config.json \
        --fah-projects-dir /path/to/projects/ \
        --fah-data-dir /path/to/data/SVR314342810/ \
        --website-base-url https://my-bucket.s3.amazonaws.com/site/prefix/ \
        --cache-dir results/cache/ \
        --nprocs 8 \
        /path/to/output-dir/analysis.json \
        /path/to/output-dir/

Unit conventions

Energies are represented in configuration and internally in units of kT (the Boltzmann constant times the temperature), except when otherwise indicated. For energies in kilocalories per mole, the function or variable name should be suffixed with _kcal.
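
To illustrate the convention, here is a minimal conversion sketch between kT and kcal/mol. The function names and the default temperature of 300 K are assumptions made for illustration; use the temperature of the actual simulations.

```python
# Molar gas constant in kcal/(mol*K); per mole, kT corresponds to R*T.
R_KCAL_PER_MOL_K = 0.0019872041


def kT_to_kcal(energy_kT, temperature_kelvin=300.0):
    """Convert an energy in units of kT to kcal/mol (assumed 300 K default)."""
    return energy_kT * R_KCAL_PER_MOL_K * temperature_kelvin


def kcal_to_kT(energy_kcal, temperature_kelvin=300.0):
    """Convert an energy in kcal/mol to units of kT (assumed 300 K default)."""
    return energy_kcal / (R_KCAL_PER_MOL_K * temperature_kelvin)


# Following the naming convention: the kcal/mol value carries a _kcal suffix.
binding_free_energy = -5.0  # in kT
binding_free_energy_kcal = kT_to_kcal(binding_free_energy)
```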

Configuration

Compound series

The compound series is specified as JSON with schema given by the CompoundSeriesAnalysis model (see fah_xchem.schema).

Analysis configuration

Some analysis options can be configured in a separate JSON file with schema given by the AnalysisConfig model. For example,

config.json

{
    "min_num_work_values": 10,
    "max_binding_free_energy": 0
}

The JSON file is passed on the command line using the --config-file option.
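
A quick sanity check of config.json before a long run can catch typos early. The sketch below uses only the standard library. The checks it applies (a non-negative integer and a numeric value) are assumptions based on the example above, not the validation performed by the AnalysisConfig model, and `check_analysis_config` is a hypothetical name.

```python
import json


def check_analysis_config(text):
    """Parse config JSON text and apply basic, assumed sanity checks.

    Only the two options shown in the example are checked, and only if present.
    """
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")

    n = config.get("min_num_work_values")
    # bool is a subclass of int in Python, so reject it explicitly.
    if n is not None and (isinstance(n, bool) or not isinstance(n, int) or n < 0):
        raise ValueError("min_num_work_values should be a non-negative integer")

    e = config.get("max_binding_free_energy")
    if e is not None and (isinstance(e, bool) or not isinstance(e, (int, float))):
        raise ValueError("max_binding_free_energy should be a number")

    return config
```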

Upload to Fragalysis

To upload sprint results to Fragalysis, a JSON config file may be supplied. For example,

fragalysis_config.json

{
    "run": true,
    "ligands_filename": "reliable-transformations-final-ligands.sdf",
    "fragalysis_sdf_filename": "compound-set_foldingathome-sprint-X.sdf",
    "ref_url": "https://url-link",
    "ref_mols": "x00000",
    "ref_pdb": "references.zip",
    "target_name": "protein-target",
    "submitter_name": "Folding@home",
    "submitter_email": "first.last@email.org",
    "submitter_institution": "institution-name",
    "method": "Sprint X",
    "upload_key": "upload-key",
    "new_upload": true
}

The JSON file is passed on the command line using the --fragalysis-config option.

Description of the JSON parameters:

For more information on the upload format see this forum post.

A unique upload_key is needed to push to Fragalysis; it can be requested here.

For more information on the entire upload process see this forum post.
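
Because the Fragalysis config has many fields, it can be useful to compare a config against the keys in the example fragalysis_config.json before attempting an upload. This is a sketch only: the key list is copied from that example, whether every key is actually required is an assumption, and `missing_fragalysis_keys` is a hypothetical helper.

```python
# Keys as they appear in the example fragalysis_config.json.
EXAMPLE_KEYS = (
    "run",
    "ligands_filename",
    "fragalysis_sdf_filename",
    "ref_url",
    "ref_mols",
    "ref_pdb",
    "target_name",
    "submitter_name",
    "submitter_email",
    "submitter_institution",
    "method",
    "upload_key",
    "new_upload",
)


def missing_fragalysis_keys(config, expected=EXAMPLE_KEYS):
    """Return the expected keys that are absent from the `config` dict."""
    return [key for key in expected if key not in config]
```

Load the file with `json.load` and pass the resulting dict; an empty list means every key from the example is present.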

Server-specific configuration

Paths to Folding@home project and data directories are passed on the command line. See usage examples above.

Development setup

Conda

This project uses conda to manage the environment. To set up a conda environment named fah-xchem with the required dependencies, create the conda environment as described above. To install fah-xchem in development (editable) mode, run:

pip install -e .

Running tests locally

pytest

Formatting

Code formatting with black is enforced via a CI check. To install black with conda, use

conda install black

Building documentation

cd docs
make html

Copyright

Copyright (c) 2020, Chodera Lab

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.3.