TomkUCL / SARS-CoV-2-Helicase-nsp13-Public-Antivirals-Virtual-Screening-Project

A repository for public suggestions towards SARS-COV-2 helicase antivirals using publicly-available software.
Apache License 2.0

Filtering docked poses to improve virtual-screening hit rates using LigGrep. #5

Open TomkUCL opened 7 months ago

TomkUCL commented 7 months ago

Aim:

This issue describes how to use LigGrep to filter docked ligand poses, keeping those that retain key protein interactions observed in crystallographic fragment-protein structures. The aim is to help prioritise compounds for molecular dynamics simulations, binding free energy calculations and/or chemical synthesis.

Background:

LigGrep 1.0.0 is a free, open-source tool developed by the Durrant lab that accepts a protein receptor file (PDB, PDBQT), a directory containing many docked-compound files (PDB, PDBQT, SDF), and a list of user-specified filters (JSON). It evaluates each docked pose and outputs the names of the compounds with poses that pass all filters.

For further details about LigGrep, please see the original publication: https://doi.org/10.1186/s13321-020-00471-2 and the GitHub page https://github.com/durrantlab/liggrep

TomkUCL commented 7 months ago

Installation:

For this case study, I am using an HP laptop with Windows 10 pre-installed, with Ubuntu Linux as my command-line interface.

1. Make a new LigGrep project folder

cd /mnt/d
mkdir liggrep_project

2. Install the Python virtual environment package:

sudo apt install python3.10-venv

3. Within your 'liggrep_project' folder create a new Python 3 virtual environment called 'project_env_1' in which to run the liggrep programme

python3 -m venv project_env_1

4. Activate the new Python 3 project environment

source project_env_1/bin/activate

5. Install the required packages within this project Python environment: LigGrep itself and the third-party Python libraries it needs (RDKit, NumPy, and SciPy), each from its own repository

6. Deactivate the Python environment

deactivate

Now, next time you want to use LigGrep, you simply need to go to your project_env_1 folder, activate the Python environment, open the 'liggrep' folder, and enter the relevant command line arguments:

cd /mnt/d/liggrep_project/project_env_1

source bin/activate

Now go to the folder containing the liggrep.py python file:

cd liggrep

Ensure that you are in the directory containing the 'liggrep.py' file:

(Project_env) (base) tom@DESKTOP-LG9R7AE: /mnt/d/liggrep-project/project_env_1/liggrep$

Once the Python environment is activated and we are in the folder containing the liggrep.py file, we can run LigGrep by specifying all of its arguments. These can be broken down as follows:

  python3: run Python 3
  liggrep.py: open the Python file liggrep.py
  d/5rmm_rigid_vs/5rmm.rigid.pdbqt: receptor file
  d/5rmm_rigid_vs/vs_results_pdbqt/*.pdbqt: ligand files (* = all pdbqt files in the folder)
  d/5rmm_rigid_vs/vs_results_pdbqt/5rmm_filters.json: JSON file defining the filters to be applied by LigGrep
  -m SMILES: SMILES mode
  -f liggrep-project/project_env_1/liggrep_analysis.txt: output file
  --num_processors 1: number of processors
  --job_manager multiprocessing: job manager
  --verbose: explains why each ligand passes or fails the JSON filter

So the full Ubuntu command line script will be as follows:

python3 liggrep.py d/5rmm_rigid_vs/5rmm.rigid.pdbqt d/5rmm_rigid_vs/vs_results_pdbqt/*.pdbqt d/5rmm_rigid_vs/vs_results_pdbqt/5rmm_filters.json -m SMILES -f liggrep-project/project_env_1/liggrep_analysis.txt --num_processors 1 --job_manager multiprocessing --verbose
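The command is long enough that an option is easy to mistype. As a minimal sketch, the same invocation can be assembled from named pieces in Python. The helper name build_liggrep_command is hypothetical, and only the options already listed above are used. Nothing is executed here; the script only prints the command.

```python
def build_liggrep_command(receptor, ligands, filters, output,
                          mode="SMILES", num_processors=1):
    """Return the LigGrep invocation as a list of arguments (nothing is run)."""
    return [
        "python3", "liggrep.py",
        receptor,                      # receptor file used for docking
        ligands,                       # docked ligand files
        filters,                       # JSON file describing the filters
        "-m", mode,
        "-f", output,
        "--num_processors", str(num_processors),
        "--job_manager", "multiprocessing",
        "--verbose",
    ]


command = build_liggrep_command(
    receptor="d/5rmm_rigid_vs/5rmm.rigid.pdbqt",
    ligands="d/5rmm_rigid_vs/vs_results_pdbqt/*.pdbqt",
    filters="d/5rmm_rigid_vs/vs_results_pdbqt/5rmm_filters.json",
    output="liggrep-project/project_env_1/liggrep_analysis.txt",
)

# Print the command exactly as it would be typed at the Ubuntu prompt.
print(" ".join(command))
```

Keeping the options in one list makes it harder to drop the leading dashes from an option such as --num_processors.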

TomkUCL commented 7 months ago

Running LigGrep 1.0.0 on your ligands

LigGrep requires as input:

  1. a PDBQT or PDB file of the drug-target receptor used for docking
  2. a directory of PDBQT, PDB, or SDF files containing the docked poses of candidate ligands, and
  3. a JSON-formatted file describing user-specified filters.

LigGrep’s first command-line argument is the path to the PDB/PDBQT-formatted receptor file used for docking.

LigGrep’s second command-line argument is the path to the docked-compound .pdbqt files. The mode that we would like to run (NONE, SMILES or OPENBABEL) is set separately with the -m option.

LigGrep’s third command-line argument is the path to a JSON file that contains a list of filters that the input compounds must satisfy. LigGrep filters have four user-defined components: 1) a ligand-substructure specification describing one or more bonded atoms, 2) a point in 3D space (the query point), 3) a distance cut-off, and 4) an optional “exclude” flag.
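To make the four filter components concrete, here is a minimal sketch of the geometric test they imply. It is not LigGrep's own code. The class and function names are hypothetical, the coordinates are made up, and it assumes that a filter is satisfied when any atom of the matched substructure lies within the cut-off (inclusive) of the query point, with the “exclude” flag inverting that result.

```python
import math
from dataclasses import dataclass


@dataclass
class DistanceFilter:
    query_point: tuple       # (x, y, z) of the query point, in Angstroms
    distance: float          # distance cut-off, in Angstroms
    exclude: bool = False    # if True, the substructure must NOT be nearby


def pose_passes(flt, substructure_atoms):
    """substructure_atoms: coordinates of ligand atoms matching the substructure."""
    within = any(
        math.dist(flt.query_point, atom) <= flt.distance
        for atom in substructure_atoms
    )
    return (not within) if flt.exclude else within


# Hypothetical coordinates, for illustration only.
near_atom = (1.0, 2.0, 1.0)   # about 2.45 Angstroms from the origin
far_atom = (4.0, 0.0, 0.0)    # 4.0 Angstroms from the origin

keep = DistanceFilter(query_point=(0.0, 0.0, 0.0), distance=3.0)
avoid = DistanceFilter(query_point=(0.0, 0.0, 0.0), distance=3.0, exclude=True)

print(pose_passes(keep, [near_atom, far_atom]))
print(pose_passes(keep, [far_atom]))
print(pose_passes(avoid, [far_atom]))
```

A pose with no matching atoms fails an ordinary filter in this sketch, because there is nothing that could be within the cut-off.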

TomkUCL commented 7 months ago

Example using PDB 5RMM:

Based on these crystal structures, we are interested in checking for an interaction between the carboxylate carbonyl O atom of the ligand and the NH H atom of residue SER486 in chain B.

image

image

image

![image](https://github.com/TomkUCL/SARS-CoV-2-Helicase-nsp13-Public-Antivirals-Virtual-Screening-Project/assets/92033163/b82581b9-e65f-4f97-b666-748cb714deff)

First, open the .pdbqt receptor file in Discovery Studio or PyMOL and find the residue atom in the hierarchy table:

image

image
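The chain, residue number, and atom name can also be looked up in the receptor file directly. The sketch below assumes the receptor .pdbqt keeps the standard fixed-column PDB layout for ATOM records. The function name find_receptor_atom is hypothetical, and the sample record uses made-up coordinates rather than values from 5RMM.

```python
def find_receptor_atom(lines, chain, resid, atomname):
    """Return (x, y, z) of the first matching ATOM/HETATM record, or None."""
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 54:
            continue
        if (
            line[21] == chain                       # column 22: chain ID
            and int(line[22:26]) == resid           # columns 23-26: residue number
            and line[12:16].strip() == atomname     # columns 13-16: atom name
        ):
            # Columns 31-54 hold x, y, z as three 8-character fields.
            return float(line[30:38]), float(line[38:46]), float(line[46:54])
    return None


# Build one sample record with explicit field widths (hypothetical coordinates).
sample_line = "ATOM  {:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
    4321, "HD22", "ASN", "B", 516, 12.345, -6.789, 10.111
)

print(find_receptor_atom([sample_line], chain="B", resid=516, atomname="HD22"))
```

In practice the lines would come from reading the receptor file, for example with open(path) as handle and passing handle to the function.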

To determine whether a given docked pose satisfies the user-specified filters list, LigGrep first uses the RDKit Python library to check whether the molecule contains the necessary ligand substructures (i.e., the substructures associated with all filters that do not have “exclude” flags).

LigGrep rejects all molecules that do not contain each of the necessary substructures. Users specify substructures via SMILES arbitrary target specification (SMARTS) notation, which is syntactically similar to SMILES. First, extract the SMILES string for your desired substructure:

image

[O-]C([C@@H]1CNC[C@H]1c2ccccc2)=O

image

[O]C([C@@H]1C[N]C[C@H]1c2ccccc2)=O

image

TomkUCL commented 7 months ago

What is a JSON file and why do I need one for LigGrep?

Here is my JSON file for this virtual screen. It asks LigGrep to check whether an oxygen atom [#8] is located within 3.0 Angstroms of receptor atom HD22:

[
    {
        "receptorAtom": {
            "chain": "B",
            "resid": 516,
            "atomname": "HD22"
        },
        "ligandSubstructSMARTS": "[#8]",
        "distance": 3.0
    }
]
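A small syntax slip, such as a trailing comma or a missing quote, is enough to make a JSON file unreadable, so it can help to generate or check the file with Python before a long run. The sketch below rebuilds the filter above and checks that each entry has the three keys used in this example. The function name check_filters is hypothetical, and the check reflects only this example, not LigGrep's full filter schema.

```python
import json

# The same filter as above, written as Python data.
filters = [
    {
        "receptorAtom": {"chain": "B", "resid": 516, "atomname": "HD22"},
        "ligandSubstructSMARTS": "[#8]",
        "distance": 3.0,
    }
]

# Keys used in this example only (an assumption, not a complete schema).
EXPECTED_KEYS = ("receptorAtom", "ligandSubstructSMARTS", "distance")


def check_filters(entries):
    """Raise ValueError if any filter lacks one of the keys used in this example."""
    for index, entry in enumerate(entries):
        missing = [key for key in EXPECTED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"filter {index} is missing {missing}")


check_filters(filters)

# Serialise with json so that quoting and commas are always valid JSON.
text = json.dumps(filters, indent=4)
print(text)
```

The printed text can be saved as 5rmm_filters.json. Loading an existing file with json.load is a quick way to catch syntax errors before LigGrep reads it.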

After running this filter on the control ligand (VGX), which is taken from the crystal structure with PDB ID 5RMM shown below, LigGrep confirms that an oxygen atom is situated within 3.0 Angstroms of the amide hydrogen atom (HD22) of residue ASN516. Now we can run this filter on our docked ligand .pdbqt library of ~9000 poses.

image

image