Open TomkUCL opened 7 months ago
For this case study, I am using an HP laptop with Windows 10 pre-installed; however, I am using Ubuntu Linux as my command-line interface.
cd /mnt/d
mkdir liggrep_project
cd liggrep_project
sudo apt install python3.10-venv
python3 -m venv project_env_1
source project_env_1/bin/activate
deactivate
The next time you want to use LigGrep, go to your project_env_1 folder, activate the Python environment, open the 'liggrep' folder, and enter the relevant command-line arguments:
cd /mnt/d/liggrep_project/project_env_1
source bin/activate
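To confirm that the environment really is active before running anything, a small check can be run with the environment's own interpreter. This is only a sketch: it relies on the standard `sys.prefix` and `sys.base_prefix` attributes, and the function name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If the first line printed says `False`, the `source bin/activate` step probably did not take effect in the current shell.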
Now go to the folder containing the liggrep.py Python file:
cd liggrep
Ensure that you are in the directory containing the 'liggrep.py' file:
(Project_env) (base) tom@DESKTOP-LG9R7AE: /mnt/d/liggrep-project/project_env_1/liggrep$
Once the Python environment is activated and we are inside the folder containing the liggrep.py file, we can run LigGrep by specifying all of its arguments. These break down as follows:
python3: runs the Python 3 interpreter
liggrep.py: the LigGrep Python file to execute
d/5rmm_rigid_vs/5rmm.rigid.pdbqt: receptor file
d/5rmm_rigid_vs/vs_results_pdbqt/*.pdbqt: ligand files (* = all pdbqt files in the folder)
d/5rmm_rigid_vs/vs_results_pdbqt/5rmm_filters.json: JSON file defining the filters to be applied by LigGrep
-m SMILES: SMILES mode
-f liggrep-project/project_env_1/liggrep_analysis.txt: output file
--num_processors 1: number of processors
--job_manager multiprocessing: job manager (multiprocessing)
--verbose: explains why each ligand passes or fails the JSON filter
The full command, as entered in the Ubuntu terminal, is therefore:
python3 liggrep.py d/5rmm_rigid_vs/5rmm.rigid.pdbqt d/5rmm_rigid_vs/vs_results_pdbqt/*.pdbqt d/5rmm_rigid_vs/vs_results_pdbqt/5rmm_filters.json -m SMILES -f liggrep-project/project_env_1/liggrep_analysis.txt --num_processors 1 --job_manager multiprocessing --verbose
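With this many arguments it is easy to drop a dash or swap two paths, so it can help to assemble the command from named pieces. The sketch below does that in Python. The helper `build_liggrep_command` is hypothetical (it is not part of LigGrep), and it uses only the flags and paths already listed in this post.

```python
def build_liggrep_command(receptor, ligands, filters, output_file,
                          mode="SMILES", num_processors=1,
                          job_manager="multiprocessing", verbose=True):
    """Assemble the LigGrep call as a list of tokens (hypothetical helper)."""
    cmd = [
        "python3", "liggrep.py",
        receptor, ligands, filters,  # the three positional arguments
        "-m", mode,
        "-f", output_file,
        "--num_processors", str(num_processors),
        "--job_manager", job_manager,
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_liggrep_command(
    receptor="d/5rmm_rigid_vs/5rmm.rigid.pdbqt",
    ligands="d/5rmm_rigid_vs/vs_results_pdbqt/*.pdbqt",
    filters="d/5rmm_rigid_vs/vs_results_pdbqt/5rmm_filters.json",
    output_file="liggrep-project/project_env_1/liggrep_analysis.txt",
)

# Joined with spaces purely for display, so it can be compared with
# (or pasted into) the terminal command.
print(" ".join(cmd))
```

Keeping each option next to its value in the list makes a missing `--` or a misplaced path easier to spot than in one long line.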
LigGrep requires as input:
LigGrep’s first command-line argument is the path to the PDB/PDBQT-formatted receptor file used for docking.
LigGrep’s second command-line argument is the path to a directory containing the docked-compound .pdbqt files. The mode we would like to run (NONE, SMILES or OPENBABEL) is selected separately with the -m option.
LigGrep’s third command-line argument is the path to a JSON file that contains a list of filters that the input compounds must satisfy. LigGrep filters have four user-defined components: 1) a ligand-substructure specification describing one or more bonded atoms, 2) a point in 3D space (the query point), 3) a distance cut-off, and 4) an optional “exclude” flag.
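To make the four filter components concrete, here is a minimal sketch of how a single filter could be evaluated once the coordinates of the matched substructure atoms are known. It is an illustration, not LigGrep's actual code. It assumes that a filter is satisfied when any matched atom lies within the cut-off of the query point and that the “exclude” flag inverts that result. The function name and the coordinates are made up.

```python
import math


def filter_satisfied(substructure_coords, query_point, cutoff, exclude=False):
    """Evaluate one filter for one pose (illustrative assumption, see above).

    substructure_coords: (x, y, z) tuples for the ligand atoms that
    matched the substructure; an empty list means nothing matched.
    """
    near = any(math.dist(atom, query_point) <= cutoff
               for atom in substructure_coords)
    # An "exclude" filter passes only when no matched atom is near the point.
    return (not near) if exclude else near


# Made-up coordinates, for illustration only.
matched_atoms = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
query = (1.0, 0.0, 0.0)

print(filter_satisfied(matched_atoms, query, cutoff=3.0))
print(filter_satisfied(matched_atoms, query, cutoff=3.0, exclude=True))
```

`math.dist` needs Python 3.8 or newer, which the Python 3.10 environment set up earlier satisfies.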
Based on these crystal structures, we are interested in checking for an interaction between the carboxylate carbonyl O atom of the ligand and the NH H atom of residue SER486 in chain B.
First, open the .pdbqt receptor file in Discovery Studio or PyMOL and find the residue atom in the hierarchy table.
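If you prefer to check the chain, residue number and atom name without a graphical viewer, the receptor file can also be searched with a few lines of Python. This sketch assumes the receptor .pdbqt keeps the standard fixed-column PDB layout for its ATOM records. The helper names are made up, and the two sample records use invented coordinates so that the example runs without the real file.

```python
def parse_atom_record(line):
    """Pull the identifying fields out of one fixed-column ATOM/HETATM record."""
    return {
        "atomname": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resid": int(line[22:26]),
        "coordinate": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def find_receptor_atoms(lines, chain, resid, atomname):
    """Return every atom record matching chain, residue number and atom name."""
    matches = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom = parse_atom_record(line)
        if (atom["chain"], atom["resid"], atom["atomname"]) == (chain, resid, atomname):
            matches.append(atom)
    return matches


def make_record(serial, atomname, resname, chain, resid, x, y, z):
    """Build a synthetic ATOM record with the standard column positions."""
    return "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    {:8.3f}{:8.3f}{:8.3f}".format(
        serial, atomname, resname, chain, resid, x, y, z)


# Synthetic records with made-up coordinates, for illustration only.
sample = [
    make_record(1, "N", "ASN", "B", 516, 10.0, 11.0, 12.0),
    make_record(2, "HD22", "ASN", "B", 516, 1.5, -2.25, 3.0),
]
print(find_receptor_atoms(sample, "B", 516, "HD22"))

# With the real receptor, pass an open file instead of the sample list:
# with open("5rmm.rigid.pdbqt") as handle:
#     print(find_receptor_atoms(handle, "B", 516, "HD22"))
```

The values returned for chain, resid and atomname are the ones to enter in the "receptorAtom" part of the JSON filter.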
To determine whether a given docked pose satisfies the user-specified filter list, LigGrep first uses the RDKit Python library to check whether the molecule contains the necessary ligand substructures (i.e., the substructures associated with all filters that do not have “exclude” flags).
LigGrep rejects all molecules that do not contain each of the necessary substructures. Users specify substructures via SMILES arbitrary target specification (SMARTS) notation, which is syntactically similar to SMILES. First, extract the SMILES string for your desired substructure:
[O-]C([C@@H]1CNC[C@H]1c2ccccc2)=O
[O]C([C@@H]1C[N]C[C@H]1c2ccccc2)=O
JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate.
The JSON starts with a square bracket [, indicating that it is an array.
Inside the array, there are several objects (denoted by curly braces {}), each representing a specific data entry.
Each object contains key-value pairs:
The first four objects include a "receptorAtom" object, which itself contains "chain", "resid", and "atomname" keys, representing properties of a receptor atom.
Each object also has a "ligandSubstructSMARTS" key, which represents structural information about the ligand atoms we are interested in.
Additionally, there's a "distance" key representing a numerical distance value.
The fifth object is slightly different, containing a "coordinate" key with an array value, and an "exclude" key with a boolean value (true).
Overall, each object in the array defines one LigGrep filter: a ligand substructure, a reference point (a receptor atom or an explicit coordinate), a distance cut-off and, optionally, an "exclude" flag.
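Because a missing brace or a misspelled key is easy to overlook, it can be worth checking the filter file with Python's standard `json` module before starting a long run. The checker below is a sketch built only from the keys described above. The rule that each filter carries exactly one of "receptorAtom" or "coordinate" is my assumption, the function name is made up, and the second filter in the example uses invented values.

```python
import json

RECEPTOR_KEYS = {"chain", "resid", "atomname"}


def check_filters(text):
    """Parse filter JSON and return a list of problems (empty if none found)."""
    filters = json.loads(text)  # raises json.JSONDecodeError on broken syntax
    if not isinstance(filters, list):
        return ["top level must be an array"]
    problems = []
    for i, flt in enumerate(filters):
        if not isinstance(flt, dict):
            problems.append(f"filter {i}: must be an object")
            continue
        has_atom = "receptorAtom" in flt
        has_coord = "coordinate" in flt
        if has_atom == has_coord:
            problems.append(f"filter {i}: expected exactly one of receptorAtom/coordinate")
        if has_atom and not RECEPTOR_KEYS <= set(flt["receptorAtom"]):
            problems.append(f"filter {i}: receptorAtom needs chain, resid, atomname")
        if has_coord and len(flt["coordinate"]) != 3:
            problems.append(f"filter {i}: coordinate needs three numbers")
        if "ligandSubstructSMARTS" not in flt:
            problems.append(f"filter {i}: missing ligandSubstructSMARTS")
        if not isinstance(flt.get("distance"), (int, float)):
            problems.append(f"filter {i}: distance must be a number")
        if "exclude" in flt and not isinstance(flt["exclude"], bool):
            problems.append(f"filter {i}: exclude must be true or false")
    return problems


example = """
[
  {"receptorAtom": {"chain": "B", "resid": 516, "atomname": "HD22"},
   "ligandSubstructSMARTS": "[#8]", "distance": 3.0},
  {"coordinate": [0.0, 0.0, 0.0],
   "ligandSubstructSMARTS": "[#7]", "distance": 2.0, "exclude": true}
]
"""
print(check_filters(example))  # prints []
```

An empty list means the structure looks plausible. It does not prove that LigGrep will accept the file or that the SMARTS strings are chemically meaningful.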
Here is my JSON file for this virtual screen. It checks whether a ligand oxygen atom ([#8]) lies within 3.0 Å of the receptor atom HD22 of residue 516 in chain B:
[
  {
    "receptorAtom": {
      "chain": "B",
      "resid": 516,
      "atomname": "HD22"
    },
    "ligandSubstructSMARTS": "[#8]",
    "distance": 3.0
  }
]
After running this filter on the control ligand (VGX) from the crystal structure with PDB ID 5RMM, shown below, LigGrep confirms that an oxygen atom lies within 3.0 Å of the amide hydrogen atom (HD22) of residue ASN516. We can now run this filter on our library of ~9000 docked ligand poses (.pdbqt).
Aim:
This issue describes how to use LigGrep to filter docked ligand poses and check that they retain the key protein interactions observed in crystallographic fragment-protein structures. This should help to prioritise compounds for molecular dynamics simulations, binding free energy calculations and/or chemical synthesis.
Background:
LigGrep 1.0.0 is a free, open-source tool developed by the Durrant lab that accepts a protein receptor file (PDB, PDBQT), a directory containing many docked-compound files (PDB, PDBQT, SDF), and a list of user-specified filters (JSON). It evaluates each docked pose and outputs the names of the compounds with poses that pass all filters.
For further details about LigGrep, please see the original publication: https://doi.org/10.1186/s13321-020-00471-2 and the GitHub page https://github.com/durrantlab/liggrep