Open TomkUCL opened 9 months ago
1) Install Ubuntu Desktop (Linux command line interface) onto your computer https://ubuntu.com/download
2) Install anaconda for Linux-x86 https://www.anaconda.com/download https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
3) Make a new folder (directory) in your Ubuntu terminal, e.g. cd /mnt > ls > cd d > mkdir gypsum_dl-1.2.1 >
(> = press enter button)
4) Enter the new directory you have just created; cd gypsum_dl-1.2.1 >
5) Save your ligands.sdf file into the current directory 'gypsum_dl-1.2.1'.
6) Install the necessary third-party libraries (rdkit, scipy, and numpy) needed to run Gypsum-DL:
conda install -c rdkit rdkit numpy scipy mpi4py
If rdkit does not install, try the install command from their own rdkit repository
$ conda create -c conda-forge -n my-rdkit-env rdkit
7) Install Gypsum-DL-1.2.1 from the GitHub repository - https://github.com/durrantlab/gypsum_dl
8) Create and activate a new conda environment using the Linux command line in Ubuntu:
conda create -c conda-forge --name gypsum_dl_env rdkit numpy scipy mpi4py -y
conda activate gypsum_dl_env
9) Run gypsum_dl using your specified input file path:
python run_gypsum_dl.py --source ./examples/sample_molecules.smi
For example, my D drive contains the folder gypsum_dl-1.2.1, with a subfolder sdf_input_files that holds the file 505_selection_of_comb_lib_1.sdf, so I would type the following into the Linux command line:
python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/sdf_input_files/505_selection_of_comb_lib_1.sdf
In my case I have also specified the output folder as '3D_output_files':
(gypsum_dl_env) tom@DESKTOP-LG9R7AE:/mnt/d/gypsum_dl-1.2.1$ python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/sdf_input_files/50_selection_of_comb_lib_1.smi --output /mnt/d/gypsum_dl-1.2.1/3D_output_files --separate_output_files
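The checks and the command above can also be sketched in Python, which helps when the paths get long. This is only a minimal sketch under a few assumptions: the helper names missing_packages and build_gypsum_command are made up for illustration and are not part of Gypsum-DL, the package list is copied from the conda install command above, and the flag spellings (--source, --output, --separate_output_files) are copied from the commands shown in this guide.

```python
import importlib.util
import shlex

# Package names copied from the conda install command in this guide.
REQUIRED_PACKAGES = ["rdkit", "numpy", "scipy", "mpi4py"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the package names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def build_gypsum_command(source, output_folder=None, separate_files=False):
    """Assemble the run_gypsum_dl.py command line as a list of arguments.

    Hypothetical helper: it only builds the argument list and runs nothing.
    """
    command = ["python", "run_gypsum_dl.py", "--source", source]
    if output_folder is not None:
        command += ["--output", output_folder]
    if separate_files:
        command.append("--separate_output_files")
    return command


if __name__ == "__main__":
    print("Missing packages:", missing_packages())
    command = build_gypsum_command(
        "/mnt/d/gypsum_dl-1.2.1/sdf_input_files/50_selection_of_comb_lib_1.smi",
        output_folder="/mnt/d/gypsum_dl-1.2.1/3D_output_files",
        separate_files=True,
    )
    # Print the command so it can be checked before pasting it into the terminal.
    print(shlex.join(command))
```

If the list of missing packages is not empty, the conda environment is probably not activated. The printed command can be pasted into the terminal from inside the gypsum_dl-1.2.1 folder.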
2. Select the SMILES strings for the compounds you wish to use by selecting the column in DataWarrior (shift + left click), then copy and paste the SMILES column into a new Excel spreadsheet.
3. Next, save the spreadsheet as a new .txt file.
4. Open the .txt file and choose 'Save as' with file type 'All files', replacing the .txt extension in the file name with .smi.
5. Lastly, run Gypsum-DL using the following command:
python run_gypsum_dl.py --source YOUR FILE LOCATION
So in my case, this would be...
python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/sdf_input_files/50_selection_of_comb_lib_1.smi
..or if you wanted each model (e.g. tautomer) generated for each ligand to be stored as its own .sdf file in the output folder '3D_output_files' within the current directory, the command I would type would look like this:
(gypsum_dl_env) tom@DESKTOP-LG9R7AE:/mnt/d/gypsum_dl-1.2.1$ python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/sdf_input_files/50_selection_of_comb_lib_1.smi --output /mnt/d/gypsum_dl-1.2.1/3D_output_files --separate_output_files
Unfortunately, this creates a slight problem. AutoDock Vina will only dock one model per .pdbqt file. So how do we create .pdbqt files for all of the tautomers/diastereomers we have now created? Don't worry, we'll cover this now:
Once you have prepared your 3D ligands using Gypsum-DL, enter the folder containing your new combined 3D ligands .sdf file in the Ubuntu terminal. Now you want to split the 50-ligand-containing .sdf file into individual ligand .sdf files. You can do this using a simple Open Babel command:
obabel -isdf YOURLIGANDFILE.sdf -osdf -O *.sdf --split
e.g.
obabel -isdf gypsum_dl_success.sdf -osdf -O *.sdf --split
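If Open Babel is not to hand, the same split can be sketched in plain Python. This is a minimal sketch and not part of Gypsum-DL or Open Babel. It relies only on the SDF convention that every record ends with a line containing `$$$$`. The function name split_sdf and the 'ligand' file prefix are placeholders chosen for illustration.

```python
from pathlib import Path


def split_sdf(sdf_path, output_dir, prefix="ligand"):
    """Write each record of a multi-molecule SDF file to its own file.

    A record ends with a line containing only '$$$$'. Any trailing text
    without that terminator is ignored. Returns the list of files written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    record = []  # lines of the record currently being read
    with open(sdf_path, "r") as handle:
        for line in handle:
            record.append(line)
            if line.strip() == "$$$$":
                # End of one molecule: save it under a numbered file name.
                out_path = output_dir / f"{prefix}{len(written) + 1}.sdf"
                out_path.write_text("".join(record))
                written.append(out_path)
                record = []
    return written
```

For example, split_sdf("gypsum_dl_success.sdf", "split_ligands") would write ligand1.sdf, ligand2.sdf and so on into a folder called split_ligands (a hypothetical folder name).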
awk '/^MODEL/{n++}{print > ("output_dir/output_prefix" n ".pdbqt")}' "input_dir/ligands.pdbqt" && csplit --suppress-matched "input_dir/ligands.pdbqt" '/^MODEL/' '{*}' && rm xx*
But remember to first replace "input_dir" with the path to your input directory and "output_dir" with the path to your output directory. Also, replace "output_prefix" with your desired prefix for the output files.
For example, if my input file is /mnt/d/gypsum_dl-1.2.1/3D_output_files/250_ligands.pdbqt, my output directory is /mnt/d/gypsum_dl-1.2.1/3D_output_files/ (i.e. the same directory, which I have already entered in the terminal), and I want the output files to be named 'model', the command would look like this:
awk '/^MODEL/{n++}{print > ("model" n ".pdbqt")}' "/mnt/d/gypsum_dl-1.2.1/3D_output_files/250_ligands.pdbqt" && csplit --suppress-matched "/mnt/d/gypsum_dl-1.2.1/3D_output_files/250_ligands.pdbqt" '/^MODEL/' '{*}' && rm xx*
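For anyone who finds the awk one-liner hard to read, the same idea can be sketched in Python: start a new output file every time a line beginning with MODEL appears. This is a rough equivalent rather than an exact copy of the awk command. The function name split_pdbqt_models is a placeholder chosen for illustration, and, unlike awk, any lines that come before the first MODEL line are skipped.

```python
from pathlib import Path


def split_pdbqt_models(pdbqt_path, output_dir, prefix="model"):
    """Split a multi-model .pdbqt file into one numbered file per MODEL block.

    Returns the list of files written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    models = []  # one list of lines per MODEL block
    with open(pdbqt_path, "r") as handle:
        for line in handle:
            if line.startswith("MODEL"):
                models.append([])  # a new block starts here
            if models:  # lines before the first MODEL are skipped
                models[-1].append(line)
    written = []
    for number, lines in enumerate(models, start=1):
        out_path = output_dir / f"{prefix}{number}.pdbqt"
        out_path.write_text("".join(lines))
        written.append(out_path)
    return written
```

Each output file still begins with its MODEL line, exactly as with the awk command.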
Lastly, we need to remove the 'MODEL' line in each .pdbqt file, otherwise Vina will read each file as containing multiple models and will be unable to dock these. We can do this using a python script:
import os


def delete_first_line(file_path):
    # Read the content of the file
    with open(file_path, 'r') as file:
        lines = file.readlines()
    # Remove every line starting with 'MODEL'
    lines = [line for line in lines if not line.startswith('MODEL')]
    # Write the modified content back to the file
    with open(file_path, 'w') as file:
        file.writelines(lines)


def process_files_in_directory():
    # Get the current directory
    directory = os.getcwd()
    # Iterate through each file in the directory
    for filename in os.listdir(directory):
        if filename.endswith('.pdbqt'):
            file_path = os.path.join(directory, filename)
            # Delete the 'MODEL' line(s) from this file
            delete_first_line(file_path)


if __name__ == "__main__":
    process_files_in_directory()
    print("Processing complete.")
Save this script as delete_model_line_pdbqt.py in the directory containing your .pdbqt files. Then, open a terminal, navigate to the directory containing both the script and the .pdbqt files, and execute the script:
python3 delete_model_line_pdbqt.py
This will process each .pdbqt file in the current directory, deleting every line that starts with 'MODEL' from each file. After processing, it will print "Processing complete."
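Before moving on to docking, it can be worth checking that no 'MODEL' lines were missed. The sketch below is one possible check, not part of Gypsum-DL or AutoDock Vina; the function name files_with_model_lines is a placeholder chosen for illustration.

```python
from pathlib import Path


def files_with_model_lines(directory="."):
    """Return the names of .pdbqt files that still contain a MODEL line."""
    remaining = []
    for path in sorted(Path(directory).glob("*.pdbqt")):
        with open(path, "r") as handle:
            if any(line.startswith("MODEL") for line in handle):
                remaining.append(path.name)
    return remaining


if __name__ == "__main__":
    leftover = files_with_model_lines()
    if leftover:
        print("Files that still contain MODEL lines:", leftover)
    else:
        print("No MODEL lines found.")
```

An empty list means every .pdbqt file in the folder has had its 'MODEL' lines removed.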
Now that we have our ligands as individual .pdbqt files, we can now move onto the docking process using AutoDock Vina in Issue #1.
(base) tom@DESKTOP-LG9R7AE:~$ conda activate gypsum_dl_env
(gypsum_dl_env) tom@DESKTOP-LG9R7AE:~$ python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/input_files/Enamine_Aryl_halides_SNAr/1-1000_5rmm_combinatorial_library_lipinski_filtered.smi --output /mnt/d/gypsum_dl-1.2.1/3D_output_files/Enamine_Aryl_halides_SNAr --separate_output_files
1) Activate anaconda environment for Gypsum-DL
conda activate gypsum_dl_env
2) Go to Gypsum directory/folder in WSL
cd /mnt/d/gypsum_dl-1.2.1
3) Specify input and output folders and run
python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/input_files/Enamine_Aryl_halides_SNAr/1-1000_5rmm_combinatorial_library_lipinski_filtered.smi --output /mnt/d/gypsum_dl-1.2.1/3D_output_files/Enamine_Aryl_halides_SNAr --separate_output_files
This repository focuses on the installation and use of Gypsum-DL 1.2.1 built by the durrantlab. Gypsum-DL is a free, open-source program for preparing 3D small-molecule models for molecular docking and virtual screening applications. Beyond simply assigning atomic coordinates, Gypsum-DL accounts for alternate ionization, tautomeric, chiral, cis/trans isomeric, and ring-conformational forms often ignored by other programmes such as Open Babel.
It is released under the Apache License, Version 2.0 (see LICENSE.txt) and offers a free alternative to open babel with improved docking accuracy.
The original repository can be found here: https://github.com/durrantlab/gypsum_dl
Please note: this repository is an application of the excellent work done by the Durrant group in developing Gypsum-DL. It illustrates how non-computer-savvy users (i.e. chemists like myself) can apply this useful tool to their drug discovery projects!
I would encourage you to read the relevant publications for Gypsum-DL to understand its benefits, which can be found here:
Ropp, Patrick J., Jacob O. Spiegel, Jennifer L. Walker, Harrison Green, Guillermo A. Morales, Katherine A. Milliken, John J. Ringe, and Jacob D. Durrant. (2019) "Gypsum-DL: An Open-source Program for Preparing Small-molecule Libraries for Structure-based Virtual Screening." Journal of Cheminformatics 11:1. doi:10.1186/s13321-019-0358-3.
Ropp PJ, Kaminsky JC, Yablonski S, Durrant JD (2019) Dimorphite-DL: An open-source program for enumerating the ionization states of drug-like small molecules. J Cheminform 11:14. doi:10.1186/s13321-019-0336-9.