TomkUCL / SARS-CoV-2-Helicase-nsp13-Public-Antivirals-Virtual-Screening-Project

A repository for public suggestions towards SARS-CoV-2 helicase antivirals using publicly available software.
Apache License 2.0

Using an open-source program (Gypsum-DL) for preparing small-molecule libraries for structure-based virtual screening #2

Open TomkUCL opened 4 months ago

TomkUCL commented 4 months ago

This repository focuses on the installation and use of Gypsum-DL 1.2.1 built by the durrantlab. Gypsum-DL is a free, open-source program for preparing 3D small-molecule models for molecular docking and virtual screening applications. Beyond simply assigning atomic coordinates, Gypsum-DL accounts for alternate ionization, tautomeric, chiral, cis/trans isomeric, and ring-conformational forms often ignored by other programmes such as Open Babel.

It is released under the Apache License, Version 2.0 (see LICENSE.txt) and offers a free alternative to Open Babel for ligand preparation; by enumerating these additional forms, it aims to improve docking accuracy.


The original repository can be found here: https://github.com/durrantlab/gypsum_dl

Please note: this repository is an application of the excellent work done by the Durrant group in developing Gypsum-DL. It illustrates how users who are not computer-savvy (i.e. chemists like myself) can apply this useful tool to their drug discovery projects!

I would encourage you to read the relevant publications for Gypsum-DL to understand its benefits, which can be found here:

Ropp, Patrick J., Jacob O. Spiegel, Jennifer L. Walker, Harrison Green, Guillermo A. Morales, Katherine A. Milliken, John J. Ringe, and Jacob D. Durrant. (2019) "Gypsum-DL: An Open-source Program for Preparing Small-molecule Libraries for Structure-based Virtual Screening." Journal of Cheminformatics 11:1. doi:10.1186/s13321-019-0358-3.

Ropp PJ, Kaminsky JC, Yablonski S, Durrant JD (2019) Dimorphite-DL: An open-source program for enumerating the ionization states of drug-like small molecules. J Cheminform 11:14. doi:10.1186/s13321-019-0336-9.

TomkUCL commented 4 months ago

Getting Started:

1) Install Ubuntu (this provides the Linux command line) on your computer: https://ubuntu.com/download

2) Install Anaconda for Linux x86_64: https://www.anaconda.com/download (installer: https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh)

3) Make a new folder (directory) from your Ubuntu terminal, pressing Enter after each command, e.g.

cd /mnt

ls

cd d

mkdir gypsum_dl-1.2.1

4) Enter the new directory you have just created:

cd gypsum_dl-1.2.1

5) Save your ligands.sdf file into the current directory 'gypsum_dl-1.2.1'.

6) Install the third-party libraries needed to run Gypsum-DL (rdkit, numpy, and scipy, plus mpi4py):

conda install -c rdkit rdkit numpy scipy mpi4py

If rdkit does not install, try the install command from the RDKit project's own instructions:

conda create -c conda-forge -n my-rdkit-env rdkit

7) Install Gypsum-DL 1.2.1 from the GitHub repository: https://github.com/durrantlab/gypsum_dl

8) Create and activate a new conda environment using the Linux command line in Ubuntu:

conda create -c conda-forge --name gypsum_dl_env rdkit numpy scipy mpi4py -y

conda activate gypsum_dl_env

9) Run Gypsum-DL, giving the path to your input file after --source:

python run_gypsum_dl.py --source ./examples/sample_molecules.smi

For example, my D drive contains the folder gypsum_dl-1.2.1, which has a subfolder sdf_input_files holding the file 505_selection_of_comb_lib_1.sdf, so I would type the following into the Linux command line:

python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/sdf_input_files/505_selection_of_comb_lib_1.sdf

In my case I have also specified the output location with --output, here the folder "3D_output_files":

(gypsum_dl_env) tom@DESKTOP-LG9R7AE:/mnt/d/gypsum_dl-1.2.1$ python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/sdf_input_files/50_selection_of_comb_lib_1.smi --output /mnt/d/gypsum_dl-1.2.1/3D_output_files --separate_output_files
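If you run Gypsum-DL on many libraries, it can help to assemble the command in a small script rather than retyping long paths. The following is only a sketch: `build_gypsum_command` is a hypothetical helper name of my own, and it uses nothing beyond the three flags shown in the commands above (--source, --output, --separate_output_files).

```python
import shlex


def build_gypsum_command(source, output_dir=None, separate_output_files=False,
                         script="run_gypsum_dl.py"):
    """Return the argument list for a Gypsum-DL run (hypothetical helper)."""
    cmd = ["python", script, "--source", str(source)]
    if output_dir is not None:
        cmd += ["--output", str(output_dir)]
    if separate_output_files:
        cmd.append("--separate_output_files")
    return cmd


cmd = build_gypsum_command(
    "/mnt/d/gypsum_dl-1.2.1/sdf_input_files/50_selection_of_comb_lib_1.smi",
    output_dir="/mnt/d/gypsum_dl-1.2.1/3D_output_files",
    separate_output_files=True,
)
# Print a copy-pasteable command line; paths containing spaces are quoted
print(shlex.join(cmd))
```

The printed line can be pasted into the terminal from inside the gypsum_dl-1.2.1 directory with the conda environment activated. The list `cmd` could also be passed to `subprocess.run`.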

TomkUCL commented 4 months ago

Saving your Combinatorial Library Products as SMILES (.smi) File for Gypsum-DL Processing:

1. First, create SMILES strings for your product structures using the Chemistry drop-down menu in DataWarrior.

2. Select the SMILES strings for the compounds you wish to use by selecting the column in DataWarrior (shift + left click), then copy it.

3. Next, paste the SMILES into a new Excel spreadsheet and save it as a new .txt file.

4. Open the .txt file, choose 'Save as' with file type 'All files', then delete the .txt extension from the file name and save it as a .smi file instead.

5. Lastly, run Gypsum-DL using the following command:

python run_gypsum_dl.py --source YOUR FILE LOCATION

So in my case, this would be:

python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/sdf_input_files/50_selection_of_comb_lib_1.smi

Or, if you want each model (e.g. tautomer) generated for each ligand to be stored as its own .sdf file in the output folder '3D_output_files', the command would look like this:

(gypsum_dl_env) tom@DESKTOP-LG9R7AE:/mnt/d/gypsum_dl-1.2.1$ python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/sdf_input_files/50_selection_of_comb_lib_1.smi --output /mnt/d/gypsum_dl-1.2.1/3D_output_files --separate_output_files
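As an alternative to the Excel and rename steps, a .smi file can be written directly from a list of copied SMILES strings. This is a minimal sketch, not part of Gypsum-DL: `write_smi` and the `lig_N` names are my own placeholders, and it assumes the common .smi layout of one SMILES per line followed by a name, as in the sample_molecules.smi example file.

```python
import tempfile
from pathlib import Path


def write_smi(smiles_list, path, name_prefix="lig"):
    """Write one 'SMILES name' pair per line; blank entries are skipped."""
    lines = []
    for entry in smiles_list:
        smiles = entry.strip()  # also removes stray spaces and Windows line endings
        if not smiles:
            continue
        lines.append(f"{smiles} {name_prefix}_{len(lines) + 1}")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Demonstration in a temporary folder with two simple molecules and one blank cell
with tempfile.TemporaryDirectory() as tmp:
    smi_path = Path(tmp) / "example.smi"
    n_written = write_smi(["CCO", "", "  c1ccccc1  "], smi_path)
    smi_text = smi_path.read_text(encoding="utf-8")

print(n_written)
print(smi_text, end="")
```

In real use you would pass your own output path (for example inside sdf_input_files) instead of the temporary folder, and then hand that file to --source.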

TomkUCL commented 4 months ago

Running Gypsum:

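If the output is a single multi-molecule .sdf file (e.g. gypsum_dl_success.sdf), it can be split into individual .sdf files with Open Babel:

obabel -isdf YOURLIGANDFILE.sdf -osdf -O *.sdf --split

e.g.

obabel -isdf gypsum_dl_success.sdf -osdf -O *.sdf --split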
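If Open Babel is not available, the same kind of split can be sketched in plain Python, because every record in an SDF file ends with a line containing only `$$$$`. This is not the Open Babel method, just an illustration; `split_sdf` is a hypothetical helper and the two records in the demonstration are placeholders rather than valid molecules.

```python
def split_sdf(text):
    """Split SDF text into records; each record ends with its '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    # Any trailing lines without a closing '$$$$' are ignored
    return records


# Placeholder text standing in for a real multi-molecule SDF file
sample = "mol_A\n(placeholder block)\nM  END\n$$$$\nmol_B\n(placeholder block)\nM  END\n$$$$\n"
records = split_sdf(sample)
print(len(records))
print(records[1].splitlines()[0])
```

Each returned record could then be written to its own .sdf file, for example named after its first line (the molecule title).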


TomkUCL commented 4 months ago

Converting Your .sdf Output Files to .pdbqt for Virtual Screening

The command below splits a multi-model ligands.pdbqt file into one .pdbqt file per MODEL block; replace the output directory, output prefix, and input path with your own:

awk -v out="output_dir/output_prefix" '/^MODEL/{n++}{print > (out n ".pdbqt")}' "input_dir/ligands.pdbqt"

In my case, run from the folder where the split files should be written:

awk '/^MODEL/{n++}{print > ("model" n ".pdbqt")}' "/mnt/d/gypsum_dl-1.2.1/3D_output_files/250_ligands.pdbqt"
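For readers more comfortable with Python than awk, the same split can be sketched as below. `split_pdbqt_models` is a hypothetical helper, and the sample text is a placeholder rather than a real ligand. One difference from the awk command: lines that appear before the first MODEL line are ignored here instead of being written to a separate file.

```python
def split_pdbqt_models(text, keep_model_line=True):
    """Return one string per MODEL block found in multi-model PDBQT text."""
    blocks, current = [], None
    for line in text.splitlines(keepends=True):
        if line.startswith("MODEL"):
            if current is not None:
                blocks.append("".join(current))
            # Start a new block, optionally keeping the MODEL header line
            current = [line] if keep_model_line else []
        elif current is not None:
            current.append(line)
    if current is not None:
        blocks.append("".join(current))
    return blocks


# Placeholder text standing in for a real multi-model .pdbqt file
sample_pdbqt = "MODEL 1\nREMARK ligand A\nENDMDL\nMODEL 2\nREMARK ligand B\nENDMDL\n"
blocks = split_pdbqt_models(sample_pdbqt)
print(len(blocks))
print(blocks[0], end="")
```

Each block could then be written out as model1.pdbqt, model2.pdbqt, and so on, matching the file names produced by the awk command.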


TomkUCL commented 4 months ago

Lastly, we need to remove the 'MODEL' line in each .pdbqt file, otherwise Vina will read each file as containing multiple models and will be unable to dock these. We can do this using a python script:

import os

def remove_model_lines(file_path):
    """Rewrite file_path in place without any line that starts with 'MODEL'."""
    # Read the content of the file
    with open(file_path, 'r') as file:
        lines = file.readlines()

    # Drop every line starting with 'MODEL' (in a split file this is the header line)
    lines = [line for line in lines if not line.startswith('MODEL')]

    # Write the modified content back to the file
    with open(file_path, 'w') as file:
        file.writelines(lines)

def process_files_in_directory():
    # Get the current directory
    directory = os.getcwd()
    # Iterate through each .pdbqt file in the directory
    for filename in os.listdir(directory):
        if filename.endswith('.pdbqt'):
            file_path = os.path.join(directory, filename)
            remove_model_lines(file_path)

if __name__ == "__main__":
    process_files_in_directory()
    print("Processing complete.")

Save this script as delete_model_line_pdbqt.py in the directory containing your .pdbqt files. Then open a terminal, navigate to that directory, and execute the script:

python3 delete_model_line_pdbqt.py

This will process each .pdbqt file in the current directory, deleting every line that starts with 'MODEL' from each file. After processing, it will print "Processing complete."
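Before moving on, it may be worth checking that no 'MODEL' lines were missed. The following is a small sketch of such a check; `files_with_model_lines` is a hypothetical helper and the demonstration uses two throwaway files rather than real ligands.

```python
import os
import tempfile


def files_with_model_lines(directory):
    """Return sorted names of .pdbqt files that still contain a line starting with 'MODEL'."""
    flagged = []
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".pdbqt"):
            continue
        with open(os.path.join(directory, filename), "r") as handle:
            if any(line.startswith("MODEL") for line in handle):
                flagged.append(filename)
    return flagged


# Demonstration on two throwaway files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "clean.pdbqt"), "w") as f:
        f.write("REMARK placeholder ligand\n")
    with open(os.path.join(tmp, "dirty.pdbqt"), "w") as f:
        f.write("MODEL 1\nREMARK placeholder ligand\n")
    remaining = files_with_model_lines(tmp)

print(remaining)
```

On your own files, calling `files_with_model_lines(os.getcwd())` from the folder containing the .pdbqt files should return an empty list once the clean-up script has run.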


Now that we have our ligands as individual .pdbqt files, we can move on to the docking process using AutoDock Vina in Issue #1.

TomkUCL commented 2 months ago

(base) tom@DESKTOP-LG9R7AE:~$ conda activate gypsum_dl_env

(gypsum_dl_env) tom@DESKTOP-LG9R7AE:~$ python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/input_files/Enamine_Aryl_halides_SNAr/1-1000_5rmm_combinatorial_library_lipinski_filtered.smi --output /mnt/d/gypsum_dl-1.2.1/3D_output_files/Enamine_Aryl_halides_SNAr --separate_output_files


TomkUCL commented 2 months ago

1) Activate the Anaconda environment for Gypsum-DL

conda activate gypsum_dl_env

2) Go to Gypsum directory/folder in WSL

cd /mnt/d/gypsum_dl-1.2.1

3) Specify input and output folders and run

python run_gypsum_dl.py --source /mnt/d/gypsum_dl-1.2.1/input_files/Enamine_Aryl_halides_SNAr/1-1000_5rmm_combinatorial_library_lipinski_filtered.smi --output /mnt/d/gypsum_dl-1.2.1/3D_output_files/Enamine_Aryl_halides_SNAr --separate_output_files
