PyRx 0.8 / AutoDock Vina Virtual Screen of de novo generated core compounds (non-N-oxides)

TomkUCL commented 7 months ago

Here I describe the current strategy for the diamine cores retained from the de novo generated compounds, now that the N-oxide series from which these cores originated has been deprioritised.

Peter Brown enumerated these diamine cores using purchasable carboxylic acids from Enamine, giving ~93'000 compounds per core; @kipUNC has received these libraries for virtual screening using Maestro GLIDE.
Simultaneously, I have been filtering these compounds based on their predicted physicochemical properties using DataWarrior https://doi.org/10.1021/ci500588j and then running a smaller-scale virtual screen (~1000 compounds/day) with the open-source software PyRx 0.8 https://sourceforge.net/projects/pyrx/files/0.8/ , which uses the AutoDock Vina scoring function to estimate binding free energies, in order to prioritise compounds for purchasing / synthesis. This process is described in more depth below.
A key question when prioritising compounds for the helicase using in-silico methods is: which conformation of nsp13 should be used for docking?
According to the 2022 paper from D Shaw's team, "Ensemble cryo-EM reveals conformational states of the nsp13 helicase in the SARS-CoV-2 helicase replication–transcription complex" https://www.nature.com/articles/s41594-022-00734-6 , the four conformational states of the nsp13–RTC are:

nsp13.1-apo: the helicase is not bound to RNA
nsp13.1-engaged: the helicase is bound to RNA but not translocating
nsp13.1-swiveled: the helicase is bound to RNA and translocating
nsp13.1-backtracked: the helicase is bound to RNA and backtracking

These states are thought to regulate the RNA synthesis and proofreading of the SARS-CoV-2 virus.

Previously I was docking to the apo nsp13 protomer (pdb 5rm9); however, there are numerous cryoEM structures of nsp13 as part of the replicase complex, both with ATP/(+)ssRNA (pdb 7rdy) and without (pdb 7rdz).

TomkUCL commented 7 months ago

The process so far:

We are interested in retaining these cores whilst moving away from the N-oxide parent-compounds since the 51 purchased Enamine N-oxide set afforded no hits by ATPase/SPR assay at UNC. Therefore, we are diversifying the functional groups on either end of the diamine cores for docking.
Following our recent meeting concerning virtual libraries for the UCL cores, Peter Brown (@toluene44) went back and picked 316 diverse acids available from Enamine Building Blocks and created 10 libraries of about 93,000 compounds each. The link to download the folder is below. https://www.dropbox.com/scl/fo/lrf09mdh3qoe5k2006okp/h?rlkey=vwp7vounwat5h2l5zqmczdjf9&dl=0
The 10 rigid (non-linear alkyl) cores arose from the top 100 GLIDE scoring compounds from the de novo generated set, shown here:

I've filtered these compounds in DataWarrior using Lipinski filters and by removing 'nasty or toxic' functional groups to favour promising drug-like molecules as starting points. This process reduced the number of molecules from >93'000 per core to ~20'000 per core.
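As a rough illustration of the Lipinski step only, here is a minimal sketch that counts rule-of-five violations from pre-computed descriptors. The thresholds are the standard rule-of-five values and are an assumption (the exact DataWarrior filter settings are not recorded in this thread); the `Descriptors` class, the compound names and the descriptor values are hypothetical, and the 'nasty or toxic' functional group filter is not covered.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed properties for one compound (hypothetical example data)."""
    name: str
    mol_weight: float   # Da
    clogp: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(d):
    """Count rule-of-five violations (standard thresholds, assumed here)."""
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_lipinski(d, max_violations=0):
    """True if the compound has no more than max_violations violations."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor rows, not real library members.
library = [
    Descriptors("cmpd_A", 412.5, 3.1, 2, 6),
    Descriptors("cmpd_B", 587.7, 5.8, 3, 9),
    Descriptors("cmpd_C", 498.0, 4.9, 5, 10),
]
kept = [d.name for d in library if passes_lipinski(d)]
print(kept)
```

Setting `max_violations=1` would give the more permissive variant of the rule that tolerates a single violation.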
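Next, I chose 1000 compounds based on 'structural diversity', ranked such that each successive compound is as dissimilar as possible from those already selected (compound 2 is the most different from 1, compound 3 the most different from 1 & 2, etc.).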
These 1000 'most diverse' compounds were then subjected to a virtual screen using the open-source virtual screening tool PyRx 0.8 with the AutoDock Vina 1.2 scoring function. References: Autodock Vina (https://vina.scripps.edu/, https://doi.org/10.1021/acs.jcim.1c00203, https://doi.org/10.1002/jcc.21334, https://doi.org/10.1021/acs.jcim.8b00545). PyRx 0.8 (https://pyrx.sourceforge.io/ , Small-Molecule Library Screening by Docking with PyRx. Dallakyan S, Olson AJ. Methods Mol Biol. 2015;1263:243-50., https://www.nature.com/articles/s41598-021-83626-x , https://www.nature.com/articles/s41598-020-60221-0 , https://doi.org/10.1016/B978-0-12-822312-3.00019-9)
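For anyone wanting to reproduce a diversity ranking of this kind outside DataWarrior, a greedy MaxMin pick over fingerprint bit sets is one common approach. This is a sketch under assumptions: DataWarrior's own selection algorithm and descriptors may differ, and the fingerprints below are toy sets of bit indices rather than real molecular fingerprints.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fingerprints, n_pick, seed_index=0):
    """Greedy MaxMin: each new pick is the compound farthest from its
    nearest already-picked neighbour (distance = 1 - Tanimoto)."""
    n_pick = min(n_pick, len(fingerprints))
    picked = [seed_index]
    # min_dist[i] is the distance from compound i to the closest picked compound.
    min_dist = [1.0 - tanimoto(fp, fingerprints[seed_index]) for fp in fingerprints]
    while len(picked) < n_pick:
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fingerprints):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, fingerprints[nxt]))
    return picked


# Toy fingerprints (hypothetical): 0 and 1 are close analogues, 2 is unrelated.
fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
    frozenset({1, 2, 10, 11}),
]
order = maxmin_pick(fps, 3)
print(order)
```

The returned list is the pick order, so truncating it at any length gives a diverse subset of that size.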

For the docking, the search grid was restricted to the RecA and 1B domains (containing ATP and RNA binding sites) to increase docking speed whilst exhaustiveness was set at 8 to maximise conformation exploration.
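PyRx sets the search grid through its GUI, but the same restriction can be written as an AutoDock Vina config file for command-line runs. The sketch below derives a box from a set of atom coordinates plus padding and formats the standard Vina keys (`receptor`, `center_*`, `size_*`, `exhaustiveness`). The coordinates, padding and receptor file name are hypothetical placeholders, not the values used in this screen.

```python
def bounding_box(coords, padding=4.0):
    """Return (center, size) of an axis-aligned box enclosing coords,
    enlarged by `padding` angstroms on every side."""
    axes = list(zip(*coords))  # [(x values), (y values), (z values)]
    center = tuple((max(a) + min(a)) / 2.0 for a in axes)
    size = tuple((max(a) - min(a)) + 2.0 * padding for a in axes)
    return center, size


def vina_config(receptor, center, size, exhaustiveness=8):
    """Format a Vina config file body for a fixed search box."""
    lines = [f"receptor = {receptor}"]
    lines += [f"center_{ax} = {c:.3f}" for ax, c in zip("xyz", center)]
    lines += [f"size_{ax} = {s:.3f}" for ax, s in zip("xyz", size)]
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines)


# Hypothetical coordinates standing in for atoms of the selected domains.
domain_atoms = [(0.0, 0.0, 0.0), (10.0, 20.0, 30.0), (5.0, 5.0, 5.0)]
center, size = bounding_box(domain_atoms, padding=5.0)
config_text = vina_config("nsp13_prepared.pdbqt", center, size)
print(config_text)
```

In practice the coordinates would come from the residues of the RecA and 1B domains in the prepared receptor file.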
I previously docked to apo-Nsp13 protomer (pdb 5rm9, i.e. not part of the replicase complex and in the absence of ATP or ssRNA).
I wanted to try docking to the nsp13 engaged-RTC state (ATP/ssRNA bound pdb 7kro), since this is the closed form of the protein and molecules will be ATP/ssRNA competitive in vivo; this also fits the "molecular staple" idea that we discussed in the last meeting.

The helicase protein was prepared using Dock Prep in UCSF Chimera 1.6 and the ATP and ssRNA were removed from the structure to make these sites available for binding.
Exhaustiveness was run at 8 conformers per compound, then the compounds were ranked by their average binding energy. The poses were then visually inspected and validated using Biovia Discovery Studio 2021.
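The ranking step can be sketched as follows, assuming the docked poses are available as Vina output PDBQT text, in which each pose carries a `REMARK VINA RESULT:` line whose first number is the score in kcal/mol. The ligand names and scores are hypothetical, and the PyRx export used in this thread (a spreadsheet) would need a different reader.

```python
from statistics import mean


def vina_scores(pdbqt_text):
    """Extract the per-pose scores (kcal/mol) from Vina output PDBQT text."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Fields: REMARK, VINA, RESULT:, score, rmsd_lb, rmsd_ub
            scores.append(float(line.split()[3]))
    return scores


def rank_by_mean(outputs):
    """Return [(name, mean_score)] sorted from most to least negative."""
    averaged = {name: mean(vina_scores(text)) for name, text in outputs.items()}
    return sorted(averaged.items(), key=lambda item: item[1])


# Hypothetical outputs for two ligands.
outputs = {
    "lig_A": "MODEL 1\nREMARK VINA RESULT:    -9.0      0.000      0.000\n"
             "MODEL 2\nREMARK VINA RESULT:    -8.0      1.532      2.871\n"
             "MODEL 3\nREMARK VINA RESULT:    -7.0      2.104      4.290\n",
    "lig_B": "MODEL 1\nREMARK VINA RESULT:    -9.5      0.000      0.000\n"
             "MODEL 2\nREMARK VINA RESULT:    -8.5      1.120      1.987\n",
}
ranking = rank_by_mean(outputs)
print(ranking)
```

Averaging over poses rewards ligands that score consistently well, whereas ranking on the best pose alone would favour single outliers; either choice can be made by swapping `mean` for `min`.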

After visual inspection of the top-scoring poses, the most promising poses will be sent to Geoff Wells for ~500 nanosecond AMBER molecular dynamics simulations.

TomkUCL commented 7 months ago

I ran this process for Core 6 (@qxsml / Andy's core) since the route to the mono-Boc protected diamine core seems to be relatively simple. Unfortunately the screen crashed after compound 726/1000, so I have only included these results.
The DataWarrior file can be found here: https://drive.google.com/file/d/11W11d40R7SAbVLwOzM_I_ZLTwUvdJ_fz/view?usp=drive_link
The Excel file listing each binding pose and the associated binding energy can be found here: https://docs.google.com/spreadsheets/d/1ki3RNRQ3Q8lTqnXZJl03GibxwsCvG8y2/edit?usp=drive_link&ouid=117232601769274897551&rtpof=true&sd=true

I selected the compounds with an average binding energy of -8.0 kcal/mol or lower. I have included the structures and Enamine codes for these compounds for purchasing.

https://drive.google.com/file/d/1ZEU3J7N_IIH_z02WHENOavddIVRqODHA/view?usp=drive_link

Amongst the compounds with an average binding energy of -8.0 kcal/mol or lower there are some common motifs which are included here, including some binding modes:

Core6 PyRx results.pdf

TomkUCL commented 7 months ago

Summary of results for PyRx/Vina virtual screen for Core 1 Enamine carboxylic acids:

PyRx Core1 results.xlsx
DataWarrior file ; https://drive.google.com/file/d/1q7pRE7NVuZc16h6n_c2TYlELAxltDRiX/view?usp=drive_link
Docked against apo-nsp13 (non-engaged state, no ATP or RNA bound) as part of the replicase complex (pdb 7rdz).
745 / 1000 ligands were successfully screened; the remaining ligands are being docked separately due to errors occurring in the larger run.
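When a large run stops part-way, the ligands still to be docked can be listed by comparing the input folder with the outputs that actually contain a result. This is a sketch for a command-line Vina workflow only: the folder layout and the `_out.pdbqt` naming are hypothetical, and PyRx stores its results differently.

```python
import tempfile
from pathlib import Path

RESULT_TAG = "REMARK VINA RESULT:"


def pending_ligands(ligand_dir, output_dir, out_suffix="_out.pdbqt"):
    """Return ligand names with no output file containing a Vina result."""
    finished = set()
    for out_file in Path(output_dir).glob(f"*{out_suffix}"):
        # An empty or truncated output (e.g. from a crash) does not count.
        if RESULT_TAG in out_file.read_text():
            finished.add(out_file.name[: -len(out_suffix)])
    ligands = (p.stem for p in Path(ligand_dir).glob("*.pdbqt"))
    return sorted(name for name in ligands if name not in finished)


# Self-contained demonstration in a temporary folder (hypothetical files).
with tempfile.TemporaryDirectory() as tmp:
    lig_dir, out_dir = Path(tmp, "ligands"), Path(tmp, "out")
    lig_dir.mkdir()
    out_dir.mkdir()
    for name in ("lig_001", "lig_002", "lig_003"):
        (lig_dir / f"{name}.pdbqt").write_text("ROOT\nENDROOT\n")
    (out_dir / "lig_001_out.pdbqt").write_text(
        "MODEL 1\nREMARK VINA RESULT:    -8.1      0.000      0.000\n"
    )
    (out_dir / "lig_002_out.pdbqt").write_text("")  # simulated crash
    todo = pending_ligands(lig_dir, out_dir)
print(todo)
```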
The lowest energy poses (~-10.0 kcal/mol) are situated almost exclusively at the 5' ssRNA entry channel (Rec2A/stalk/1B domain interface).
The compound with the lowest average binding free energy (compound 398, -9.64 kcal/mol) is part of an 18-member low-scoring cluster (neighbour tree) of similar compounds in the north-west portion of the clustering map.

I manually re-docked compound 398 (-9.8 to -9.1 kcal/mol) in Chimera 1.16/Vina to check the binding modes; these are focused within and at the 5' entrance to the ssRNA channel.

One pose involves pi-anion stacking with Asp534, which has been implicated in the stabilisation of ssRNA states during RNA unwinding by the helicase and forms part of the conserved motif V (yellow), which is thought to be essential in allosteric communication between the ATP and RNA sites. Reference: "Role of ATP in the RNA Translocation Mechanism of SARS-CoV-2 NSP13 Helicase", The Journal of Physical Chemistry B (acs.org).

I've highlighted this interaction in an available cryoEM structure (pdb 7xcm).

A different compound (compound 62, -9.5 kcal/mol) shows binding at the ATP site and could be a starting point for specific ATP-competitors.

Based on the above I propose purchasing the Enamine building blocks required to make compound 398 and subsequent series, which I will find and upload here shortly.

TomkUCL commented 7 months ago

I have redocked the same Core1 library, but this time to the closed/engaged form of the protein as part of the replicase complex (pdb 7kro).
This is Geoff Wells's extended protein model used for MD simulations (i.e. not truncated at the C-terminus). Geoff has agreed to run simulations of a few hundred nanoseconds on the top-scoring molecules to prioritise them for purchasing.
Raw results and average binding affinities: Core1 PyRx 0.8 Vina virtual screen results.xlsx Core1 PyRx 0.8 Vina virtual screen results - average binding energies .xlsx
I have taken the average binding energy for each pose. I set the cut-off at -10 kcal/mol and then visually inspected the poses to validate them where green = valid pose, red = invalid pose (cis-amide geometry).
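The cis-amide check was done by eye, but it could in principle be automated by measuring the amide torsion in each docked pose. The sketch below computes a dihedral angle from four atom positions and flags values near 0° as cis. Which four atoms define the torsion (here: substituent on C, carbonyl C, amide N, substituent on N) and the 30° tolerance are assumptions, and the coordinates are idealised planar examples rather than real poses.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_len = math.sqrt(_dot(b2, b2))
    m1 = _cross(n1, tuple(c / b2_len for c in b2))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


def is_cis_amide(omega_deg, tolerance_deg=30.0):
    """Treat torsions within tolerance of 0 degrees as cis (assumed cut-off)."""
    return abs(omega_deg) < tolerance_deg


# Idealised planar geometries: substituents on the same side (cis)
# or on opposite sides (trans) of the C-N bond.
c_atom, n_atom = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
cis_omega = dihedral_deg((-0.5, 1.0, 0.0), c_atom, n_atom, (1.5, 1.0, 0.0))
trans_omega = dihedral_deg((-0.5, 1.0, 0.0), c_atom, n_atom, (1.5, -1.0, 0.0))
print(is_cis_amide(cis_omega), is_cis_amide(trans_omega))
```

Only the magnitude of the angle is used, so the sign convention of the dihedral does not affect the classification.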

The datawarrior file including structures, SMILES, binding scores and Enamine codes is found here; https://drive.google.com/file/d/1UBnHn44DYAkgp4WZUh9fuMk6rGDuZkFy/view?usp=drive_link

Here are some example binding poses: https://drive.google.com/file/d/1-_mpVF4wt9-u8MIU5IKqPXHM16-7d59Q/view?usp=drive_link

The compounds were screened for PAINS alerts using SwissADME; all were PAINS-free. http://www.swissadme.ch/index.php#

In summary, of the 463 Core1-based compounds virtually screened in the first run, the most promising compounds based on the above are these;

Geoff will run Amber MD on these in the next 1-2 weeks. @tmw20653 @toluene44 @mattodd keen to get your thoughts on these compounds for reagent purchase.

mattodd commented 6 months ago

TomkUCL commented 6 months ago

Here is my completed analysis of the Core1 Enamine carboxylic acids enumerated library, including the docking of compounds, plans for synthesis and starting materials needed:

Core 1 enumerated library virtual screen analysis slides

In summary, I have prioritised 15 compounds for reagent purchasing based on virtual screen results (PyRx/ Vina) against two extreme nsp13 conformers, 7kro and 7rdx.

These compounds have gone to Geoff for MD analysis. All thoughts are welcome.

toluene44 commented 6 months ago

Why wait for MD results—just make those 15!

Peter


TomkUCL commented 6 months ago

@toluene44 I will order the starting materials today. I will see where these compounds score in @kipUNC Glide screen and hopefully there will be some overlap.

Could you provide a brief description below of how you created your enumerated libraries for open science-ness purposes, please?

StructuralGenomicsConsortium / CNP4-Nsp13-C-terminus-B

PyRx 0.8 / AutoDock Vina Virtual Screen of de novo generated core compounds (non-N-oxides) #43