Consensus in silico target prediction modeling for Series 3

spadavec commented 7 years ago

Overview

Although Series 4 is the current de facto series of interest, there seem to be some open questions regarding Series 3 (S3). For example, it is still unknown which target is hit by Series 3 compounds (previous attempts at finding MoAs were focused primarily on arylpyrrole compounds [1], which primarily eliminated PfATP4 as a target, along with a number of predicted targets [Q8I3U9, Q08210, Q8I553, Q8IHS2, P61075]). As such, I've followed my interest here to see if something could be borne out by an in silico approach to provide some leads for potential targets.

Given the number of synthesized S3 compounds, I wanted to develop a consensus model of likely targets for S3; the individual steps for this approach are detailed in this workbook, but for brevity's sake (yet somehow still very long-winded!), I will only give an overview here.

Collection of organism agnostic S3 targets

First, all the S3 compounds were collected and run through a pre-compiled ChEMBL naive Bayes target prediction model. Predictions were made for each compound at both the 1 and 10 microM levels, and targets with at least 80% confidence (of being a hit) were retained. All target hits for the 10 microM level can be found here, and the 1 microM level here. A sampling of the most represented targets at both levels can be seen below:

CHEMBL_ID	Organism	Target Name	Target Count	UNIPROT	Function	Redundant?
CHEMBL4462	Homo sapiens	NAD-dependent deacetylase sirtuin 2	23	Q8IXJ6	Deacetylase	No
CHEMBL5554	Homo sapiens	Phosphatidylinositol-4-phosphate 3-kinase C2 domain-containing beta polypeptide	21	O00750	Kinase	No
CHEMBL4226	Homo sapiens	Dual specificity protein kinase CLK3	19	P49761	Kinase	No
CHEMBL4461	Homo sapiens	NAD-dependent deacetylase sirtuin 3	19 (23)	Q9NTG7	Deacetylase	Yes
CHEMBL4506	Homo sapiens	NAD-dependent deacetylase sirtuin 1	19	Q96EB6	Deacetylase	No
CHEMBL4224	Homo sapiens	Dual specificity protein kinase CLK1	18 (24)	P49759	Kinase	Yes
CHEMBL4211	Helicobacter pylori	Carbonic anhydrase 1	17 (17)	O24855	Dehydratase	Yes
CHEMBL3886	Homo sapiens	Mixed lineage kinase 7	16	Q9NYL2	Kinase	No
CHEMBL4367	Homo sapiens	Tyrosine-protein kinase TXK	16	P42681	Kinase	No
CHEMBL5650	Homo sapiens	BR serine/threonine-protein kinase 1	16	Q8TDC3	Kinase	No
CHEMBL5260	Homo sapiens	Serine/threonine-protein kinase TNNI3K	15	Q59H18	Kinase	No
CHEMBL5973	Mus musculus	Carbonic anhydrase 15	15 (18)	Q99N23	Dehydratase	Yes
CHEMBL1944499	Astrosclera willeyana	Astrosclerin-3	15 (15)	A6YCJ1	Dehydratase	Yes
CHEMBL3009	Homo sapiens	Receptor protein-tyrosine kinase erbB-4	14	Q15303	Receptor Tyrosine Kinase	No
CHEMBL5819	Homo sapiens	Serine/threonine-protein kinase Nek4	13	P51957	Kinase	No

Table 1 -- Most represented targets for both the 1- and 10-microM levels of the naive Bayes target search. If a target was found at both the 1- and 10-microM levels, it is marked 'Yes' under the 'Redundant?' column, with the second count given in parentheses in the 'Target Count' column.
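The filter-and-count step behind Table 1 can be sketched roughly as follows. This is only an illustration: the row layout, the example probabilities, and the function name `count_confident_targets` are assumptions, since the actual model output format lives in the workbook and is not reproduced in this post.

```python
from collections import Counter

# Hypothetical per-compound predictions:
# (compound, target ChEMBL ID, predicted probability of being a hit).
# The probabilities are made up for illustration.
predictions = [
    ("OSM-S-106", "CHEMBL4462", 0.93),
    ("OSM-S-106", "CHEMBL4224", 0.85),
    ("OSM-S-127", "CHEMBL4462", 0.88),
    ("OSM-S-127", "CHEMBL4224", 0.42),
    ("OSM-S-137", "CHEMBL4462", 0.81),
]


def count_confident_targets(rows, threshold=0.80):
    """Count, per target, the compounds predicted as hits at or above the threshold."""
    # A set makes sure each (compound, target) pair is only counted once.
    hits = {(compound, target) for compound, target, prob in rows if prob >= threshold}
    return Counter(target for _, target in hits)


target_counts = count_confident_targets(predictions)
for target, n_compounds in target_counts.most_common():
    print(target, n_compounds)
```

Running the same count separately on the 1 and 10 microM predictions would give the two numbers reported per target in the table.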

Given the adenine-like thienopyrimidine core of S3, it's rather unsurprising to see the glut of kinases and NAD-dependent targets in this list (compiled list found here). By sheer volume, the most represented structures at both levels are three related sirtuin targets, which have been implicated in Plasmodium var gene silencing/regulation [2], potentially through hydrolysis of medium and long chain fatty acyl groups from lysine residues [3].

Search for Plasmodium falciparum 3D7 target analogs

Given that all of these targets were found in (largely) Homo sapiens, these results need to be translated to Plasmodium falciparum to be useful. Operating under the assumption that our Homo sapiens targets share some structural similarity to putative Pf targets, we can search for Pf targets that share high sequence identity and further validate those targets.

The top 7 most represented targets (CHEMBL-4224, -4461, -5973, -1944499, -4211, -4266, and -5554; admittedly a rather arbitrary cutoff) were then used as templates for Plasmodium 3D7 blastp searches. All of the results can be found here. A subset of sequences showing the highest similarity/identity (albeit still relatively modest) is shown below:

Query ChEMBL ID	Query Protein Name	3D7 Match Name	Score	E	Identity	Positives	PDB
ChEMBL4224	Dual specificity protein kinase CLK1	PF3D7_1445400 (CLK1)	230	2.00E-66	33%	53%	3LLT
CLK3 (PF3D7_1114700)	125	1.00E-30	27%	46%
GSK3 (PF3D7_0312400)	112	4.00E-27	27%	47%
PK7 (PF3D7_1337100)	105	2.00E-25	25%	45%
MAPK1 (PF3D7_1431500)	104	9.00E-24	27%	48%
SIRA (PF3D7_1328800)	95.5	2.00E-22	29%	48%	3U31, 3JWP
ChEMBL5554	PI43K C2 domain-containing beta polypeptide	PI3K (PF3D7_0515300)	131	2.00E-30	33%	50%	4UWL
PI4K (PF3D7_0509800)	88.6	2.00E-17	28%	51%	4WAG

Table 2 -- Subset of blastp search results for Plasmodium analogs of Homo sapiens targets. Identities and positive rates are for the whole-sequence alignment, and the PDB column is filled for results that have a corresponding PDB structure in Pf.
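For readers unfamiliar with the 'Identity' column, a minimal sketch of how percent identity is derived from a gapped pairwise alignment is given below. The two aligned fragments are made up and are not taken from the actual CLK1 alignment; the 'Positives' column additionally needs a substitution matrix and is not computed here.

```python
def percent_identity(aligned_query, aligned_subject):
    """Identical positions divided by alignment length (gap columns included), in percent."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"  # a gap never counts as a match
    )
    return 100.0 * matches / len(aligned_query)


# Toy aligned fragments ('-' marks a gap); purely illustrative.
query = "MKV-LGEGTFG"
subject = "MRVILGKG-FG"
identity = percent_identity(query, subject)
print(f"{identity:.1f}% identity")
```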

A number of predictions and alignments found for CLK3 were similar to the CLK1 results and were left out of Table 2. Overall, the most promising results are CLK1/CLK3 and PI3K/PI4K, with modest blastp scores and E-values, along with the existence of PDB structures for both.

CLK1 / CLK3

One of the best overall matches across all of the targets was CLK1 (PF3D7_1445400), a CMGC serine/threonine kinase. Previous work on CLK1 indicated that it is primarily localized in the Plasmodium nucleus and plays a crucial role in regulating mRNA splicing. It has been identified as a potential kinase target for the GSK set of compounds [4], and has been shown to be responsive to imidazopyridazine derivatives [5]. Overall, the whole-sequence identity between PfCLK1 and CLK1 is low (33%) and the positive match percentage is marginal (53%), but the two structures show general agreement (~0.9 Å RMSD) in tertiary structure (Figure 1).

clk1-all-align Figure 1 -- Alignment of PfCLK1 (PDB: 3LLT, Purple) and human CLK1 (PDB: 2VAG, Cyan).
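As a reminder of what the RMSD figure means, here is a minimal sketch of the calculation on already superposed, matched atoms. The coordinates are invented for illustration; the real value comes from a structural alignment tool that also performs the superposition, which this snippet does not.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized, already superposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up C-alpha coordinates (in angstroms) for three matched residues.
pf_clk1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
h_clk1 = [(0.0, 0.9, 0.0), (3.8, 0.0, 0.9), (7.6, -0.9, 0.0)]
value = rmsd(pf_clk1, h_clk1)
print(f"{value:.2f} A")
```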

Within the ChEMBL database, the compounds which have IC50 data against human CLK1 (with the S3 core as a substructure) show the well-known tricyclic kinase inhibitor motif with a thiophene ring (e.g. below).

matchers

Currently, the ChEMBL database for hCLK1 has ~286 compounds for which IC50 data is available (available here). Using these compounds, a linear regression on ECFP4 fingerprints with the relevant IC50 data was performed, testing for the ability to approximate IC50 for a handful of S3 compounds (OSM-S-106, OSM-S-132, OSM-S-136, and OSM-S-137). For the test compounds, the potencies are labeled as "IC50" against Pfal; these should likely be labeled as EC50, but will be referenced as IC50 predictions for consistency's sake.
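To make the idea of a fingerprint-based potency regression concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: the 8-bit vectors stand in for real ECFP4 fingerprints (which would come from a cheminformatics toolkit), the pIC50 values are invented, and fitting on pIC50 rather than raw IC50 is my choice here, not something stated in the workbook.

```python
def predict(weights, bias, bits):
    """Linear model: bias plus the sum of weights for the bits that are set."""
    return bias + sum(w * x for w, x in zip(weights, bits))


def mean_squared_error(weights, bias, X, y):
    return sum((predict(weights, bias, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)


def fit_linear(X, y, learning_rate=0.05, epochs=2000):
    """Least-squares fit by batch gradient descent (no regularisation)."""
    n, n_bits = len(X), len(X[0])
    weights, bias = [0.0] * n_bits, 0.0
    for _ in range(epochs):
        # Errors are computed once per epoch so all parameters update together.
        errors = [predict(weights, bias, xi) - yi for xi, yi in zip(X, y)]
        bias -= learning_rate * sum(errors) / n
        for j in range(n_bits):
            grad_j = sum(e * xi[j] for e, xi in zip(errors, X)) / n
            weights[j] -= learning_rate * grad_j
    return weights, bias


# Toy training set: made-up bit vectors and made-up pIC50 values.
X_train = [
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 1],
]
y_train = [7.1, 6.2, 5.4, 6.8, 4.9]

initial_mse = mean_squared_error([0.0] * 8, 0.0, X_train, y_train)
weights, bias = fit_linear(X_train, y_train)
final_mse = mean_squared_error(weights, bias, X_train, y_train)

# Predict a hypothetical new compound and convert pIC50 back to IC50 in microM.
new_bits = [1, 1, 0, 0, 0, 0, 0, 0]
predicted_ic50_um = 10 ** (6 - predict(weights, bias, new_bits))
print(f"training MSE {initial_mse:.2f} -> {final_mse:.2e}")
```

With only a handful of bits set per compound, many weights are barely constrained by the data, which is one way sparse fingerprints can produce overconfident potency predictions.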

Overall, the approximations seem to bear out the assumption that hCLK1 is receptive to S3 compounds, and that the Plasmodium equivalent could be a potential target, with one notable exception.

Compound	Actual IC50	Predicted IC50	Fold Difference
OSM-S-106	0.08	0.16	2
OSM-S-132	~40	0.2	200
OSM-S-136	15	2.8	5.3
OSM-S-137	1.7	7.2	4.2

Table 3 Potency prediction for S3 compounds using hCLK1 reference compounds
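The 'Fold Difference' column in Table 3 is simply the larger of the two potencies divided by the smaller. A short sketch using the table's own numbers (with "~40" taken as 40, and the function name being mine):

```python
def fold_difference(actual, predicted):
    """Symmetric fold difference: the larger value divided by the smaller one."""
    if actual <= 0 or predicted <= 0:
        raise ValueError("potencies must be positive")
    return max(actual, predicted) / min(actual, predicted)


# (actual, predicted) pairs copied from Table 3; "~40" is entered as 40.
table3 = {
    "OSM-S-106": (0.08, 0.16),
    "OSM-S-132": (40.0, 0.2),
    "OSM-S-136": (15.0, 2.8),
    "OSM-S-137": (1.7, 7.2),
}

folds = {name: fold_difference(a, p) for name, (a, p) in table3.items()}
for name, fold in folds.items():
    print(f"{name}: {fold:.1f}-fold")
```

Using a symmetric ratio means over- and under-predictions are treated the same, so OSM-S-137 (predicted weaker than measured) and OSM-S-136 (predicted stronger) land on the same scale.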

While a 2-5 fold difference in potency prediction is acceptable in most cases, a ~200 fold difference is not. Interestingly, the largest margin of error for prediction coincided with the more fragment-like compounds (e.g. OSM-S-129 to OSM-S-134) that have smaller R groups at one site:

Compound	Actual IC50	Predicted IC50	Fold Difference
OSM-S-129	45	0.3	150
OSM-S-132	45	0.1	450
OSM-S-136	45	0.3	150
OSM-S-137	45	0.33	135

Table 4 Potency prediction for fragment-like S3 compounds using hCLK1 reference compounds

Here, all of the actual IC50s were reported as ">40", but are entered as 45 for comparison. If the S3 potency data is in fact EC50 data, this large difference could be attributed to any number of mechanisms that prevent these fragments from reaching CLK1 and achieving the predicted potency. Alternatively, this could be an artifact of the approach itself, resulting from the low bit-density in the fingerprints that are being removed from regression due to a high drop-out rate (which I will be investigating next).

Edit: 6/3/2017

To further evaluate pCLK1, given the existence of a crystal structure, I looked at whether the docking of S3 compounds into PDB 3LLT could provide further insight (full procedure reported here).

The self-docking of ANP (an ATP analog) in the binding pocket produced decent results. The top predicted pose had the adenine head group rotated by ~180 degrees (see below), although the general shape conformed to what is seen in the xtal structure; the predicted score was -9.2, which is supposed to be a dG approximation if you squint really hard. In fact, none of the top-10 predicted poses had the adenine moiety in the xtal structure orientation.

clk1_anp_selfdock

Figure 2 Self-docking of ANP into pCLK1. Top predicted pose in cyan, and xtal pose in purple.
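For a feel of what a score of -9.2 would imply if it really were a binding free energy, the sketch below converts it to a dissociation constant via dG = RT ln(Kd). It assumes the score is in kcal/mol and a temperature of 298.15 K; neither is stated above, and docking scores are at best a rough proxy for dG.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal/(mol*K)


def score_to_kd_molar(score_kcal_per_mol, temperature_k=298.15):
    """Kd (in M, relative to a 1 M standard state) implied by dG = RT ln(Kd)."""
    return math.exp(score_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


kd = score_to_kd_molar(-9.2)
print(f"implied Kd: {kd * 1e6:.2f} uM")
```

On this reading, each ~1.4 kcal/mol change in score corresponds to roughly a tenfold change in implied affinity, which is useful context for the score ranges in the tables further down.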

However, the top pose did recreate the xtal structure h-bond to the backbone carbonyl of GLU631, and other sulfonyl interactions with more solvent-exposed residues.

With mild trepidation, I decided to move forward and dock all of the non-fragment-like S3 compounds to see if (A) the EC50 trends could also be replicated, (B) whether the S3 core was found to make similar interactions (or if there were any consistent binding modes), and (C) thus determine if this structure could be used for a larger HTS campaign or absolute free energy calculations (workbook).

The docking results reveal a number of binding modes that seem to be largely dependent on the presence of a substituent on the exo-amine group of the pyrimidine, and on whether the R-group of the thiophene is hydrophilic or not. When there is no amine R-group and the thiophene R-group is polar, the binding resembles that of ANP (e.g. see below, OSM-S-106).

osm-s-106

Figure 3 predicted binding pose of OSM-S-106 (cyan) in 3LLT; ANP xtal structure in purple. Polar contacts made by OSM-S-106 shown in yellow dashes.

In these cases, the primary interaction then seems to move to the backbone nitrogen of LEU633, rather than involving the external amine of the base, although it is found in a very similar position.

When a substituent is placed on the amine, and a polar thiophene group is present, the S3 core slips out of the pocket but still resembles the ANP binding mode, albeit translated (e.g. OSM-S-127).

osm-s-127

Figure 4 predicted binding pose of OSM-S-127 (grey) in 3LLT; ANP xtal structure in purple.

Finally, if the thiophene group becomes too bulky or more lipophilic, the entire binding pose can flip (see below).

osm-s-142

Figure 5 predicted binding pose of OSM-S-142 (blue) in 3LLT; ANP xtal structure in purple.

Edit: 6/4/2017

The final docking tests for CLK1 revolved around docking of MMV compounds with S3-like cores into pCLK1 to see if known inhibition trends could be replicated. All compounds from the Novartis and GSK datasets (taken from here and here) with the thieno[3,2-d]pyrimidine core were docked into the pCLK1 structure. With enrichment in mind, the top/bottom 5 docking scores/compounds are:

ChEMBL ID	Docking Score	Inhibition Note
CHEMBL586914	-11.3	97% inhibition @ 2uM
CHEMBL528892	-10.9	97% inhibition @ 2uM
CHEMBL585957	-10.6	~3uM (EC50)
CHEMBL548452	-10.6	91% inhibition @ 2uM
CHEMBL582655	-10.3	~0.75 uM (EC50)
CHEMBL547230	-7.8	94% inhibition @ 2uM
CHEMBL583150	-7.7	No Data
CHEMBL529653	-7.7	97% inhibition @ 2uM
CHEMBL530732	-7.7	84% inhibition @ 2uM
CHEMBL581740	-7.6	86% inhibition @ 2uM

Overall, there is not a great difference in performance between 'actives' (compounds with scores < -10) and 'moderate hits' (compounds with scores ~ -7). It should be noted that a lot of these compounds don't have the strict S3 core, but something approximating it. All of the scores can be seen here. While investigating this series in the MMV database, I noticed that there were a large number of S3-like cores that had the orientation of the sulfur inverted, using a thieno[2,3-d]pyrimidine (instead of the thieno[3,2-d]pyrimidine). To follow this thread, I also searched the MMV set for compounds with this core and docked them as well. The results (including the S3-core scores above) can be found here.
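The lack of separation can be checked quickly from the top/bottom-5 table itself. The sketch below uses only the rows that report % inhibition at 2 uM (the EC50-only and 'No Data' rows are left out), and the -10 cutoff and function name are just convenient choices for this comparison.

```python
from statistics import mean

# (ChEMBL ID, docking score, % inhibition at 2 uM), copied from the table above.
rows = [
    ("CHEMBL586914", -11.3, 97.0),
    ("CHEMBL528892", -10.9, 97.0),
    ("CHEMBL548452", -10.6, 91.0),
    ("CHEMBL547230", -7.8, 94.0),
    ("CHEMBL529653", -7.7, 97.0),
    ("CHEMBL530732", -7.7, 84.0),
    ("CHEMBL581740", -7.6, 86.0),
]


def mean_inhibition_by_score(rows, cutoff=-10.0):
    """Mean % inhibition for compounds scoring at/below the cutoff versus above it."""
    better = [inhibition for _, score, inhibition in rows if score <= cutoff]
    worse = [inhibition for _, score, inhibition in rows if score > cutoff]
    return mean(better), mean(worse)


mean_better, mean_worse = mean_inhibition_by_score(rows)
print(f"score <= -10: {mean_better:.2f}%   score > -10: {mean_worse:.2f}%")
```

A gap of about five percentage points across more than three score units, on seven compounds, is well within what noise alone could produce.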

Finally, using the MMP method previously discussed here, I generated ~30k new S3 compounds based off OSM-S-106, and retained those which have the S3 core (i.e. the core wasn't swapped out). Using the potency predictor mentioned earlier, a number of interesting compounds were predicted to have approximately double-digit nM potency:

enumerated

Conclusion for CLK1

Overall, there looks to be evidence in support of CLK1 being a potential target (or at the very least an off-target) for S3 compounds. Were a biochemical assay to be carried out to verify this, it would be most prudent to check the most potent compounds from the current S3 series, along with some of the most (predicted) potent MMV compounds.

Edit: 6/5/2017

Sir2a

The next interesting target in Plasmodium is Sir2a, a protein with modest similarity to human Sirtuin 3. Overall it shows 29% identity with Sirtuin 3, and a ~44% positive rate. Sirtuin is an NAD-dependent protein deacetylase, which has a major NAD binding region annotated at residues 145-165; when performing a blastp on this NAD binding region with pSir2a, the positive/identity rate increases slightly to ~56%. The overall overlap of the two proteins is moderate (~1.5 Å RMSD), with the highest degree of disorder being found near the C-terminal end (top of figure).

sir2a-overlap Figure 6 Overlap of hSir3 (orange, PDB:4bn4) and pSir2a (blue, PDB:3JWP).

Focusing in on the NAD binding pocket, the cofactors are found to bind in similar orientation and pose.

sir2a-pocket Figure 7 Overlap of hSir3 binding pocket (orange, PDB:4bn4) and pSir2a (blue, PDB:3JWP).

Structural similarities noted, the first step here is to see if a potency predictor can appropriately predict our S3 compounds using existing hSir3 data. A total of 225 compounds were used to compile the model, and a selection of S3 compounds were tested.

Compound	Actual IC50	Predicted IC50	Fold Difference
OSM-S-106	0.08	782	~10,000
OSM-S-132	~40	875	~20
OSM-S-136	15	26428	~1800
OSM-S-137	1.7	514	~300

Yikes. This is what we in the modeling world call 'not predictive', or not great, or put even another way, horrible. Comparing the similarity of OSM-S-106 to all the compounds in the hSir3 dataset, the highest (fingerprint) similarity is ~0.3 (not great).
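The nearest-neighbour similarity check mentioned here can be sketched as below. I'm assuming Tanimoto similarity on fingerprint on-bits, which is the usual choice but is not stated explicitly above; the bit indices are invented, and real ones would come from the ECFP4 fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def nearest_neighbour_similarity(query_bits, reference_set):
    """Highest similarity between the query and any compound in the training set."""
    return max(tanimoto(query_bits, ref) for ref in reference_set)


# Made-up on-bit indices for a query compound and a tiny reference set.
query = {1, 4, 9, 15, 22, 31}
reference = [
    {1, 4, 40, 41, 42, 43, 44},
    {9, 50, 51, 52},
    {60, 61, 62},
]
best = nearest_neighbour_similarity(query, reference)
print(f"nearest-neighbour Tanimoto: {best:.2f}")
```

When the best match is this low, the query sits outside the chemistry the model was trained on, so a poor potency prediction says more about the training set than about the target.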

For the sake of completeness, I checked whether a docking model could help in this respect at all. The self-docking looked to recapitulate many of the key interactions seen in the xtal structure, and had the same pose (take that, clk1!).

sir2a-selfdock Figure 8 Self-docking of AMP (pink) to pSir2a (blue, PDB:3JWP)

Fresh off the defeat of the potency prediction, we can try to dock some of our S3 compounds into this structure and see what is preferred.

The docking of osm-s-106 and osm-s-126 revealed a preference for an AMP-like pose; the addition of cyclic R-groups to the thiophene and adenine group pushes the 'core' away from the pocket, but the pose doesn't invert as seen in CLK1.

osm106-sir2a Figure 9 Docking of osm-s-106 (pink) to pSir2a (pdb 3JWP)

osm126 Figure 10 Docking of osm-s-126 (pink) to pSir2a (pdb 3JWP)

137 Figure 11 Docking of osm-s-137 (pink) to pSir2a (pdb 3JWP)

However, bulkier compounds like osm-s-137 seem to force the bulky amino-substituents into the adenine pocket, and the thiophene substituents 'outward'.

The final piece of investigating Sir2 is looking at the Novartis/GSK compounds being docked into the xtal structure to see if there are any preferred structural motifs, or any relationship between docking scores (in a very rough way) and EC50s. All of the docking scores can be seen here--but the top 5 and bottom 5 scores (of 213 total) are shown here:

ChEMBL ID	Docking Score	Potency Note
CHEMBL602016	-9.3	~1.5 uM (EC50)
CHEMBL578082	-9.2	~4 uM (EC50)
CHEMBL601797	-8.5	~1.1 uM (EC50)
CHEMBL535112	-8.4	94% inhibition @ 2 uM
CHEMBL579765	-8.4	~1.3 uM (EC50)
CHEMBL547230	-5.1	94% inhibition @ 2 uM
CHEMBL536884	-4.9	97% inhibition @ 2uM
CHEMBL581837	-4.0	99% inhibition @ 2uM
CHEMBL581740	-3.5	86% inhibition @ 2uM
CHEMBL533244	-1.9	97% inhibition @ 2uM

Overall, these docking scores are very supportive of the idea that Sir2a isn't the target for these compounds. The range in docking scores should be more than enough to discriminate compounds with ~equipotent EC50s, but in this case, it clearly cannot. Furthermore, docking scores of the worst compounds are very indicative of next-to-no binding, but are showing good potency in cellular assays--this raises the question of whether a metabolite(s) might be actually binding Sir2a instead, but I'll leave that up to someone more knowledgeable.

Conclusion for Sir2a

There seems to be a preponderance of evidence to suggest this is not the target for S3 compounds. If someone is willing and able to test this, it might be interesting, but were I a betting man, I wouldn't put my money on this proverbial horse.

PI3K

The next interesting (and relatively orthogonal) target is PI3K--which has been implicated in 3D7 amino acid efflux [6] and artemisinin resistance [7]. Currently, there are no crystal structures for PfPI3K, so I will rely on the human analog VPS34, which has a protein-wide 33% identity and 50% positive rate.

There isn't much assay information regarding VPS34 (i.e. class 3 PI3K), so the majority of the evaluation of PI3K will have to come from rigid docking experiments.

Using PDB: 4UWL as a template, the first exercise was to self-dock the existing pyrimidinone inhibitor (workflow available here).

pi3k-selfdock Figure 12 Self-Docking of PDB 4UWL (xtal in orange, docked pose in cyan).

The docked pose recreated the xtal structure very well, recapitulating the morpholine bond to the backbone nitrogen of ILE685, and the bond to the backbone nitrogen of ASP644.

The docked poses of OSM-S-106, -126 and -127 show very similar trends to what was seen in CLK1 (surprise, surprise). Namely, the compound will adopt an ATP-like pose, so long as the substituents off the external amine or thiophene aren't too bulky. Bulkier substitution patterns simply push the core out, allowing for (I'm guessing) primarily VdW interactions to dictate binding.

pi3k-osm106 Figure 13 Docking of OSM-S-106 to PDB 4UWL

125pi3k Figure 14 Docking of OSM-S-126 to PDB 4UWL

osm-137 Figure 15 Docking of OSM-S-137 to PDB 4UWL

Results of docking all the S3-like compounds can be found here; notably, the top 5/worst 5 are:

ChEMBL ID	Docking Score
CHEMBL578082	-10
CHEMBL582655	-9.6
CHEMBL582323	-9.5
CHEMBL547275	-9.4
CHEMBL584233	-9.4
CHEMBL536426	-5
CHEMBL579656	-4.9
CHEMBL581837	-1.8
CHEMBL581740	-1.8
CHEMBL533244	-1.7

Again, these worst-5 docking results are indicative of next-to-no binding, while the best scoring compounds have docking scores that approximate ATP binding scores for kinases. This is a consistent theme across many targets seen so far, suggesting that even among my S3-like compounds, there are multiple targets being hit that are likely not kinases, as the binding pocket is probably too small to accept these larger molecules.

Conclusion for PI3K

There is limited information going into this scenario, so only limited conclusions can be drawn. First, the binding modes exhibited for a subset of S3 compounds show the same trends (e.g. positioning of the S3 core with bulky/non-bulky substituents). Secondly, it is likely that PI3K is receptive to some of these S3-like compounds, but isn't being hit by a diverse set of compounds.

PI4K-3B

First, docking experiments were used to verify that this PDB can recreate the xtal structure pose.

pi4k-selfdock Figure 16 Self-docking of PDB: 4WAG

Looks good enough to me!

Docking of our subset of three S3 compounds shows a similar pattern as before, except for in one case!

osm106 Figure 17 Docking of OSM-S-106 to PDB: 4WAG

both Figure 18 Docking of OSM-S-126 (blue) and OSM-S-106 (pink) to PDB: 4WAG

osm-137 Figure 19 Docking of OSM-S-137 to PDB: 4WAG

What may be difficult to see here is that PI4K is more receptive to the bulky substituents at both positions seen in OSM-S-137, and places the thiophene near the same position as in OSM-S-106, albeit with the orientation of the sulfur switched.

Docking scores for all of the S3-like compounds can be found here; the best 5/worst 5 can be seen below:

ChEMBL ID	Docking Score
CHEMBL529476	-9.1
CHEMBL531065	-9.1
CHEMBL532284	-9
CHEMBL528892	-8.9
CHEMBL533502	-8.8
CHEMBL584250	-5.4
CHEMBL581740	-5.1
CHEMBL536884	-5
CHEMBL533244	-4.6
CHEMBL529653	-4.3
CHEMBL581837	-1.1

Even with the larger-than-normal binding pocket, we see some really poor binders near the bottom of the list. Perhaps more interesting is that the top binders could probably, based on docking score alone, be characterized as 'moderate binders'; most likely not the level one would expect from nM-level potent compounds. Of course it can be very dangerous to read into these docking scores too much and compare them to EC50 values, but the temptation is very strong!

References

[1] [2] [3] [4] [5] [6] [7]

MFernflower commented 7 years ago

I'm away from my Linux machine at the moment - would adding a methyl in between the two N atoms of the pyrimidine system help stabilize the binding? @spadavec Can you upload the PDB of an S3 hit bound to the kinase?

spadavec commented 7 years ago

@MFernflower stabilize in which way? Methylating at the 2 position looks to result in the same as extending the external amine--pushing it further from the binding pocket with essentially no change in binding affinity, unfortunately (methyl added in cyan, OSM-S-106 in blue).

methyl

Presuming that we (a) want to keep this core, and that (b) this is the putative binding mode/target, there isn't much space near the core with respect to adding new moieties

surface

Just from a casual look at the pose, it would seem that extending/modifying the sulfyl groups would benefit most, or consider branching off the sulfur directly to access that pocket (apologies for the poor picture, it's a bit difficult to find a good angle!).

I've attached the pCLK1.pdb and OSM-s-106 and OSM-S-106-methyl in .mol2 format here

pCLK1.zip

MFernflower commented 7 years ago

@spadavec I am on vacation so no access to my normal tools... can you dock this compound? Sorry to be a pain!

NC=1C2=C(N=CN1)C=C(S2)C=2C=C(C=CC2)S(=O)(=O)OC

@mattodd looks like tweaking the thiophenylpyrimidine section is a big no-no!

spadavec commented 7 years ago

@MFernflower No worries! This methoxy version looks to dock nearly exactly the same as OSM-S-106 (methoxy in grey, OSM-S-106 in blue)

methoxy

MFernflower commented 7 years ago

@spadavec I hate to bother but I had two more ideas and since swissdock is fighting me tooth and nail for some reason I humbly request you dock the following:

COS(=O)(=O)C1=CC(=CC=C1)C1=CC2=C(O1)C(N)=NC=N2 CC(C)OS(=O)(=O)C1=CC(=CC=C1)C1=CC2=C(S1)C(N)=NC=N2

spadavec commented 7 years ago

@MFernflower Docked the two above compounds. They both showed a flipped orientation relative to OSM-S-106 (which is in cyan). Apologies for it being a bit scatterbrained, but it should show the relative binding modes:

news

(dimethyl in grey, furan in purple). Interestingly enough, even if you throw the amine on the end (i.e. it is exactly the same as OSM-S-106, except for the S/O swap), it still will have the (relatively) inverted pose. My guess is that the increased aromaticity of the thiophene is pushing the adenine into the pocket, but I'm not entirely sure why...

even the pyrrolopyrimidine is flipped! These things can be quite fickle, eh?

MFernflower commented 7 years ago

@spadavec indeed very interesting! Looks like the methanesulfonic is the winner! Regarding your last comment about the S/N swap I must quote Jurassic Park: "I'm simply saying that life, uh... finds a way."

One last element exchange to try! S/Se

mattodd commented 7 years ago

Just a note @spadavec - this is very nice work. Will digest the details ASAP, and I encourage others to give their thoughts here, but are you recommending looking at an experimental assay vs CLK1?

I put in a proposal for Series 3 in MMV's call for proposals about MoA a couple of months back, and am waiting to hear (not sure if I shared that, sorry, it was fast). We might get some support for experimental validation, and this analysis could help inform that.

mbhebhe commented 7 years ago

Thanks @spadavec for all this insightful info. Like Mat said, we will digest the details ASAP.

MFernflower commented 7 years ago

https://cloud.githubusercontent.com/assets/3164942/26793367/be7dcb5c-49eb-11e7-8c19-7fef4be8f9c4.png @spadavec do you think it is worth for dr.todd and co to make the sulfonate since it docks so well?

MFernflower commented 7 years ago

@spadavec http://www.nature.com/nature/journal/vaop/ncurrent/full/nature22337.html Very interesting paper! Their kinase inhibitors look a bit like our S3 drugs!

spadavec commented 7 years ago

@MFernflower in my opinion, it's probably not worth it just yet to make the sulfonate. It's likely that S3 hits CLK1 to some degree, but whether that's the actual target is less certain; once we nail that down, doing more structure-based work would make more sense. Also, that PI4K paper looks interesting, and PI4K is one of the targets that my method dragged up and that I will be looking at after I complete my work on Sir2a. Finally, I will definitely try the S/Se swap when my docking experiments for Sir2a finish in a few hours.

@mattodd Regarding a CLK1 assay, I guess I'd say...perhaps? Overall, my intention was to put together 3-5 targets that might merit looking at more in depth. Given the number of kinase hits that came up, it might not be worth it to test any one specifically (although if we were, CLK1 would be my first bet) given that the S3 compounds are very likely to be rather promiscuous and hit a few kinases. I'm currently putting together a suite of Plasmodium kinase PDB targets that we can dock all of the S3 compounds against and see which targets are most receptive, which would hopefully give us a better idea of which kinase(s) to look at.

MFernflower commented 7 years ago

As always fantastic work vito!

will your screening suite include PI4K?

spadavec commented 7 years ago

@MFernflower Yes; it looks like I should be able to get both PI4K and PI3K; I should have a relatively complete list of kinases by tomorrow evening.

mattodd commented 7 years ago

Great work @spadavec. (For onlookers/OSM first-timers, one of the many very good things about what @spadavec is doing is to include justification for all conclusions by way of links to spreadsheets/lab books. No emails were created or harmed in the course of this work.)

A quick mention about the Series 3 background. As a group we decided to move onto Series 4 as a strategic move - S4 had more associated data and proven in vivo activity. By contrast S3 had proven challenging to functionalise without loss of potency. But S3 remains very interesting. The quality of the original hit, OSM-S-106, is striking. In response to MMV’s call for proposals around discovering mechanism of action I submitted S3 (file)

MoA Proposal OSM Series 3.xlsx

...since this required only a summary of key data about the relevant compound - my bad for not sharing that file before. I’m hoping MMV agree to look at the MoA for us because OSM-S-106 is a beauty. @mbhebhe has started a PhD recently and can look at S3 and pursue the most interesting lines of inquiry that we as a community determine. It’s always more productive to pursue synthetic targets with strong rationale, and @spadavec 's analysis provides that. We had been thinking about inverting the thieno ring, for example (so, S facing south instead of north) and this has popped up in the analysis. We had also assumed we needed to make the furan and pyrrole analogs of the thiophene, but the analysis cautions against that (with obvious caveats). We might target one of these to see if the prediction holds. With some more work we'd be in a good position to publish everything - not quite there yet since all potencies fell off the cliff - and pursue the series more generally.

I understand you’re still going on this @spadavec (particularly the pSir2a docking, and PI4K/PI3K are interesting targets) but I wanted to comment on formulating some specific to-do items now, given that @mbhebhe is well-placed to act, experimentally, on your predictions. We have plenty of 106 to ship to whomever. (@mbhebhe - if any of the below things are large and complex to deal with, you can start new Issues, referencing this one.) So (and anyone else feel free to chip in here, obv):

1) The CLK1 crystal structures in the PDB - who did these and could we work with them? @mbhebhe

2) Similar compounds in the GSK set (ref 4 above) - I’d thought we had extracted all the most relevant, similar compounds. Are there some more in there that we might consider as being S3? I guess we never documented this search properly on the wiki.

3) Table 3 is most interesting, and the outlier could indeed be there for reasons not associated with drug-protein interactions. My first thought here, too, is that we still have relatively few actives in S3, meaning 3 out of 4 is pretty good.

4) The docking studies obviously successfully predict the sensitivity of the potency of the S3 compounds to variation in substitution pattern of the primary amine and the nature of the Ar ring. That's most promising.

5) The structures proposed as possible synthesis targets: @mbhebhe is looking into these for feasibility/precedence. Some very nice suggestions there, both simple (open-chain functionalised thiophene) and challenging (requiring direct thiophene-heterocycle couplings). Let’s see which are the lowest-hanging fruit, unless there is a reason to prefer one over another?

6) @mbhebhe - who is running, or has recently run, a CLK1 assay (human or Pfal)? It might be possible to engage Novartis on the PI4K/PI3K possibilities if they remain interesting.

7) Should we bite the bullet and run a commercial kinase scan? Although I guess the kinase profiling in the 2017 paper you found was probably done by ManRos therapeutics.

MFernflower commented 7 years ago

If the kinase scan can do both human and malaria kinases and is a good price then I would recommend doing it. There are always other in-vitro assay companies if price is a problem!

Interestingly, the furan compound may be commercially available - http://www.achemblock.com/furo-2-3-d-pyrimidin-4-amine.html The site does look very unpolished so no clue if this is a legit company but still interesting nonetheless! The ring system does not look that hard to make however so it might be worth the extra man-hours?

@mattodd

mbhebhe commented 7 years ago

Once again, thanks @spadavec for doing all this work. You attached links to some google docs, could I please have permission to view them, thanks!

A while back, I came across a few patents that made molecules with the S3 core but different substituents/groups attached to the core. When I looked at them then, they were of no interest, but now they might be, as they look similar to @spadavec 's predicted molecules. I will compile a list and upload soon (bit busy in the lab at the moment). These similar molecules were useful as TIE-2 and/or VEGFR-2 inhibitors, for modulating the activity of lipid kinases including PI3K, and for treating disorders such as cancer mediated by lipid kinases. And some were EP2 receptor modulators.

The above molecule is possible to make. Will look into the others. Some looked challenging to synthesise, but will do more searching

People who can do PfCLK1 assays:

Laurent Meijer (meijer@manros-therapeutics.com), paper: Exploration of the imidazo[1,2-b]pyridazine scaffold as a protein kinase inhibitor (http://dx.doi.org/10.1016/j.ejmech.2016.09.064)
Maybe these guys, paper: Two Nucleus-Localized CDK-Like Kinases With Crucial Roles for Malaria Parasite Erythrocytic Replication Are Involved in Phosphorylation of Splicing Factor (DOI 10.1002/jcb.23034)

MFernflower commented 7 years ago

looks like our magic bullet is related to a commercial CLK blocker! https://www.caymanchem.com/product/10398

can we send some of this off for antimalarial potency testing?

Update: 6/12/2017 I wonder if making a molecule with features of both molecules would be advantageous?

drc007 commented 7 years ago

Is that the most stable tautomer? I would've expected temp

MFernflower commented 7 years ago

That is what a late night does to you! you are 100% right @drc007

mbhebhe commented 7 years ago

@MFernflower @drc007 The above molecule is possible to make. I did some searching and it can be made by either of these two methods: from http://onlinelibrary.wiley.com/store/10.1002/anie.201403082/asset/supinfo/anie_201403082_sm_miscellaneous_information.pdf?v=1&s=373324c86f801e598b6d9c18bc63627320ee5f4c

from http://www.journal.csj.jp/doi/pdf/10.1246/bcsj.59.3097

We have to use the amino-thienopyrimidine without the Br as the starting material

mattodd commented 7 years ago

Great contributions, everyone.

1) @MFernflower is that commercial compound real? i.e. can it be bought? can we get a quote? Is it available from anywhere else? It's an interesting catch, but significantly different from our S3 compounds, obviously, and would seem to lack the binding motifs that Vito predicted were necessary.

2) @mbhebhe - we'd need to be able to lithiate the full thienopyrimidine, but I guess you know we can do that with the current synthesis that involves a bromination. Do we have the epoxide?

3) There were some excellent Twitter inputs from Stuart Ralph who says i) There are Pfal sir2 knockouts available, and he or Chris Tonkin could test compounds, and ii) Christian Doerig and Gabriel Pradel have assays for Pfal CLKs - shall I reach out?

4) @spadavec your newer results suggest enough rationale for evaluation against PI3K and PI4K, do you think? I've not checked how we might do that yet.

MFernflower commented 7 years ago

@mattodd It's real!!!!!! http://www.sigmaaldrich.com/catalog/product/sigma/t5575?lang=en&region=US

mbhebhe commented 7 years ago

@mattodd sadly I have to lithiate again. We have the epoxide in the fridge somewhere and so do the stores.

MFernflower commented 7 years ago

@mbhebhe I'm a tiny bit lost - could you draw out the synthetic route to the compound Chris Swain and I suggested?

spadavec commented 7 years ago

@mattodd

"There were some excellent Twitter inputs from Stuart Ralph who says i) There are Pfal sir2 knockouts available, and he or Chris Tonkin could test compounds, and ii) Christian Doerig and Gabriel Pradel have assays for Pfal CLKs - shall I reach out?"

Based on the data that we do have, I would suggest that if we do go this route, tests against the Pfal CLKs (specifically CLK1 and CLK3) should be prioritized.

"your newer results suggest enough rationale for evaluation against PI3K and PI4K, do you think? I've not checked how we might do that yet."

I'm currently looking into relevant assays here for Pfal PI3/4K, but my guess is that the S3 compounds which hit CLK (or any other kinase) are going to show (at least) nominal activity against PI4K (and I'm pretty confident that a fair number will), and that the real confirmation will likely be in the form of a knockdown-like experiment.

mbhebhe commented 7 years ago

@MFernflower

MFernflower commented 7 years ago

Two questions about that pathway:

Doesn't the free amine need to be boc-ed? Why not go for PCC in CH3CN as the oxidant?

@mbhebhe

holeung commented 7 years ago

I just found this thread. Great work, a lot of nice ideas to bounce around. Our lab does quite a bit of computational and crystallographic work with human protein kinases, but I'm not familiar with the malarial systems. I suggest looking through the literature to see what Pf kinases are considered promising drug targets. Most kinase inhibitors are promiscuous, to a degree that ranges from small to shocking. The active S3 compounds may hit Pf CLKs, but that might not be related to their mechanisms of action. As @spadavec suggested, it would be useful to do computational screens comparing predicted SARs against CLKs vs other Pf kinases. Also, be careful when using chemical probes like TG003 unless their specificity is thoroughly documented. It's probably hitting all kinds of other kinases along with who-knows-what.

@spadavec: Can you please describe how your target prediction algorithm works?

spadavec commented 7 years ago

@holeung The actual ligand/target prediction framework comes from a set of pre-constructed naive Bayes classification models from ChEMBL (you can read about it here). Essentially, the models are based on all of the existing ligand/protein pairs in the ChEMBL database, with some caveats, and a given "test" ligand can be assigned a probability of being a member of a protein/ligand class (e.g. being a binder) given its fingerprint similarity. I did that for most of the S3 ligands, found the most represented targets, and translated those human hits to their Plasmodium equivalents.
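
For anyone who wants a feel for the idea, here is a minimal sketch of a Bernoulli naive Bayes classifier over fingerprint bits. It is not the ChEMBL implementation: the 6-bit fingerprints, the training labels and the function names (`train`, `prob_binder`) are all made up for illustration, and only the 80% retention cut-off comes from the overview at the top of this issue.

```python
import math

def train(fingerprints, is_binder, n_bits, alpha=1.0):
    """Fit a two-class Bernoulli naive Bayes model with Laplace smoothing."""
    model = {}
    for cls in (True, False):
        members = [fp for fp, y in zip(fingerprints, is_binder) if y == cls]
        log_prior = math.log(len(members) / len(fingerprints))
        # Smoothed P(bit is on | class) for every bit position.
        p_on = [
            (sum(bit in fp for fp in members) + alpha) / (len(members) + 2 * alpha)
            for bit in range(n_bits)
        ]
        model[cls] = (log_prior, p_on)
    return model

def prob_binder(model, fp, n_bits):
    """Posterior probability that a fingerprint belongs to the binder class."""
    log_post = {}
    for cls, (log_prior, p_on) in model.items():
        log_post[cls] = log_prior + sum(
            math.log(p_on[bit] if bit in fp else 1.0 - p_on[bit])
            for bit in range(n_bits)
        )
    # Normalise in log space to avoid underflow.
    top = max(log_post.values())
    weights = {cls: math.exp(v - top) for cls, v in log_post.items()}
    return weights[True] / sum(weights.values())

# Toy data (hypothetical): each fingerprint is the set of "on" bit indices.
N_BITS = 6
train_fps = [{0, 1, 2}, {0, 1, 3}, {0, 2}, {3, 4, 5}, {4, 5}, {2, 5}]
train_labels = [True, True, True, False, False, False]
model = train(train_fps, train_labels, N_BITS)

for name, fp in {"query_A": {0, 1}, "query_B": {4, 5}}.items():
    p = prob_binder(model, fp, N_BITS)
    print(name, round(p, 3), "retained" if p >= 0.8 else "dropped")
```

In the real workflow there is one such model per target, so each S3 compound gets a probability for every target and only those at or above the cut-off are counted.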

holeung commented 7 years ago

OSM-S-123 is OSM-S-106, the high activity molecule, but without the sulfonamide.

106and123

OSM-S-123 has been annotated as an inhibitor of human IKK beta (nuclear factor kappa B kinase beta subunit), EGFR (epidermal growth factor receptor), and VEGFR1 (vascular endothelial growth factor receptor 1) under ChEMBL entry 311108. Of these three kinases, IKK beta has the highest homology to a Pf kinase, CAMK/CDPK. These are heavily studied Plasmodium kinases, essential for its life cycle, and considered good drug targets. https://www.nature.com/articles/ncomms1558 http://www.future-science.com/doi/abs/10.4155/fmc.12.183

Blast_IKKB_HUMAN.pdf

Two high-homology Pf kinases, CDPK2 and CDPK4, have crystal structures available. http://www.rcsb.org/pdb/explore.do?structureId=4mvf http://www.rcsb.org/pdb/explore.do?structureId=4qox

I docked OSM-S-106 into both 4mvf and 4qox. This was done with smina with the Vinardo scoring function and flexible side chains. Only 4qox gave a reasonable score, -10.3. I would have expected a better score given the potency of OSM-S-106. I haven't had the chance to look closely at the SAR of Series 3 with these docked structures. The binding pockets are small and probably can't accommodate some of the larger Series 3 ligands that were found to be active.
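
For reproducibility, a run of this kind can be sketched as below. The exact settings used are not given in this thread, so everything specific here is an assumption: the file names are hypothetical, the search box is assumed to be defined around the co-crystallised ligand, and the 3.5 Å flexible-residue cut-off is just a placeholder. The helper only builds the smina command line; it does not run it.

```python
import shlex

def smina_command(receptor, ligand, ref_ligand, out, flex_dist=3.5):
    """Build (but do not run) a smina command using Vinardo and flexible side chains."""
    return [
        "smina",
        "-r", receptor,
        "-l", ligand,
        "--autobox_ligand", ref_ligand,   # assumed: box placed around the crystal ligand
        "--scoring", "vinardo",           # Vinardo scoring function
        "--flexdist_ligand", ref_ligand,  # side chains near this ligand are made flexible
        "--flexdist", str(flex_dist),     # assumed cut-off in angstroms
        "-o", out,
    ]

# Hypothetical file names for the 4qox run.
cmd = smina_command(
    "4qox_receptor.pdb",
    "OSM-S-106.sdf",
    "4qox_xtal_ligand.sdf",
    "OSM-S-106_4qox_docked.sdf",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list passed to `subprocess.run(cmd, check=True)` on a machine with smina installed.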

smina_dock1_4qox.zip

Docked pose: crystal structure ligand in blue, OSM-S-106 in pink.

In conclusion, I am not convinced that the CDPKs are the true targets either. At this point, it is likely that OSM-S-106 acts as an ATP-mimic, perhaps at a kinase. The Pf kinases are promising targets. OSM may also want to consider a target-oriented strategy in the future. The CDPK crystal structures came from the Structural Genomics Consortium in Toronto/Oxford. I imagine they would collaborate with OSM.

madgpap commented 7 years ago

@spadavec excellent work; I'm glad my models are put to good use. FYI, we followed a very similar approach for target deconvolution of mTB phenotypic hits - see here: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003253 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0121492

spadavec commented 7 years ago

Thank you so much for the ChEMBL models, @madgpap! I have some (hopefully) interesting ideas I want to share with you at some point regarding these models :)

It seems that @holeung beat me to the punch, but I'll throw my hat in the ring as well. I looked through all of the data I could find, and found 14 Pf PDBs that (a) were wildtype, (b) had an ATP, ADP, ANP, or ATP-like cocrystal, and (c) if not from Pfal, had at least 90% identity with the target structure. This resulted in 13 total kinases and 1 bromodomain (all of which have been implicated in Pfal replication):

Target Name	Target PDB	Notes
Bromodomain (Gcn5)	5TPX
NMNAT1	5LLT
Nicotinic acid mononucleotide adenylyltransferase	5LM3
PfPKA-R	5KBF
Phosphocholine cytidylyltransferase	4ZCP
CDPK4	4QOX
Thymidylate kinase	2YOG
Phosphoethanolamine Methyltransferase	4FGZ
Purine Nucleoside Phosphorylase	2BSX
PfPK5	1V0P
PfCLK3	3LLT
PfCDPK1	3Q5I	Plasmodium berghei, 91% identity
PfCDPK2	4MVF	Plasmodium falciparum K1, 99% identity
Spermidine Synthase	3RIE
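
The three selection criteria can be written down as a small filter. This is only a sketch: the records and their IDs are invented (they are not real PDB entries), "ATP-like" is simplified to a fixed set of ligand codes, and the field names are hypothetical.

```python
from dataclasses import dataclass

# Simplification: treat only these ligand codes as "ATP-like".
ATP_LIKE = {"ATP", "ADP", "ANP"}

@dataclass
class Structure:
    pdb_id: str
    organism: str
    wildtype: bool
    ligand: str
    identity_to_pfal: float  # percent sequence identity to the Pfal target

def passes(s, min_identity=90.0):
    """Apply criteria (a) wildtype, (b) ATP-like cocrystal, (c) >= 90% identity if not Pfal."""
    if not s.wildtype:
        return False
    if s.ligand not in ATP_LIKE:
        return False
    if s.organism != "Plasmodium falciparum" and s.identity_to_pfal < min_identity:
        return False
    return True

# Invented example records, one per outcome.
candidates = [
    Structure("HYP1", "Plasmodium falciparum", True, "ATP", 100.0),  # kept
    Structure("HYP2", "Plasmodium berghei", True, "ANP", 91.0),      # kept, >= 90% identity
    Structure("HYP3", "Plasmodium vivax", True, "ADP", 72.0),        # rejected by (c)
    Structure("HYP4", "Plasmodium falciparum", False, "ATP", 100.0), # rejected by (a)
    Structure("HYP5", "Plasmodium falciparum", True, "GOL", 100.0),  # rejected by (b)
]

selected = [s.pdb_id for s in candidates if passes(s)]
print(selected)
```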

After docking all of the S3 compounds which have 'IC50' data into these structures, the results are as follows:

PDB	Average Docking Score
3LLT	-8.40
2YOG	-8.27
5KBF	-8.20
4QOX	-7.98
1V0P	-7.98
2PMN	-7.84
4MVF	-7.79
2BSX	-7.68
5LM3	-7.68
3Q5I	-7.39
5TPX	-7.18
4ZCP	-7.12
4FGZ	-7.02
3RIE	-3.00

To me, these results say a whole lot of nothing, other than that 3RIE is very much not the structure that S3 is hitting.

chart 1

There seems to be a slight preference for 3LLT (CLK), 2YOG (Thymidylate kinase), and 5KBF (PfPKA), but only by a very slight margin. Across the board, the most potent compound (OSM-S-106) shows essentially no discrimination across the kinases:

Target PDB	Docking Score
2YOG	-8.8
2PMN	-7.7
2BSX	-8.2
5KBF	-8.3
4FGZ	-8.3
4QOX	-8.3
3LLT	-7.9
4MVF	-7.5
3RIE	-7.4
5LM3	-8.1
5TPX	-7.5
4ZCP	-7.0
1V0P	-8.6
3Q5I	-7.1

One might be able to make an argument for 2YOG, but only un-enthusiastically, and with such moderate docking scores, one wouldn't expect the sub-micromolar EC50s that we see from OSM-S-106.
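
To put a number on "essentially no discrimination", the OSM-S-106 scores from the table above can be summarised in a few lines. This just recomputes the best, worst, spread and mean from the values already listed; no new data are introduced.

```python
from statistics import mean

# OSM-S-106 docking scores, copied from the table above.
osm_s_106 = {
    "2YOG": -8.8, "2PMN": -7.7, "2BSX": -8.2, "5KBF": -8.3, "4FGZ": -8.3,
    "4QOX": -8.3, "3LLT": -7.9, "4MVF": -7.5, "3RIE": -7.4, "5LM3": -8.1,
    "5TPX": -7.5, "4ZCP": -7.0, "1V0P": -8.6, "3Q5I": -7.1,
}

best = min(osm_s_106, key=osm_s_106.get)    # most negative score
worst = max(osm_s_106, key=osm_s_106.get)   # least negative score
spread = osm_s_106[worst] - osm_s_106[best]
average = mean(osm_s_106.values())

print(f"best:   {best} ({osm_s_106[best]})")
print(f"worst:  {worst} ({osm_s_106[worst]})")
print(f"spread: {spread:.1f}")
print(f"mean:   {average:.2f}")
```

The whole set of 14 structures sits within a window of 1.8 score units, which is why it is hard to argue for any one of them from this compound alone.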

I'm going to try and expand this docking battery to include PI4K and a few others based off a weak homology, but I imagine this will be the core of the suite for now. Overall, I would be encouraged by the relative performance by 3LLT (CLK), further supporting the idea that this kinase is either a direct target, or a very close off-target.

MFernflower commented 7 years ago

@spadavec @mattodd Here are some modifications of the lead compound that I thought up. Some may have already been shown, while some are totally new: kinasemods

mbhebhe commented 7 years ago

WOW! Thanks for all the great work @spadavec, and welcome @holeung. So am I safe to say CLK1 and CLK3 are promising, Sir2a is a no go, Plk4 is a maybe, and CDPKs are a maybe no?

@MFernflower yeah, I would have to protect the amine, thanks. PCC is another oxidant we could use; I will see what we have available.

holeung commented 7 years ago

@MFernflower: If the target is intracellular, like most kinases, you want your compound to be charge-neutral at physiological pH. Otherwise, it will be less membrane permeable.

MFernflower commented 7 years ago

That is true, but my reasoning behind some of them was to mimic ATP as closely as I can.

mattodd commented 7 years ago

Excellent new inputs, guys. A quick note: I'm going to loop in the SGC guys (Ray Hui, specifically) to scan the targets mentioned here and see if any could be looked at experimentally.

holeung commented 7 years ago

If we can experimentally confirm that the MOA targets are protein kinases, I can co-write a grant application on Series 3.

holeung commented 7 years ago

I redid the docking of OSM-S-106 vs all Pf kinase crystal structures, using smina, Vinardo score, and flexible side chains.

PDB	Name	Score
4qox	CDPK4	-10.3
4mvf	CDPK2	-8.7
3ltt	CLK1	-11.8
5kbf	PKA-R	-13.0
2yog	thymidylate kinase	-11.4
2bsx	purine nucleoside phosphorylase	-9.4
1v0p	PfPK5	-11.4
5lm3	nicotinic acid mononucleotide adenylyltransferase	cannot dock
5tpx	Bromodomain Gcn5	-9.7
5llt	NMNAT1	cannot dock
3q5i	CDPK1	-10.0

Top scores go to 5kbf (PKA-R), 3ltt (CLK1), 2yog (thymidylate kinase), 1v0p (PfPK5). I will upload models corresponding to these docked complexes shortly.
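
The ranking can be reproduced directly from the scores listed in this comment. A small sketch, with the two "cannot dock" entries stored as `None` and left out of the ranking (that handling is a choice made for this example):

```python
# Scores as reported in this comment; None marks "cannot dock".
scores = {
    "4qox": ("CDPK4", -10.3),
    "4mvf": ("CDPK2", -8.7),
    "3ltt": ("CLK1", -11.8),
    "5kbf": ("PKA-R", -13.0),
    "2yog": ("thymidylate kinase", -11.4),
    "2bsx": ("purine nucleoside phosphorylase", -9.4),
    "1v0p": ("PfPK5", -11.4),
    "5lm3": ("nicotinic acid mononucleotide adenylyltransferase", None),
    "5tpx": ("Bromodomain Gcn5", -9.7),
    "5llt": ("NMNAT1", None),
    "3q5i": ("CDPK1", -10.0),
}

# Keep only structures that produced a pose, then sort most negative first.
docked = {pdb: v for pdb, v in scores.items() if v[1] is not None}
ranked = sorted(docked, key=lambda pdb: docked[pdb][1])
top4 = ranked[:4]

for pdb in ranked:
    name, score = docked[pdb]
    print(f"{pdb}\t{name}\t{score}")
```

Note that 2yog and 1v0p tie at -11.4; Python's sort is stable, so they keep the order in which they were listed.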

Our university has a small pilot grant opportunity for infectious disease research, with letter of intent due Jan. If we can narrow down our hypotheses, and come up with simple, testable experiments, I'll write the application. Collaborators welcome!

ELN

MFernflower commented 7 years ago

thymidylate kinase is a top target eh? I wonder if acyclovir kills plasmodium???? @mattodd @holeung

holeung commented 7 years ago

Docked complexes attached as zipped pdb files.

5kbf_docked.zip 3ltt_docked.zip 2yog_docked.zip 1v0p_docked.zip

anya-chen commented 6 years ago

@mattodd @holeung We have used our alignment-based target prediction model to identify the potential targets of OSM-S-106. We identified several kinases (PI3-kinase, CDK2 and CDK5) and carbonic anhydrase as potential targets. A further interesting candidate target is the M18 aspartyl aminopeptidase.

In the table below, for each individual target (column "target__name"), the ChEMBL IDs denote the most similar compound in the database. ROCS_TanimotoCombo is a similarity metric taking into account the molecular shape and chemical features similarity. The maximum value is 2.0.
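
For readers unfamiliar with the metric: TanimotoCombo is the sum of a shape Tanimoto and a colour (chemical feature) Tanimoto, each of which runs from 0 to 1, which is where the maximum of 2.0 comes from. A minimal sketch of the arithmetic, using made-up overlap values rather than anything computed by ROCS:

```python
def tanimoto(overlap_ab, self_a, self_b):
    """Tanimoto coefficient from a pairwise overlap and the two self-overlaps."""
    return overlap_ab / (self_a + self_b - overlap_ab)

def tanimoto_combo(shape_tanimoto, color_tanimoto):
    """TanimotoCombo = shape Tanimoto + colour Tanimoto (range 0 to 2)."""
    return shape_tanimoto + color_tanimoto

# Hypothetical overlap values for a query/database pair.
shape = tanimoto(overlap_ab=600.0, self_a=800.0, self_b=700.0)
color = tanimoto(overlap_ab=9.0, self_a=15.0, self_b=12.0)
print(round(tanimoto_combo(shape, color), 3))

# A molecule compared with itself gives the maximum value.
identical = tanimoto_combo(tanimoto(800.0, 800.0, 800.0), tanimoto(15.0, 15.0, 15.0))
print(identical)
```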

drc007 commented 6 years ago

@anyachan This looks interesting, have you published details of the method?

anya-chen commented 6 years ago

@drc007 No, we have not published the method. We are planning to do this some time later.

drc007 commented 6 years ago

@anyachan Hi, Would it be worth considering overlay of maximum common substructure as the first step rather than centre of mass?

anya-chen commented 6 years ago

@drc007 Hi, yes, it would be definitely worthwhile looking into the possibility of adding a MCS search as an initial step to the algorithm.

mbhebhe commented 6 years ago

Hi @anyachan thanks for this. Sorry about the very delayed response, I was on holiday and now I'm back. Same as @drc007's response, I was wondering if there was raw data relating to this, published somewhere. And how can we access it? ELN?

anya-chen commented 6 years ago

Hi @mbhebhe this is a method that we have been using internally in several projects already, and we have had some nice successes with it. We haven't published the method yet but plan to do so within this year. Basically, what we do is screen a curated subset of ChEMBL with ROCS. All ligands included in the ChEMBL subset are represented by conformational ensembles computed with OMEGA, whereas the compound of interest (query molecule) is represented by a single low-energy conformation also computed with OMEGA.
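
Since the method is unpublished, the following is only a sketch of how such a screen might be post-processed, not a description of the actual pipeline. It assumes that each database ligand is scored by its best-matching conformer and that each target is then represented by its most similar ligand; the ChEMBL IDs and scores are placeholders.

```python
# (database ligand, annotated target, TanimotoCombo of the query against each conformer)
# All IDs and scores are placeholders for illustration.
hits = [
    ("CHEMBL_A", "PI3-kinase", [1.12, 1.31, 0.98]),
    ("CHEMBL_B", "PI3-kinase", [1.05, 1.20]),
    ("CHEMBL_C", "CDK2", [0.91, 1.27, 1.10]),
    ("CHEMBL_D", "CDK2", [1.35]),
]

best_per_target = {}
for ligand, target, conformer_scores in hits:
    # Assumed: a ligand's score is that of its best-matching conformer.
    ligand_score = max(conformer_scores)
    current = best_per_target.get(target)
    if current is None or ligand_score > current[1]:
        best_per_target[target] = (ligand, ligand_score)

# Report targets ordered by the similarity of their closest ligand.
for target, (ligand, score) in sorted(
    best_per_target.items(), key=lambda item: item[1][1], reverse=True
):
    print(f"{target}: {ligand} (TanimotoCombo {score:.2f})")
```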

Now I put the raw data here.

Does this answer your questions?

mattodd commented 5 years ago

Quick housekeeping question @spadavec. I was just looking at all the above (because we're gathering momentum on writing up Series 3 MoA) and there's a link to a workbook up top, but the workbook appears to be missing. Can you let me know if it's hidden, or removed? Trying to capture all the working/methods/inputs.
