OpenSourceMalaria / Series4_PredictiveModel

Can we Predict Active Compounds in OSM Series 4?

New Series 4 candidates based on generative model - EOSI #34

Open miquelduranfrigola opened 3 years ago

miquelduranfrigola commented 3 years ago

Hello @mattodd @edwintse,

At @ersilia-os we have tried to generate new Series 4 candidates. In short, we provide two tables:

For a first assessment of the results, you can check this dynamic visualization of the selected 1k candidates. If a cluster is of particular interest, please refer to the full results to discover other similar molecules. You can also check a tree map of all molecules.

Our generative model approach is based on Reinvent 2.0. We have implemented several reinforcement-learning agents, aimed at optimizing activity and other desirable properties. This GitHub Repository contains more detailed information and source code.

This is the first time we have run a generative model, so please bear with us. We will be more than happy to optimize further runs based on your feedback.

Thanks! @GemmaTuron @miquelduranfrigola

GemmaTuron commented 2 years ago

No good - Friday 11th at 1, 3 or 4 UK? Otherwise I fear we may have a looming Doodle Poll 🤕

Let's go with Friday 11th at 13h UK time! What platform do you prefer?

GemmaTuron commented 2 years ago

Hi @mattodd !

Just confirming the meeting on Friday at 13h UK time?

mattodd commented 2 years ago

Yes @GemmaTuron thanks for the reminder, just sent invite, but please forward to others if you like - we can meet at https://ucl.zoom.us/j/4808072370 then. Talk soon!

GemmaTuron commented 1 year ago

Hi all,

Short update on next steps: after a few team discussions, these are the steps we'll take towards optimising the best compound (OSM-LO-72):

- We have contacted Dr Lehane and Prof. Kirk from the Australian National University, following their recently published work on a mutation in PfATP4 that confers resistance to cipargamin. PfATP4 is the suggested target of OSM Series 4, so we would like to know whether we are able to bypass this resistance. Dr Lehane has kindly offered to test the lead compound in sensitive vs resistant parasite strains.
- We will generate novel candidates with small changes on the right-hand side branch, trying to preserve binding to PfATP4 (again, thanks to the structure we generated with AlphaFold a while ago and the work from Qiu et al. 2022) while improving potency, solubility and, ideally, microsomal stability.

@holeung we will also look at the homology model you shared in #35. If you have any new results on this that you want to share, that would be fantastic!

Thanks everyone, we will post updates here as soon as we can.

GemmaTuron commented 1 year ago

And another short update as we start the work described above! In addition to those steps, Prof. Ben Corry and PhD student John Tanner (Australian National University), who performed the molecular docking in Qiu et al. 2022, have kindly offered to dock the OSM Series 4 compounds and the newly generated candidates to PfATP4, to see whether they might be sensitive to the G358S mutation as well. To this end, @edwintse, it would be really helpful if you have any information on the 3D conformation or protonation states of the series. We will share results as soon as we have them!

edwintse commented 1 year ago

@GemmaTuron sounds great! We do have a few crystal structure files for a handful of compounds. I'll need to find them and share with you. As for protonation states, I guess a predictive software like MarvinSketch would do, otherwise I'm not entirely sure.

jhjensen2 commented 1 year ago

@GemmaTuron if you want something a little more low-tech but high-throughput, try protonator

John-D-Tanner commented 1 year ago

Hi everyone! I have performed molecular docking of the OSM4 compounds to the ColabFold structure of PfATP4.

Using AutoDock Vina, we docked the OSM4 compounds with known experimental IC50s and the new candidates to both the wild-type and G358S isoforms of PfATP4. The search was constrained to the region surrounding G358. Unfortunately, we found no correlation between experimental IC50 and docking score. There are a number of reasons why this might be the case, which we will continue to look into, but for the time being please interpret the following results with caution. A number of OSM4 compounds were found to bind in proximity to the G358S locus (the box size is large enough to allow non-proximal binding). Of particular interest, OSM-LO-72, the new candidate with the lowest predicted IC50, bound in proximity to G358S, though no change in affinity was predicted upon mutation (again, this is very preliminary and I have little confidence in the affinity prediction).
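For readers who want to repeat this kind of check on their own data, here is a minimal sketch of a rank correlation between IC50 and docking score. It is not the analysis from the notebook: the choice of Spearman's rho, the function names and the numbers below are all assumptions made for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Made-up numbers for illustration only, not OSM data.
ic50_um = [0.05, 0.3, 1.2, 4.0, 10.0]           # whole-cell IC50 in uM
docking_kcal = [-8.1, -9.0, -7.5, -8.8, -7.9]   # docking score, lower is better

rho = spearman(ic50_um, docking_kcal)
print(f"Spearman rho = {rho:.2f}")
```

If docking tracked potency, potent compounds (low IC50) would also have the most negative scores, giving a rho close to +1. A value near zero corresponds to the "no correlation" situation described above.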

Moving forward from here, I will binarise the IC50 values and rethink the correlation analysis, as discussed with @GemmaTuron and @miquelduranfrigola. To this end, is there affinity data available for any of the Series 4 compounds, rather than whole-cell IC50s? I will also compare the protein interactions of the predicted poses with those of cipargamin, in whose pose we have more confidence.
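One way the binarised analysis could look is sketched below: compounds are labelled active or inactive at an IC50 cutoff, and the docking score is then judged by how well it ranks actives above inactives (ROC AUC). The 1 uM cutoff, the function names and the data are assumptions for illustration, not choices made in the actual analysis.

```python
def binarise(ic50_um, cutoff_um=1.0):
    """Label a compound active (True) when its IC50 is below the cutoff.

    The 1 uM default is an assumed cutoff, used here only for illustration.
    """
    return [value < cutoff_um for value in ic50_um]


def roc_auc(labels, scores):
    """Probability that a random active has a better score than a random inactive.

    Docking scores are treated as "lower is better"; ties count as half a win.
    """
    actives = [s for is_active, s in zip(labels, scores) if is_active]
    inactives = [s for is_active, s in zip(labels, scores) if not is_active]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for n in inactives:
            if a < n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Made-up numbers for illustration only, not OSM data.
ic50_um = [0.05, 0.3, 1.2, 4.0, 10.0]
docking_kcal = [-8.1, -9.0, -7.5, -8.8, -7.9]

labels = binarise(ic50_um)
auc = roc_auc(labels, docking_kcal)
print(f"actives: {sum(labels)}, ROC AUC = {auc:.2f}")
```

An AUC of 0.5 means the docking score separates actives from inactives no better than chance, so this measure is less sensitive to noise in the exact IC50 values than a direct correlation.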

For a more detailed description of the procedure and results, please see our GitHub repository, which contains the notebook and output files.

And thank you all for the opportunity to work on this project. I'm excited to see where this goes!

GemmaTuron commented 1 year ago

Thanks @John-D-Tanner !!

From the Ersilia side, we have been working on developing a refined generative tool combining different techniques. This is almost ready and we will apply it using the latest experimental datapoints available as starting points. The generated candidates will be filtered according to desired chemical and ADME properties as well as docking scores if possible.
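The filtering step described above could be sketched roughly as follows. This is not the Ersilia implementation: the `Candidate` fields, the threshold values and the candidate names are hypothetical. The docking score is optional, mirroring the "if possible" above, so a candidate without one is not rejected on that criterion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str                             # hypothetical identifier
    predicted_ic50_um: float              # predicted activity, uM
    logp: float                           # stand-in for a chemical property filter
    stability_prob: float                 # predicted probability of microsomal stability
    docking_score: Optional[float] = None # lower is better; None if not docked


def passes_filters(c, max_ic50_um=1.0, max_logp=5.0,
                   min_stability=0.5, max_docking=-7.0):
    """Apply each filter in turn; all thresholds are illustrative assumptions."""
    if c.predicted_ic50_um >= max_ic50_um:
        return False
    if c.logp > max_logp:
        return False
    if c.stability_prob < min_stability:
        return False
    # Only apply the docking filter when a score is available.
    if c.docking_score is not None and c.docking_score > max_docking:
        return False
    return True


# Made-up candidates for illustration only.
candidates = [
    Candidate("cand-A", 0.2, 3.1, 0.8, -8.5),
    Candidate("cand-B", 2.5, 3.0, 0.9, -9.0),  # fails on predicted activity
    Candidate("cand-C", 0.4, 3.5, 0.7),        # no docking score, still kept
    Candidate("cand-D", 0.3, 2.9, 0.9, -5.0),  # fails on docking score
    Candidate("cand-E", 0.1, 6.2, 0.9),        # fails on logP
]

selected = [c.name for c in candidates if passes_filters(c)]
print(selected)
```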

mattodd commented 1 year ago

Hi @John-D-Tanner thanks for this, and sorry for the delay in getting back to you. Too many GitHub alerts! Also pinging @edwintse

Interesting results, adding a little more to the mystery of how these compounds are acting. We don't have affinity data, no. To the best of our knowledge, nobody has ever made PfATP4, so it's hard to do these kinds of experiments.

Thanks for posting raw data, but I think your link above is broken. Do you have a fresh one?

I guess a key experiment would be to try OSM Series 4 compounds in the resistant cell line, right?

John-D-Tanner commented 1 year ago

My apologies, the repository was set to private but should now be public and the link should work

edwintse commented 1 year ago

The last compound from the most recent set (EGT 611-1) was tested for activity and came back as inactive. There were also 2 early compounds from Evariste (@abrennan5) that we never tested that were also included in this batch. Both are inactive as well. The positive control (369) is as expected.

Untitled Wiley-4 Chemdraw Feb 2023.zip

GemmaTuron commented 1 year ago

Hi @edwintse !

Thanks for the latest update, and sorry about the silence. We've been working in the background on ChemSampler, a generative package that is quick and easy to use (still under development, but the basic functionality is complete). I am using it to generate new candidates, and I have also trained new activity prediction models with the updated data (I am missing the three Evariste compounds from above, but will incorporate those today!). Once I have the final list of modified candidates, we can filter by activity, metabolic stability and, if we have more news on docking, by docking scores as well. I'll keep everyone updated. This is the open repo where I am working: https://github.com/ersilia-os/osm-series4-synthesis-round2 (I will also add documentation). Note that the predictive models are not yet available to everyone, but if you want me to run predictions, ping me here and I'll do so.

GemmaTuron commented 1 year ago

Hi @edwintse and @mattodd

We have done a first iteration based on the 4 compounds from the previous round with activities of < 1 uM. We have used the following constraints:

With these constraints, we end up with the following 19 molecules: sampled_ersilia_nov22_selection

What do you think of these molecules? You can find the full list of generated molecules with the associated predictions here, and the sampled 19 molecules and associated predictions here. We would like to hear back from you: should we test any of these molecules?

Thanks!

edwintse commented 1 year ago

We've shipped the following 5 compounds to Adele at ANU to have them tested in their PfATP4 resistant line. Results will be posted when received.

ANU resistance ANU resistance chemdraw.zip

GemmaTuron commented 1 year ago

Hi @mattodd @edwintse I wanted to share the new API to access models for online inference in case users are interested in trying them out! This is the one with models developed with OSM data: https://ersilia-app-t5zpw.ondigitalocean.app/?model_id=eos7yti

But we are also uploading other related models, such as the ones developed with data contributed by MMV https://ersilia-app-t5zpw.ondigitalocean.app/?model_id=eos4rta

We'll be announcing models through our social media links during this month of September. Let me know if this is useful or if you have any questions!

mattodd commented 8 months ago

Though this issue is getting rather long, I wanted to add the current set of compounds being evaluated in this collaboration between OSM and Ersilia. Using the latest version of the model, and the latest experimental data, we are experimentally pursuing the below structures. We're using a combination of CRO (Piramal) and in-house synthesis, and we should be done by early April, when we'll ship the compounds for eval. Very exciting!

Final Set with Annotations v3 March 14 2024

Final Set with Annotations v3 March 14 2024.zip

@qxsml @edwintse @GemmaTuron @miquelduranfrigola

mattodd commented 6 months ago

Cross-referencing to the results for the above structures, which are at https://github.com/OpenSourceMalaria/Series4/issues/79

GemmaTuron commented 5 months ago

Hi all,

As an update from the Piramal synthesis, we successfully obtained Targets 1, 5 and 9. For Targets 11 and 12 we faced many challenges and have stopped the synthesis attempts. We are attaching here all the routes Piramal tried in order to obtain these two targets:

piramal_t11_t12.pptx