OpenSourceMalaria / Series4_PredictiveModel

Can we Predict Active Compounds in OSM Series 4?

COMPETITION ROUND 2: RESULTS! #18

edwintse opened this issue 4 years ago

edwintse commented 4 years ago

Round 2 (#1) is complete! Thank you again to all who participated.

We are pleased to announce that the winner of the competition is Giovanni Cincilla (@gcincilla; Molomics). A very close second place goes jointly to Willem van Hoorn (@wvanhoorn; Exscientia) and Ben Irwin/Mario Öeren/Tom Whitehead (@BenedictIrwin; Optibrium and Intellegens). We’re also adding Davy Guan (@IamDavyG; USyd) to the list of winners as the best non-company entrant.

Congratulations! We will be awarding prizes to the best model(s) from a company as well as one from the wider community. £100 each will go to Giovanni and Davy for the company and non-company winners. £50 will also go to Willem and Ben/Mario/Tom (combined) as the runners-up. The prizes will be presented at the upcoming meeting (see below).

Huge thanks go to the other entrants (@mmgalushka, @jonjoncardoso, @holeung, @spadavec, @luiraym, @sladem-tox) – though you didn’t win, you’re still involved – see below!


What just happened?

The results from Round 1 may be found here. The submitted models for Round 2 were all here. These were evaluated vs. the actual potencies observed for these compounds, which are shown here and here. The analysis of the entries may be found here. This analysis has been conducted by Ed Tse (USyd & OSM), Murray Robertson (Strathclyde), Robert Glen (Cambridge) and Mat Todd (UCL & OSM).

![Screen Shot 2019-11-04 at 10.05.32 am](https://user-images.githubusercontent.com/18062981/68113029-b12f9a00-feea-11e9-90b1-afa2738afc1f.png)

Analysis of the models was done as follows:

n.b. The initial analysis was done with a relatively tight classification, limiting the results to 2.5 uM. The follow-up analysis was relaxed by increasing this limit to 25 uM. Both can be found in the respective tabs at the bottom of the spreadsheet.

It’s quite interesting how similar some of the training structures were to the test structures. Murray has done an analysis of this here. This is to some extent expected, given that this is a lead optimisation project, but it is worth noting in passing.


What’s next?

We are now moving on to the next phase of this project, which will involve:

1) The prediction of new active compounds to be synthesised here in the lab, i.e. the Validation Phase. We will be reaching out to the winners to use their models to predict 2 new compounds each, giving us a total of 8 compounds to make (by yours truly), which will be tested for parasite killing before Christmas (fingers crossed). The idea is that these predictions should ideally be of molecules that are as structurally distinct from the known molecules as possible, to try to identify a highly potent new active series. More specifically, the ideal case would be for each of the 4 teams to predict one molecule that can be as similar as they like to a known compound, and one “moonshot” molecule that is as structurally distinct from the known compounds as possible while still being predicted to be active by the model.

2) Finishing off the paper describing this competition (here). All of the entrants from this round are asked to provide a brief summary of their methods (if possible) in a similar manner to what is currently in the paper for Round 1. Everyone please add your author details to the paper. This is a joint effort.

3) Running a one-day meeting on AI/ML in drug discovery, using this competition as the focus of the meeting. This will be in London in mid-January (e.g. week of Jan 20th), and details will come as soon as we’ve booked the space. Hopefully everyone can attend, and we have some money available to cover some travel expenses. More on this ASAP. If anyone would like to come but cannot make mid-January, please say.

So well done everyone, and we’re excited to synthesise the new predictions of novel actives!

jonjoncardoso commented 4 years ago

Congratulations to the winners! 🎉

This type of competition / online collaboration has been very fruitful and stimulating for me. I'm learning all the different ways one could model chemical data, and I'm looking forward to seeing the results from the Validation Phase!

There is a chance I might be in London at the end of January so I might get to meet some of you in person :)

BenedictIrwin commented 4 years ago

Thanks everyone, this looks great! The breakdown is interesting, as some compounds were easy for most models to predict while others gave more varied results. I'm glad you came up with a way to compare the classification and regression models.

I'm also glad we built the model with the 'small set' only for comparison. There is clearly a lot to be learned from the master chemical list when comparing the two models.

Is there a deadline/timescale for the generative predictions yet?

I know some methods cannot generate compounds however well they can predict them. I should be able to come up with something using machine learning methods rather than generating the compounds by hand.

Week of 20th Jan is looking free for the time being.

mmgalushka commented 4 years ago

Congratulations to the winners and everyone who took part in this competition! Very interesting results; not being a chemist, I would be interested to find out why some compounds were easier to predict than others.

It would be nice to meet all of you in London in mid-January, but at this point it is difficult for me to confirm whether I will be there.

sladem-tox commented 4 years ago

Thank you Edwin, a wonderful learning experience! I can see that taking a simple approach was a gamble that didn't pay off. Congratulations to the winners! Slade


holeung commented 4 years ago

Great fun! It's important that we also follow up with a discussion and analysis of why some strategies worked better than others, why some molecules were easier/harder than others to predict, retrospective thoughts, etc. This might work better as a separate paper, focused on computational and technical details. Please let me know what you think! For industry participants, will it be a problem discussing your strategies?

BenedictIrwin commented 4 years ago

I can see a more detailed paper being useful. There only seems to be a small box to describe the model, which makes sense for brevity. I have dropped some potential references in there to explain the method a bit more.

gcincilla commented 4 years ago

It seems we were able to build useful predictive models; that's great! Congratulations to everybody! I'll be happy to donate the £100 prize to an organization fighting malaria.

Regarding the next steps, I'll reply to the main points of this thread:

  1. With respect to generating new and promising molecules on the basis of Molomics' predictive model, I would like to propose (to all the OSM project participants) an interactive and collaborative design approach (through Molomics technology) in which anybody can participate from anywhere at any time. Briefly, participants can use a web-based application to quickly design molecules in silico on the basis of real-time predictions from our Plasmodium falciparum inhibition model as well as other predicted ADMET properties. Each participant will be able to see what every other member of the team is designing, gaining inspiration and new design ideas for the in silico optimization of the molecules. This would allow us to collaboratively explore and exploit the Series 4 chemical space and select the best molecules for synthesis together. It would be great if we could involve synthetic and medicinal chemists in this activity, but help from anybody, with their knowledge, skills and intuition, will be important. To get a first feel for this Molomics technology, you can have a look at, and join, the public case study we are currently running; with it you can even test some features of the technology: http://molomics.com/explore

  2. I’ll read and contribute to the paper soon. Did we establish deadlines in this respect?

  3. It will be a pleasure for me to participate in the one-day meeting on AI/ML in drug discovery, using this competition as the focus of the meeting. The week of 20th January should be fine for me.

  4. As proposed by @holeung, I also think that a separate paper on the modelling strategies and retrospective analysis would be more appropriate than the current one.

Looking forward to your feedback!

holeung commented 4 years ago

@edwintse, can you please describe the "normalised" values in the spreadsheet in more detail?

spadavec commented 4 years ago

Also, and this is my fault for not clarifying, it appears the wrong prediction was used for my model: there is a "class" prediction that should have been used in the evaluation, not the raw IC50 predictions. I'm also having some confusion over the scaling method used; I was under the impression that the goal was simply to differentiate active (<1 uM) from inactive (>1 uM), and that the range of outputs wouldn't be scaled/altered. If a simple 1 uM cutoff is used for active/inactive, then you can assign classes to the predictions made by each model and use the Matthews correlation coefficient (MCC; https://en.wikipedia.org/wiki/Matthews_correlation_coefficient) as a balanced measure of classification accuracy. When you do this, Giovanni still does best, but the numbers change a bit (a quick sketch of the calculation is included after the table):

| Model | MCC | correct_predictions | incorrect_predictions |
| --- | --- | --- | --- |
| NickGalushka_Auromind_Class | 0.298556 | 21 | 13 |
| WillemvanHoorn_Exscientia_Class | 0.344265 | 25 | 9 |
| JonCardosoSilva_KCL_Class | 0.024246 | 22 | 12 |
| BenIrwin_Optibrium_Class | -0.003949 | 16 | 18 |
| GiovanniCincilla_Molomics_Class | 0.581678 | 27 | 7 |
| VitoSpadavecchio_InterlinkedTX_Class | 0.488217 | 26 | 8 |
| HoLeungNg_KSU_Class | -0.104447 | 24 | 10 |
| DavyGuan_USYD_Class | 0.189122 | 24 | 10 |
| RaymondLui_USYD_Class | -0.277746 | 19 | 15 |
| SladeMatthews_USYD_Class | 0.000000 | 25 | 9 |

This was just my first pass at analyzing the data, so any clarification would be great!
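
For reference, here is a minimal sketch of the classification-then-MCC calculation described above, assuming a hypothetical CSV of predictions (the file and column names are illustrative only, not the actual submission format):

```python
# Assign active/inactive classes at a 1 uM cutoff and score each model with MCC.
# "round2_predictions.csv", "exp_ic50_uM" and the "*_pred_uM" columns are made-up names.
import pandas as pd
from sklearn.metrics import matthews_corrcoef

df = pd.read_csv("round2_predictions.csv")
y_true = (df["exp_ic50_uM"] < 1.0).astype(int)            # 1 = active, 0 = inactive
rows = []
for col in [c for c in df.columns if c.endswith("_pred_uM")]:
    y_pred = (df[col] < 1.0).astype(int)
    rows.append({
        "Model": col,
        "MCC": matthews_corrcoef(y_true, y_pred),
        "correct_predictions": int((y_true == y_pred).sum()),
        "incorrect_predictions": int((y_true != y_pred).sum()),
    })
print(pd.DataFrame(rows).sort_values("MCC", ascending=False))
```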

BenedictIrwin commented 4 years ago

Hi @spadavec did you use the "Master List" set for our MCC value in that analysis? The small set was only for a model trained on the series 4 subset of the data. I'm surprised it looks so different to Willem's model when they were so close before.

I was under the impression that it was a regression task to "predict" the activity, but I didn't read the extended discussion in much depth; I did see some content on classification in there.

Our model also has the added benefit of producing an error bar/confidence region, so some predictions will be more confident than others and we could focus on the best predictions. I believe this will be of great importance when generating new compounds, as we can choose the compounds most likely to succeed rather than purely those with the highest predicted activity. Since it is a deep neural network, it is easy to add a generative element on top for the next round. It was not clear to me at the onset of the project how the simpler models would generate any compounds; now I realise it is a process of generating compounds manually and predicting their properties using the model.

We also made predictions for all of the assays individually, which I believe would be useful in selecting for activity against, say, the drug resistant strain.

It will be hard to find any one metric which shows all the pros and cons of each model. It does appear that a classification metric scores the classification models highly.

One could plot the MCC for all models (the regression ones at least) as a function of the cutoff value. One could also use other metrics like Cohen's kappa?

Some regression metrics such as R^2 (coefficient of determination), r^2 (Pearson) and RMSE might also be interesting for further analysis of some of the models?
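
A rough sketch of the cutoff-sweep and extra-metrics ideas, assuming simple arrays of experimental and predicted IC50 values in uM (the numbers below are placeholders, not competition data):

```python
# Sweep the active/inactive cutoff, recomputing MCC and Cohen's kappa at each value,
# then report a few regression metrics on the raw (unscaled) predictions.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import matthews_corrcoef, cohen_kappa_score, r2_score, mean_squared_error

exp_uM = np.array([0.3, 1.2, 8.0, 0.05, 25.0, 2.0])    # placeholder experimental IC50s
pred_uM = np.array([0.5, 0.9, 12.0, 0.2, 18.0, 3.5])   # placeholder predicted IC50s

for cutoff in [0.3, 1.0, 2.5, 10.0]:
    y_true = exp_uM < cutoff                            # True = active at this cutoff
    y_pred = pred_uM < cutoff
    print(cutoff, matthews_corrcoef(y_true, y_pred), cohen_kappa_score(y_true, y_pred))

print("R2  :", r2_score(exp_uM, pred_uM))
print("r   :", pearsonr(exp_uM, pred_uM)[0])
print("RMSE:", mean_squared_error(exp_uM, pred_uM) ** 0.5)
```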

The proof of the pudding would be to use the model to generate a compound with suitable properties, I guess.

edwintse commented 4 years ago

@holeung The normalised values were calculated (normalised value = (X - min)/(max - min), where X is the predicted value, min is 0 uM and max is 25 uM or 2.5 uM) to make it easier and quicker to compare each predicted value against the experimental value. These normalised values are plotted for each compound in the separate tabs at the bottom of the spreadsheet file.
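
A tiny sketch of that normalisation with made-up values, just to illustrate the arithmetic:

```python
# normalised = (X - min) / (max - min), with min = 0 uM and max = 25 uM (relaxed) or 2.5 uM (tight)
def normalise(pred_uM, max_uM=25.0, min_uM=0.0):
    return (pred_uM - min_uM) / (max_uM - min_uM)

print(normalise(5.0))         # 0.2 on the relaxed (25 uM) scale
print(normalise(1.0, 2.5))    # 0.4 on the tight (2.5 uM) scale
```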

@spadavec Apologies for the confusion about the predictions from your model. Since we are more interested in the ability to accurately predict the potency values, your IC50 predictions were used for the analysis. The tally does improve when using your class predictions but I'm a bit confused about how these correlate to the IC50 predictions? Both class and value predictions were considered when doing the initial analysis (limiting things to <2.5 uM) but we considered the tolerance to be a bit tight so the final analysis was relaxed to <25 uM.

I hope that clarifies things a bit.

spadavec commented 4 years ago

@BenedictIrwin

> Hi @spadavec did you use the "Master List" set for our MCC value in that analysis? The small set was only for a model trained on the series 4 subset of the data. I'm surprised it looks so different to Willem's model when they were so close before.

I don't believe I did--it was a very quick and slapdash analysis, and I may have grabbed the wrong column for your entry (I remember there being 2).

> One could plot the MCC for all models (the regression ones at least) as a function of the cutoff value. One could also use other metrics like Cohen's kappa?

I can easily do this--I'll do a work up later today with some more careful analyses (using different cutoffs, metrics, etc).

@edwintse

> The normalised values were calculated (normalised value = (X - min)/(max - min), where X is the predicted value, min is 0 uM and max is 25 uM or 2.5 uM) to make it easier and quicker to compare each predicted value against the experimental value.

I'm not sure I see the value in this, given that the predictions were already directly comparable to the experimental values, especially since some submissions were classifiers and others were direct regressions of IC50. If we were comparing IC50 values directly, wouldn't we be more interested in a metric like MUE/MAE/RMSE on the unscaled predictions?

> Since we are more interested in the ability to accurately predict the potency values, your IC50 predictions were used for the analysis. The tally does improve when using your class predictions but I'm a bit confused about how these correlate to the IC50 predictions?

Basically, any ML model trained on a moderate amount of data to predict an IC50/EC50 value will only get to within about a 6x potency difference, averaged between actual and predicted (e.g. +/- 0.7 pIC50); even modern methods like FEP can only reach ~1 kcal/mol accuracy at best. Wet-lab measurements alone can swing by as much as 5x, so given the inherent variability in potencies, I used a cutoff (I believe 300 nM) for active, with everything else inactive. In retrospect, had I known there would be different cutoffs for activity, I would definitely have taken a different approach. I was primarily interested in being able to distinguish active from inactive, and didn't care about the MUE/MAE of my pIC50 differences (which I now know matters quite a bit), especially for compounds far from the cutoff.
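
For anyone less used to the log scale, the rough fold-change equivalence mentioned above works out like this (illustrative arithmetic only):

```python
# pIC50 is -log10 of the molar IC50, so an error of d pIC50 units is a 10**d fold difference.
import math
print(10 ** 0.7)        # ~5x potency difference for +/- 0.7 pIC50
print(math.log10(6))    # a ~6x difference corresponds to ~0.78 pIC50 units
```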

holeung commented 4 years ago

@spadavec: Yes, there should be multiple metrics of error. We can include this in the computational paper. Your analysis with different metrics will be very valuable.

mmgalushka commented 4 years ago

I can see some discussion about generating new compounds. I just want to make some comments on this topic, which someone may find useful. Since my knowledge of chemistry is quite limited, I apologize in advance if something I say does not make sense.

First, a little bit of history. A few years ago my company (Auromind) was involved in a project for predicting properties of chemical compounds from SMILES. We were training a DNN to predict LogD. The way we approached this task was to create a variational autoencoder that could reconstruct SMILES. Then we isolated the encoder and attached an MLP in place of the decoder to learn either a classification or a regression task; in our case it was to predict LogD (a regression task). We trained the autoencoder and then the regressor using 1.7M SMILES from ChEMBL. The results were quite good; we wrote a paper draft but have struggled to find the time to finish it.

An autoencoder is quite an interesting DNN model. It uses an information bottleneck to compress the original SMILES (in our case to a vector of 1024 real values) and then uses this vector to reconstruct the original SMILES. This 1024-dimensional vector is often called a latent vector. In some sense, it is a chemical compound fingerprint (similar to the fingerprints used for Tanimoto similarity).

The cool thing about the latent vector is that if we add a small random "epsilon" to the latent vector (for some target SMILES), in theory the autoencoder should generate compounds very similar to the original one (since similar SMILES lie close together in the latent space). We didn't try this functionality, but I have seen several publications describing it, so this is something I think would be interesting to explore.
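
A rough sketch of that perturbation idea; encode/decode here stand in for a trained SMILES autoencoder and are not armchem's actual API:

```python
# Perturb the latent vector of a seed SMILES with small random noise to propose similar compounds.
# encode/decode are placeholders for a trained autoencoder; eps controls how far we stray.
import numpy as np

def propose_similar(smiles, encode, decode, n=5, eps=0.05, seed=0):
    rng = np.random.default_rng(seed)
    z = encode(smiles)                                   # e.g. a 1024-dimensional latent vector
    return [decode(z + rng.normal(0.0, eps, size=z.shape)) for _ in range(n)]
```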

One more interesting thing we observed during our experiments: if we allow the encoder's weights to relax (i.e. keep training them) during model training, it will regroup the compounds (latent vectors) according to the target property, such as LogD. In my understanding, it tries to take into account not only the SMILES but also the target property when adjusting the latent space. In practice, this means that if we know compound X is "active", we can use a random "epsilon" to generate compounds Y1, Y2, Y3, ... which are also likely to be active. But this needs much more research to prove :)

We released this project on GitHub (https://github.com/mmgalushka/armchem) and also used the autoencoder to generate compound fingerprints in this competition.

gcincilla commented 4 years ago

As mentioned above, we launched an interactive and collaborative design approach (through Molomics technology) in which anybody can participate in the optimization of OSM Series 4 compounds from anywhere at any time.

Get involved!

Instructions are available in issue #24