OpenSourceMalaria / Series4_PredictiveModel

Can we Predict Active Compounds in OSM Series 4?

Details of the AI3SD Proposal #2

Open edwintse opened 5 years ago

edwintse commented 5 years ago

The below text was used in an application submitted by @mattodd in Feb 2019 to the AI3SD network for funding. The title was "Predicting the Activity of Drug Candidates when there is No Target". The aim was to use an open approach to provide a real-world example of how new methods in AI/machine learning can actually impact drug discovery, and to do this by tackling a common and difficult problem: predicting actives in a phenotypic drug discovery project.

The Problem

We aim to use diverse AI approaches to develop new ways to solve one of the biggest challenges in drug discovery: the prediction of activity of drug candidates in the absence of a biological target. We aim to do this using a public competition and open data.

In modern drug discovery it is frequently the case that optimisation of drug candidates is undertaken in the absence of a known biological target – so-called phenotypic drug discovery.[Nat. Rev. Drug Disc. 2017, 16, 531] In many therapeutic areas such an approach is seen as superior since it focuses efforts on those compounds known to be effective against whole cells or organisms. A common situation is that we have structure-activity relationship data (i.e. a collection of molecules and their associated biological activities) but we do not have information about the binding interactions of those molecules with a biological target. Yet we must be able to predict which molecules to make next, in order to allocate resources wisely. To date, the vast majority of such prediction has been based on the intuition of the medicinal chemists involved. This highly valuable resource has limitations of bias, or of imagination, or in some cases of resources: many small-scale drug discovery projects (particularly in academia, or in start-up companies) may have few people examining the data, meaning good hypotheses may be missed, or key insights overlooked. Manual organic synthesis of individual compounds designed in response to hypotheses – during the so-called Lead Optimisation phase – is among the most expensive areas of drug discovery. It is not unusual for the synthesis of one molecule to require two weeks of a postdoctoral researcher’s time, equating to ca. £2K per compound. If we are to identify the medicines society most needs, we must become significantly more efficient at the prediction of phenotypic potency.

Drug discovery is a complicated, multi-faceted process involving a range of expertise that varies according to the stage of a project. The design of a compound intended to achieve a biological end involves disciplines across organic chemistry, medicinal chemistry, pharmacokinetics, computational chemistry and, usually, biology of the relevant organism (be that a pathogen, or human biology). There is a requirement for strategic planning through project management and a delicate balance of resources vs. potential gain, i.e. when to “kill” a project based on perceived likely return on investment. All of these roles are required in the specific drug discovery project at the heart of this research proposal. Open Source Malaria (OSM) involves scientists at UCL and The University of Sydney, but also scientists from elsewhere in the world such as those from the 20 other institutions that contributed to OSM’s first research paper. The preliminary attempts at solving the present research problem (described below under preliminary data) have come from the US, UK and Australia, from both the public and private sectors and also from citizen scientists. This project involves people from the broadest range of professional backgrounds.

The project concerns predicting biological activity for Open Source Malaria Series 4, currently the most pressing research problem for OSM. Over 200 molecules are known in this series, with potencies against the malaria parasite ranging from inactive to sub-10 nM. Yet it is still the case that weeks of laboratory-based effort may be expended in making reasonable-looking molecules that are found to have zero potency. Series 4 is highly promising: several members have cured malaria in the mouse model of the disease. This is the closest an open source series has ever been to the clinical phase of investigation.

The biological target is thought to be the ion pump PfATP4, an essential part of the parasite’s machinery in maintaining ion balance when inside a red blood cell. Despite extensive effort, the structure of this large, complex membrane-bound protein remains unsolved. The target is implicated by genetic changes found in resistant mutants. Understanding PfATP4 is crucial because it is the supposed target of the newest antimalarial to reach Phase III clinical trials, KAE609 (Cipargamin), developed by Novartis. Mysteriously, PfATP4 is also inhibited by a bewildering array of unrelated chemotypes.[Int. J. Parasitol. 2015, 5, 149] It is unclear how this is possible.

OSM announced a competition in 2016, which has since been run and concluded. All available data were curated for the community and submissions of models, using any methodology, were encouraged from OSM’s contributor network. Six diverse, fully fledged entries were received, each accompanied by full details. These models were evaluated against a test dataset (the MMV Pathogen Box) that had not been disclosed. Evaluation of the entries by a scientific advisory panel led to the award of the prize to two equally well-performing models. These models are not yet highly predictive, despite the quality of the input data and the relevant expertise of the entrants. This proposal now aims to build on this significant preliminary work. All original submitters are willing to improve the models and wish to publish the work, providing an excellent community starting point for this proposal.
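The scoring procedure used by the advisory panel is not described here. As a minimal sketch only, assuming each entry reduces to an active/inactive call per held-out compound, two entries could be compared on precision, recall and the Matthews correlation coefficient as below. The function names and the toy labels are hypothetical placeholders and are not OSM or Pathogen Box data.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for two equal-length lists of booleans."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, tn, fp, fn


def score(y_true, y_pred):
    """Precision, recall and Matthews correlation coefficient (MCC)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "mcc": mcc}


# Made-up placeholder labels: True = active, False = inactive.
held_out_active = [True, False, False, True, False, False]
model_a_calls = [True, False, True, False, False, False]
model_b_calls = [True, True, False, True, False, False]

for name, calls in [("model A", model_a_calls), ("model B", model_b_calls)]:
    print(name, score(held_out_active, calls))
```

MCC is used in this sketch because actives are typically a minority in such datasets, so plain accuracy can look good for a model that simply predicts "inactive" everywhere.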

Our aim is to become more predictive of potency by building new models by whatever means possible. This research project will apply new AI methods, as part of an open competition, to generate a high quality predictive model for this important antimalarial series. There have been clear and exciting advances in recent years in applying computational approaches (e.g. matched pairs analysis) to the prediction of biological activity. However, the time is right for a broader exploration of approaches to compound prediction, using newer methods of machine learning that are being trialled by some of the leading companies in the field of AI who have joined this application. It is time that clear examples of the potential impact of machine learning methods in real drug discovery projects were available to the scientific community, so that we might better understand the impact of such new technologies and distinguish the current state of the art from hype. To achieve this, and to discover the best ideas from any quarter, we propose to mix the application of new methods from the private sector with a public competition to which anyone may submit solutions. What has been missing to date from the OSM team, and what is missing in the vast majority of drug discovery projects around the world, is AI. Essentially no drug discovery project other than those taking place in the largest pharma companies, or those taking place at new AI-centric companies, involves any significant element of AI at all. This is an astonishing situation, one that we hope to reverse through the outcomes of this public-facing project.


Project Aims

To develop a general AI-enabled approach to solving the prediction of biological activity in phenotypic drug discovery, through the use of a public competition. Objectives are as follows:


Project Method

mattodd commented 4 years ago

Added link to this Issue to the wiki, so can be closed when needed, but still useful for participants at the moment.