gitter-lab / pharmaco-image

Merging Bioactivity Predictions from Cell Morphology and Chemical Fingerprint Models by Leveraging Similarity to Training Data #17

Open agitter opened 2 years ago

agitter commented 2 years ago

https://doi.org/10.1101/2022.08.11.503624

The applicability domain of machine learning models trained on structural fingerprints for the prediction of biological endpoints is often limited by the diversity of chemical space of the training data. In this work, we developed “similarity-based merger models” which combined the output of individual models trained on cell morphology (based on Cell Painting) and chemical structure (based on chemical fingerprints). Using a combination of a decision tree and logistic regression models on the structural versus morphological feature space of the training data, which leveraged the similarity of test compounds to training compounds, the similarity-based merger models used logistic equations to weigh individual model outputs. We applied these models to predict assay hit calls of 92 assays from ChEMBL and PubChem and 89 anonymised assays released by the Broad Institute, where the required Cell Painting annotations were available. We found that for the 181 assays used in this study the similarity-based merger model improved AUC in relative terms by 16.3% compared to the models using chemical structure alone (mean AUC of 0.75 vs. 0.64), and by 21.3% compared to the models using Cell Painting data alone (mean AUC of 0.62). Our results demonstrate that similarity-based merger models combining structure and cell morphology models can more accurately predict a wide range of biological assay outcomes and expand the applicability domain by better extrapolating to new structural and morphology spaces.