Identify Results you want to reproduce
According to the publication, several classification models were trained with a directed message passing neural network (D-MPNN) on diverse datasets obtained from multiple sources, in order to identify compounds that block hERG.
The best-performing model is D-MPNN + moe206, with an AUC-ROC of 0.956 ± 0.005. However, the molecular descriptor set it uses (moe206) is closed-source, so the model implemented on Ersilia was trained without a molecule featurizer.
We are therefore going to reproduce the original D-MPNN model, which achieved an AUC-ROC of 0.947 ± 0.005 under random splitting with 5-fold cross-validation. It was trained on a dataset of 7889 structurally diverse compounds with well-defined experimental hERG data. The dataset provides six thresholds (10 μM, 20 μM, 40 μM, 60 μM, 80 μM, and 100 μM) for distinguishing hERG blockers from non-blockers; the authors chose the 10 μM threshold for the model.
This dataset was assembled by Cai et al. in their work published in J Chem Inf Model, 2019. The dataset is available here.
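To make the role of the thresholds concrete, here is a minimal sketch of how a measured IC50 could be turned into blocker/non-blocker labels. The threshold list comes from the text above; the rule "IC50 at or below the threshold means blocker" and the function names are assumptions for illustration, not taken from the authors' code.

```python
# Thresholds (in micromolar) listed in the text above.
THRESHOLDS_UM = (10, 20, 40, 60, 80, 100)


def label_blocker(ic50_um: float, threshold_um: float = 10.0) -> int:
    """Return 1 (blocker) if the IC50 is at or below the threshold, else 0.

    Assumption: the "at or below" convention; check the dataset's own definition.
    """
    return int(ic50_um <= threshold_um)


def label_all_thresholds(ic50_um: float) -> dict:
    """Label one compound at every threshold, keyed by threshold in micromolar."""
    return {t: label_blocker(ic50_um, t) for t in THRESHOLDS_UM}


# A compound with a hypothetical IC50 of 35 uM is a non-blocker at 10 and 20 uM
# and a blocker at 40 uM and above.
print(label_all_thresholds(35.0))
```

Under this convention the 10 μM threshold is the strictest of the six, so it yields the fewest compounds labeled as blockers.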
Implement the model on your system as described by the authors
I cloned the model repository onto my Ubuntu 22.04 system using
git clone https://github.com/AI-amateur/DMPNN-hERG.git
The model requires a Chemprop installation, so I navigated to the directory where Chemprop is located.
cd DMPNN-hERG/chemprop
I created the virtual environment, installed the packages and dependencies listed in environment.yml, and then activated the environment.
Note: The eos30f3 model implemented on Ersilia is not the D-MPNN model trained on the Cai dataset of 7889 compounds, but one built from the checkpoints here.
The former model has only one prediction output; I have modified main.py to select the 10 μM prediction from the six outputs, one per threshold (10 μM, 20 μM, 40 μM, 60 μM, 80 μM, and 100 μM).
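The selection step can be sketched roughly as follows. This is not the actual change made to main.py: the function name, the assumption that the six outputs arrive as one row per molecule, and the assumption that they are ordered by increasing threshold are all hypothetical and would need to be checked against the checkpoint's task order.

```python
from typing import Sequence

# Assumed output order, one prediction per threshold (micromolar).
THRESHOLDS_UM = (10, 20, 40, 60, 80, 100)


def select_threshold_output(preds: Sequence[float], threshold_um: int = 10) -> float:
    """Pick the prediction for one threshold from a six-value output row."""
    if len(preds) != len(THRESHOLDS_UM):
        raise ValueError(f"expected {len(THRESHOLDS_UM)} predictions, got {len(preds)}")
    return preds[THRESHOLDS_UM.index(threshold_um)]


# Made-up scores for a single molecule, for illustration only.
row = [0.91, 0.85, 0.70, 0.62, 0.55, 0.48]
print(select_threshold_output(row))  # the 10 uM score
```

Looking the index up from the threshold value, rather than hard-coding position 0, keeps the selection readable and makes it fail loudly if the output width ever changes.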
Check that the Model provides the same result as Eos30f3 on EMH
The AUC-ROC value of 0.947 ± 0.005 that I am trying to reproduce was obtained on the test sets, as seen here.
Since I don't have access to those test datasets, I modified the reproduction shell script from here to train and test the model without the featurizer.
This successfully reproduced the result, giving an AUC-ROC of 0.947 ± 0.006.
result.log
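For readers who want to see how a figure such as 0.947 ± 0.006 is typically put together, the sketch below computes AUC-ROC for one fold and then summarizes several folds as mean ± standard deviation. It is a plain-Python illustration, not the code path Chemprop uses; the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def auc_roc(y_true, y_score):
    """AUC-ROC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_aucs):
    """Return (mean, sample standard deviation) over per-fold AUC values."""
    return mean(fold_aucs), stdev(fold_aucs)


# Toy labels and scores for one fold, for illustration only.
print(auc_roc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))

# Made-up per-fold values showing the "mean ± std" summary format.
m, s = summarize_folds([0.94, 0.95, 0.96])
print(f"{m:.3f} ± {s:.3f}")
```

A small difference in the reported spread between runs (for example ± 0.005 versus ± 0.006) is expected when the random splits differ.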
The eos30f3 checkpoints were trained on 393 compounds, as seen here. To further confirm that eos30f3 is the checkpoint model rather than the model trained on the Cai dataset of 7889 compounds, I ran predictions with the provided test datasets on the models in the checkpoint. I then ran the same test datasets through the eos30f3 model on the Ersilia Model Hub and obtained the same values across all classification metrics.
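A comparison of this kind can be sketched as below: compute a few classification metrics for each set of predictions and check that they agree. The choice of metrics (accuracy, sensitivity, specificity), the function names, and the toy labels are assumptions for illustration; the metrics actually compared in the original run are not listed in this post.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # NaN marks an undefined metric, e.g. no positives in the data.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


def metrics_match(a, b, tol=1e-9):
    """True if both metric dicts have the same keys and values within tol."""
    return a.keys() == b.keys() and all(abs(a[k] - b[k]) <= tol for k in a)


# Toy labels: the same predictions from the local checkpoint and from the hub.
y_true = [1, 0, 1, 0, 1]
local_pred = [1, 0, 0, 0, 1]
hub_pred = [1, 0, 0, 0, 1]
print(metrics_match(classification_metrics(y_true, local_pred),
                    classification_metrics(y_true, hub_pred)))
```

Note that an undefined (NaN) metric never compares as equal, so `metrics_match` reports a mismatch in that case rather than silently passing.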
Originally posted by @Malikbadmus in https://github.com/ersilia-os/ersilia/issues/979#issuecomment-1996008631