Review Model implementation

DhanshreeA commented 7 months ago

          Completed the reproducibility tasks - https://github.com/Adhivp/Ersilia_Tasks  @DhanshreeA

Took table S7 from the dataset of original paper https://doi.org/10.1021/acs.jcim.8b00769

Was unable to reproduce the value of probability in the paper
Was able to reproduce 22 molecules as hREG blockers ,while the paper identified 49 molecules as hREG blocker
Check the notebook for deatiled analysis

Originally posted by @Adhivp in https://github.com/ersilia-os/ersilia/issues/1025#issuecomment-2002472626

Malikbadmus commented 7 months ago

Describe the Bug

Users reported two (2) major issues with Model eos30gr

Null output predictions for certain Molecules.
Unable to Reproduce exact values obtained from the author's publications.

Steps taken to reproduce the behavior.

I ran predictions on a dataset of 196 molecules I_test.csv, I got a "Null" output for five (5) molecules eos30gr_output.csv

Causes

The reason for the Null output is here in the main.py file. The script below though not needed as it is redundant, applies a standardized procedure to the SMILES string from the input file, but it lacks exception handling for cases where molecules cannot be standardized thus leading to an StandardiseException: Multiple non-salt/solvate components error , leading to a null output predictions for those instances where the SMILES string cannot be standardized.

smiles = []
with open(input_file, "r") as f:
    reader = csv.reader(f)
    next(reader)
    for r in reader:
        smiles += [r[0]]

mols = []
for i, smi in enumerate(smiles):
    mol = Chem.MolFromSmiles(smi)
    mol = standardise.run(mol)
    if mol is not None:
        smi = Chem.MolToSmiles(mol)
        mol = Chem.MolFromSmiles(smi)
        mol.SetProp("MoleculeIdentifier", "id-{0}".format(i))

    mols += [mol]

I ran the standardized script on the input datasets I used, and the molecules that were not standardized corresponded with the "Null" output in our prediction.

Also, the Training Datasets that were used to train the Model in this script here, rows with null values in the "activity" column were filtered out, but there was no check for missing or duplicated SMILES string. Indeed, there were 840 duplicated SMILES strings present in the combined datasets used in training the model.

The Combined Datasets used to train the model were also converted in this form XLSX → SDF → CSV as shown here, since we are bypassing featurization of molecules with mol2vecI in training a new model, I believe direct conversion to CSV should have been used, the only featurization of molecules occurred at the LazyQsar stage.

Operating environment

Ubuntu 22.04.1 LTS

Additional context

From the Model implementation on Ersilia, it could be seen that the Mol2Vec featurization method that was used in the Publication model was absent in eos30gr, and both model also used different modeling techniques in training both models for bioactivity predictions. @DhanshreeA

Malikbadmus commented 7 months ago

PR created here.

Should i proceed with retraining the model?

Adhivp commented 7 months ago

If @DhanshreeA allows , Me and @Malikbadmus can together do the same.

GemmaTuron commented 4 months ago

Hi @DhanshreeA and @Malikbadmus

I've reviewed the model, and created the following datasets (Training, validation and test) Using a quick modelling they don't seem to work very well (AUROCS of aorund 0.7, 0.8 maximum, far from what the authors report) Nevertheless the AUROCS with ZairaChem are astoundingly good (0.99)

We cannot incorporate the full ZairaChem Model here, so what do we do?

Malikbadmus commented 4 months ago

Hi @DhanshreeA and @Malikbadmus

I've reviewed the model, and created the following datasets (Training, validation and test) Using a quick modelling they don't seem to work very well (AUROCS of aorund 0.7, 0.8 maximum, far from what the authors report) Nevertheless the AUROCS with ZairaChem are astoundingly good (0.99)

We cannot incorporate the full ZairaChem Model here, so what do we do?

Interestingly, using this new datasets, I am getting a AUROCS value of 0.958 which almost matches the publication score of 0.967, using the same multitask deep neural network as the author without the mol2vec featurization.

Link to notebook

Inyrkz commented 4 months ago

If the functionality is okay, then we will work on refactoring the code.

GemmaTuron commented 4 months ago

Ah that is useful @Malikbadmus

Can you refactor the model, incl. the code used to train the new model? Remove my tests and in the figures save the AUROCS in case anyone wants to check them. The datasets I got them from cleaning the original data, which was a bit convoluted, I agree

Malikbadmus commented 4 months ago

@GemmaTuron

With @Inyrkz assistance the model has been refactored, of the 49 approved antineoplastic drugs (including immunomodulating agents) identified by the author as hERG blockers. Was able to reproduce 44 of these drugs with the new model.

Notebook Link

GemmaTuron commented 4 months ago

Hi @Malikbadmus

Thanks, can you provide some more detail? Are these results using the datasets I created or you built them anew? Have you focused on the activity80 only or all the cut offs?

Malikbadmus commented 4 months ago

Hi @Malikbadmus

Thanks, can you provide some more detail? Are these results using the datasets I created or you built them anew? Have you focused on the activity80 only or all the cut offs?

Yes, the Model was trained with the datasets you provided, the approach used was multi-task learning(10um, 20um, 40um, 60um, 80um and 100um) as it turned out to have a better performance than using a single -task(80um) approach.

But when implementing on Ersilia the activity80(80um) cut off was selected from the multi-task predictions.

GemmaTuron commented 4 months ago

ok, should we then incorporate this version of the model? I am curious just to understand what happened from the last time we tried to reproduce the model and we could not get the good results (even myself)

Malikbadmus commented 4 months ago

ok, should we then incorporate this version of the model? I am curious just to understand what happened from the last time we tried to reproduce the model and we could not get the good results (even myself)

Yes, I believe so, since this new model could reproduce the authors result to a certain degree. I have created the PRhere.

On your second point, that might be due to the datasets used in training the model previously, there were missing and duplicated SMILES string in those datasets.

GemmaTuron commented 4 months ago

Ok, the model is now working Before closing this issue, @Malikbadmus can you update the readme file in the checkpoints folder to explain what is the final model we are providing? I am still unsure about providing the multitask as the authors did not use a multitask, just trained models on different cut offs - how much worse is the model trained in the single 80 uM cut-off? Can you show the AUROC values of the test and val sets in the README file?

Malikbadmus commented 4 months ago

Ok, the model is now working Before closing this issue, @Malikbadmus can you update the readme file in the checkpoints folder to explain what is the final model we are providing? I am still unsure about providing the multitask as the authors did not use a multitask, just trained models on different cut offs - how much worse is the model trained in the single 80 uM cut-off? Can you show the AUROC values of the test and val sets in the README file?

@GemmaTuron, I have updated the README file to reflect this.

A model trained on a single 80 uM decoy threshold has an AUROC value of 0.85, the notebook I used for this training is here.

Regarding the author not using a multi-task, further clarification on this will be appreciated, I might have misinterpreted the author's work, as it seems to be a Black-box model.

This is a summary of my current understanding of the deephERG approach:

The author utilizes a dataset composed of compounds where blockers were identified as having IC50 <= 10uM, and non-blockers as IC50 > 10uM, 20uM, 40uM, 60uM, 80uM, and 100uM to develop a deep learning approach.

The activity status which in this case represents the task compares the compound's activity status (as a blocker) across the different decoy threshold values, each learning task thus involves predicting whether a compound is a blocker or a non-blocker at a specific threshold.

The multi-task was trained to handle all six tasks at the same time and distinguish between blockers and non-blockers across all six thresholds simultaneously, while for the single task, an independent single-task neural network for each learning task was trained.

In the author training file, I can see two implementations:

The first one, which was termed multi-task seeks to distinguish between blockers and non-blockers across the different levels of activity(10uM, 20uM, 40uM, 60uM, 80uM, and 100uM).
The second implementation termed singe-task seeks to distinguish based on one activity alone (80uM).

For the deephERG models, the author set the concentration at 80 μM for a compound to inhibit a hERG channel.

GemmaTuron commented 4 months ago

Hi @Malikbadmus Thanks so much for detailed explanations, sorry my bad I think you are right! We can close this as completed thanks very much

ersilia-os / eos30gr