ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: Human Liver Microsomal Stability #584

Closed paulinebanye closed 1 year ago

paulinebanye commented 1 year ago

Model Name

Human Liver Microsomal Stability

Model Description

Prediction of human liver microsomal stability is key for screening drugs in the early stages of drug discovery. The liver is the main organ for metabolizing drugs in humans, and testing a compound's metabolic stability is essential for the early detection of viable drug compounds.

Slug

hlm

Tag

hlm, liver, microsomal

Publication

https://pubmed.ncbi.nlm.nih.gov/17683964/

Source Code

https://github.com/ncats/ncats-adme/tree/master

License

MIT

GemmaTuron commented 1 year ago

/approve

github-actions[bot] commented 1 year ago

New Model Repository Created! 🎉

@pauline-banye ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos8osp

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

GemmaTuron commented 1 year ago

This model is presenting issues due to a corrupt .pt file -- We will close this for the moment and archive the repository

GemmaTuron commented 1 year ago

@masroor07 do you want to tackle this? I'll create a new repo for it

GemmaTuron commented 1 year ago

/approve

github-actions[bot] commented 1 year ago

New Model Repository Created! 🎉

@pauline-banye ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos31ve

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

masroor07 commented 1 year ago

Yes, would love to work on it! Thank you for the opportunity

GemmaTuron commented 1 year ago

Fork the above repo and let's see if we make it before the end of the contribution period!

masroor07 commented 1 year ago

I have forked the repository, cloned it to my local system, and am now exploring the code.

masroor07 commented 1 year ago

Hello @GemmaTuron! Going through the HLM-NCATS model, it should download the model file to the directory './models/hlm/gcnn_model.pt', but I can't find it there. For other models the file is present; for example, the PAMPA model file is added to the directory "./models/pampa/gcnn_model.pt".

Also, the HLM model doesn't seem to accept an input file.

Looking at the code, it seems to import the rlm_gcnn_model from the rlm as well. Am I supposed to add the source code for the hlm model to framework/<hlm> and make the necessary changes? In that case, shouldn't I add the necessary files from the rlm as well?
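A quick way to confirm which checkpoints are actually on disk is sketched below. This is only an illustration: the helper name `find_missing_checkpoints` is hypothetical, and the layout `./models/<name>/gcnn_model.pt` is taken from the two paths quoted above, not from the NCATS code itself.

```python
from pathlib import Path


def find_missing_checkpoints(models_dir, model_names, filename="gcnn_model.pt"):
    """Return the expected checkpoint paths that do not exist on disk.

    Assumes the layout <models_dir>/<model_name>/<filename>, as in the
    paths quoted in this thread.
    """
    base = Path(models_dir)
    expected = [base / name / filename for name in model_names]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # "rlm" is included because the HLM predictor imports rlm_gcnn_model.
    for path in find_missing_checkpoints("./models", ["hlm", "rlm", "pampa"]):
        print(f"missing checkpoint: {path}")
```

If the script prints nothing, every expected file is present; any printed path still needs to be downloaded or copied in.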

masroor07 commented 1 year ago

def __init__(self, kekule_smiles: array = None, smiles: array = None):
    GcnnBase.__init__(self, kekule_smiles, column_dict_key='Predicted Class (Probability)', columns_dict_order=1, smiles=smiles)

    # add RLM predictions as additional features along with SMILES
    rlm_predictions, rlm_labels = self.gcnn_predict(rlm_gcnn_model, rlm_gcnn_scaler)
    if rlm_predictions is not None:
        self.additional_features = rlm_predictions.tolist()
    else:
        print('No RLM Predictions')

    self._columns_dict['Prediction'] = {
        'order': 2,
        'description': 'class label',
        'isSmilesColumn': False
    }

    self.model_name = 'hlm'

This is the model class for the hlm model. Why do we need the rlm_prediction from the rlm model?

I have added the checkpoints for the model, copied the code to framework folder. I am trying to go through the code and make the necessary changes!

masroor07 commented 1 year ago

@GemmaTuron

GemmaTuron commented 1 year ago

@masroor07

For the rlm_predictor thing... I am inclined to think that the authors simply did not change the name? Because both models have the same name in the checkpoints file (very dangerous!), it will be called anyway. For clarity, I'd change everything from rlm to hlm in our version of the code.
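Because two checkpoints with the same file name are easy to mix up, comparing their hashes is one way to tell them apart. The sketch below uses only the standard library; the helper names `sha256_of` and `same_file_contents` are hypothetical, and the paths in the usage note are assumptions based on the layout mentioned earlier in this thread.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True when both files have byte-identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_file_contents("models/hlm/gcnn_model.pt", "models/rlm/gcnn_model.pt")` returning True would suggest the same checkpoint was copied into both folders. The same digest can also be compared against a known-good value to spot a corrupt download.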

masroor07 commented 1 year ago

This isn't the actual function that performs the predictions. I am inclined to think that this piece of code isn't used anywhere in making predictions and can be removed from the source code. The function that runs the predictions is:

def get_predictions(self) -> DataFrame:
    """
    Function that calculates consensus predictions

    Returns:
        Predictions (DataFrame): DataFrame with all predictions
    """

    if len(self.kekule_smiles) > 0:

        start = time.time()
        gcnn_predictions, gcnn_labels = self.gcnn_predict(hlm_gcnn_model, hlm_gcnn_scaler)
        end = time.time()
        print(f'HLM: {end - start} seconds to predict {len(self.predictions_df.index)} molecules')

        self.predictions_df['Prediction'] = pd.Series(
            pd.Series(np.where(gcnn_predictions >= 0.5, 'unstable', 'stable'))
        )

        # if not intrprt_df.empty:
        #     intrprt_df['final_smiles'] = np.where(intrprt_df['rationale_score']>0, intrprt_df['smiles'].astype(str)+'_'+intrprt_df['rationale_smiles'].astype(str), intrprt_df['smiles'].astype(str))
        #     self.predictions_df['mol'] = pd.Series(intrprt_df['final_smiles'].tolist())

    return self.predictions_df

Should I try removing the block of code below?

        # add RLM predictions as additional features along with SMILES
        rlm_predictions, rlm_labels = self.gcnn_predict(rlm_gcnn_model, rlm_gcnn_scaler)
        if rlm_predictions is not None:
            self.additional_features = rlm_predictions.tolist()
        else:
            print(f'No RLM Predictions')

GemmaTuron commented 1 year ago

Yeah, let's try. If it crashes we'll know what we need it for. Maybe just comment it out for the moment.

masroor07 commented 1 year ago

Update:

I have added the necessary code to the framework folder. But the error that I get:

Loading Human Liver Microsomal Stability model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading Human Liver Microsomal Stability model
Loading Human Liver Microsomal Stability model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading Human Liver Microsomal Stability model
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6cced0>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a7f4c90>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6ccbd0>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6cc150>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6ccdb0>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6cc7b0>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6ccd50>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a3ccdb0>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a3cc150>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a3cce10>
  0%|                                                                     | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    output_df = predict_df(smiles_list)
  File "main.py", line 39, in predict_df
    pred_df = predictor.get_predictions()
  File "/home/masroorshah/eos31ve/model/framework/code/../predictors/hlm/hlm_predictor.py", line 59, in get_predictions
    gcnn_predictions, gcnn_labels = self.gcnn_predict(hlm_gcnn_model, hlm_gcnn_scaler)
  File "/home/masroorshah/eos31ve/model/framework/predictors/hlm/../base/gcnn.py", line 80, in gcnn_predict
    scaler=scaler
  File "/home/masroorshah/eos31ve/model/framework/predictors/hlm/../chemprop/chemprop/train/predict.py", line 34, in predict
    batch_preds = model(mol_batch, features_batch)
  File "/home/masroorshah/miniconda3/envs/hml/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/masroorshah/eos31ve/model/framework/predictors/hlm/../chemprop/chemprop/models/model.py", line 112, in forward
    output = self.ffn(self.encoder(*input))
  File "/home/masroorshah/miniconda3/envs/hml/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/masroorshah/eos31ve/model/framework/predictors/hlm/../chemprop/chemprop/models/mpn.py", line 177, in forward
    output = self.encoder.forward(batch, features_batch)
  File "/home/masroorshah/eos31ve/model/framework/predictors/hlm/../chemprop/chemprop/models/mpn.py", line 75, in forward
    features_batch = torch.from_numpy(np.stack(features_batch)).float().to(self.device)
  File "<__array_function__ internals>", line 6, in stack
TypeError: dispatcher for __array_function__ did not return an iterable

GemmaTuron commented 1 year ago

Hi @masroor07 !

Did you use one of @pauline-banye's repositories that predict other NCATS models as a template?

masroor07 commented 1 year ago

Indeed, yes!

GemmaTuron commented 1 year ago

Ok,

I'm tagging @pauline-banye to see if she can provide input here. Pauline, did you encounter this error? @masroor07, I can't look at it in depth now, sorry. Please focus on your final application in the meantime, and we'll tackle this in the coming days if possible.

masroor07 commented 1 year ago

Alright, thank you! Will get started with the final applications as well and at the same time will try to solve this one as well.

masroor07 commented 1 year ago

I have tried going through @pauline-banye's other repositories as well, and I can't find any difference there. I cloned the repositories and ran them locally. Do I need to make any changes to chemprop to solve this issue?

GemmaTuron commented 1 year ago

Hi @masroor07

What about the code you commented out with the rlm predictor code? This might be causing an issue. Also, I'd need more than just this error; could you print the output of the function that is crashing (on the screen, for example)?

paulinebanye commented 1 year ago

Hi @GemmaTuron @masroor07 Kudos on the progress you made with this repo 😀.

I encountered issues with this model as well. It is dependent on the RLM piece of code and it returned errors when I removed it.

I sent an email regarding this to the authors but never got a response.

paulinebanye commented 1 year ago

@masroor07 I do understand you're finalizing your final applications but if you are done with it, we could try to tackle the HLM model at your convenience, if you are willing.

paulinebanye commented 1 year ago

Yes, I encountered errors with this model. I'm going through the errors @masroor07 reported.

masroor07 commented 1 year ago

Yes, it uses the RLM model as well. I will try adding the rlm predictor's code to the code and see what I can get out of it!

masroor07 commented 1 year ago

The model uses chemprop to run predictions, and in the other NCATS models it works fine! I will try to debug the other models and compare both.

masroor07 commented 1 year ago

Yeah sure, would love to assist! I will try to finish writing my final application by tomorrow! We can get started whenever you are up for it!

masroor07 commented 1 year ago

Hello @GemmaTuron, I have finally been able to run the model! Yes, @pauline-banye, it depends on the rlm piece of code. I added that piece of code back, configured the checkpoints for the rlm model, and copied the rlm model's source code to the repository. I am finally able to run predictions. Here is the csv output file: output.csv
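One plausible reading of the earlier traceback, not confirmed anywhere in this thread, is that the HLM network expects per-molecule additional features (the RLM predictions) and `np.stack` received `None` once that block was commented out. The sketch below reproduces that kind of failure with plain NumPy and shows a guard that fails earlier with a clearer message. `stack_features` is a hypothetical helper, not part of chemprop or the NCATS code.

```python
import numpy as np


def stack_features(features_batch):
    """Stack per-molecule feature vectors into one 2-D array.

    Raises a descriptive error instead of letting np.stack fail on None.
    """
    if features_batch is None or any(f is None for f in features_batch):
        raise ValueError(
            "the model expects additional features (for example RLM "
            "predictions), but none were provided"
        )
    return np.stack(features_batch)


if __name__ == "__main__":
    # Passing None straight to np.stack raises a TypeError; the exact
    # message depends on the installed NumPy version.
    try:
        np.stack(None)
    except TypeError as err:
        print(f"np.stack(None) failed: {err}")

    # With one feature vector per molecule, stacking works as expected.
    print(stack_features([[0.12], [0.87]]).shape)
```

Restoring the RLM block, as described above, has the same effect as supplying `features_batch`: the network receives the extra input it was trained with.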

masroor07 commented 1 year ago

What are the next steps?

GemmaTuron commented 1 year ago

Hi @masroor07

If you added the rlm model, the predictions now are from the rlm model not the hlm?

masroor07 commented 1 year ago

No, the HLM model runs on top of the RLM model. It doesn't produce the same output as the RLM model. I ran both models against the same SMILES inputs. Here are the results produced:

rlm: ADME_Predictions_2023-04-03-205445.csv

hlm: output.csv

Conclusion: The model seems to be working fine.
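A comparison like the one above can be scripted with pandas. This is a minimal sketch under assumptions: the column names `smiles` and `Prediction`, and the helper name `compare_predictions`, are hypothetical and may not match the headers of the attached CSV files.

```python
import pandas as pd


def compare_predictions(left, right, key="smiles", column="Prediction"):
    """Join two prediction tables on `key` and flag rows where `column` agrees."""
    merged = left.merge(right, on=key, suffixes=("_left", "_right"))
    merged["agree"] = merged[f"{column}_left"] == merged[f"{column}_right"]
    return merged


if __name__ == "__main__":
    # Toy data standing in for the RLM and HLM output files.
    rlm = pd.DataFrame({"smiles": ["CCO", "c1ccccc1"],
                        "Prediction": ["stable", "unstable"]})
    hlm = pd.DataFrame({"smiles": ["CCO", "c1ccccc1"],
                        "Prediction": ["stable", "stable"]})
    result = compare_predictions(rlm, hlm)
    print(result)
    print(f"agreement: {result['agree'].mean():.0%}")
```

With real files, the two frames would come from `pd.read_csv` on each output. Agreement well below 100% between RLM and HLM supports the point that the two models produce different outputs, while 100% agreement between the Ersilia wrapper and the NCATS code is what a faithful port should show.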

GemmaTuron commented 1 year ago

Oh that's interesting, thanks @masroor07 ! Can you confirm you get the same result with the HLM model run from the NCATS code directly? And then please create PR for the model to be incorporated! Thanks

masroor07 commented 1 year ago

Alright! thank you

masroor07 commented 1 year ago

Hi @GemmaTuron I ran the HLM model from the NCATS code directly and can confirm, the results were the same! Thank you

Output:

ADME_Predictions_2023-04-04-055432.csv

masroor07 commented 1 year ago

Hello @GemmaTuron , Created a PR as well! Could you please review? Thank you

GemmaTuron commented 1 year ago

Hi @masroor07 ! I won't be able to merge it until our Git LFS quota has been reset, apologies! It will be next week ;)

masroor07 commented 1 year ago

alright! thank you :)

GemmaTuron commented 1 year ago

Hi @masroor07

I've solved the issue, so I tried to merge your PR, but there are several errors in the metadata file. The fields are pre-coded and should be filled in according to the detailed instructions in our documentation. I've modified it to speed this up. Meanwhile, is the model converting the results to the probability of 1, or are you simply outputting the 0 and 1? We prefer to always give the probability of 1.

masroor07 commented 1 year ago

The model provides predicted class (1 or 0) for a given compound. If the predicted class is '1', it means the compound is predicted as unstable (t1/2 <= 30 min) and if the predicted class is '0', the compound is predicted as stable (t1/2 > 30 min). The model also provides a probability score (between 0 and 1), shown in parentheses next to the predicted class.
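To make the interpretation concrete, here is a small sketch that splits a combined cell into class and probability and maps a probability to a label. The cell format `"1 (0.87)"` is an assumption based on the description above, and both helper names are hypothetical. The 0.5 cut-off and the direction (at or above the threshold means unstable) mirror the `np.where(gcnn_predictions >= 0.5, 'unstable', 'stable')` line quoted earlier in this thread; whether the value in parentheses is always the probability of class 1 is not confirmed here.

```python
import re

# Matches cells such as "1 (0.87)" or "0 (0.12)"; the format is assumed.
_CELL_PATTERN = re.compile(r"^\s*([01])\s*\(\s*([0-9]*\.?[0-9]+)\s*\)\s*$")


def parse_class_probability(cell):
    """Split 'class (probability)' into an (int, float) pair."""
    match = _CELL_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1)), float(match.group(2))


def label_from_probability(probability, threshold=0.5):
    """Map the probability of class 1 (unstable) to a text label."""
    return "unstable" if probability >= threshold else "stable"


if __name__ == "__main__":
    predicted_class, probability = parse_class_probability("1 (0.87)")
    print(predicted_class, probability, label_from_probability(probability))
```

Reporting only the probability of class 1, as requested for the Ersilia output, then amounts to keeping the float and dropping the class label.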

masroor07 commented 1 year ago

I can try to go through the metadata file and modify it in accordance with the documentation?

GemmaTuron commented 1 year ago

Hi @masroor07

It's fine, I've already edited it with some changes in the description and the links. I've changed the interpretation from what was there, "It is a model which provides predicts class (1 or 0) for a given compound. If the predicted class is '1', it means the compound is predicted as unstable (t1/2 <= 30 min) and if the predicted class is '0', the compound is predicted as stable (t1/2 > 30 min). The model also provides a probability score (between 0 and 1), shown in parentheses next to the predicted class.", to what we want. Did you modify Pauline's code, or is the output still the probability of 1?

GemmaTuron commented 1 year ago

Hi @masroor07

Actually having a closer look at the model, there are a few suggestions before merging it:

masroor07 commented 1 year ago

Hi @masroor07

Actually having a closer look at the model, there are a few suggestions before merging it:

  • Metadata: please change the output type to one of the accepted values; in this case, it should be "Probability" if the model is giving the probability as output (which it should)
  • I've noticed you put the model in another folder (with a small typo, hml instead of hlm). Please add all the checkpoints inside the model folder, not outside it. If you can make these changes, I'll test the PR again.

I did change the output type to probability. I must have copied some directory hml to the project folder, my bad! removed it

Thank you!

GemmaTuron commented 1 year ago

Hi @masroor07 Thanks. Can we confirm the interpretation of the model output is well written? Are we giving the probability of 1 only? And does that mean stable or unstable? Please double check and confirm what the right interpretation is. The automated test is failing at model predict time: it does not find the pandas module, so it is probably an issue with the conda env? https://github.com/ersilia-os/eos31ve/actions/runs/4615351353/jobs/8159147612?pr=1

masroor07 commented 1 year ago

Addressed the environment issue for the model. I will try to review the code once again and confirm the interpretation of the model tomorrow, before making a commit. Thank you for the help
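A missing-module failure like the pandas one can be caught before running predictions with a short check inside the activated environment. This is a generic sketch using the standard library; the helper name `missing_modules` is hypothetical and the list of required modules is an assumption based on the imports and traceback shown in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed requirements: packages that appear in the code and traceback above.
    required = ["pandas", "numpy", "torch", "rdkit"]
    missing = missing_modules(required)
    if missing:
        print("missing from this environment:", ", ".join(missing))
    else:
        print("all required modules are importable")
```

Anything reported as missing would then need to be added to the environment file so the automated test installs it as well.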

masroor07 commented 1 year ago

Made the changes to the environment.yml and made some changes to the interpretation of the model.

Committed the changes!

GemmaTuron commented 1 year ago

Hi @masroor07

I've been working on the model:

I cannot push via your repo so I have created my own fork to push to the ersilia repository, but thanks for all the work! You can now delete the ersilia forked repos from your user :)