Closed paulinebanye closed 1 year ago
/approve
@pauline-banye ersilia model repository has been successfully created and is available at:
ersilia-os/eos8osp
Now that your new model repository has been created, you are ready to start contributing to it!
Here are some brief starter steps for contributing to your new model repository:
Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository
README.md file to accurately describe your model
If you have any questions, please feel free to open an issue and get support from the community!
This model is presenting issues due to a corrupt .pt file. We will close this for the moment and archive the repository.
@masroor07 do you want to tackle this? I'll create a new repo for it
/approve
@pauline-banye ersilia model repository has been successfully created and is available at:
ersilia-os/eos31ve
Now that your new model repository has been created, you are ready to start contributing to it!
Here are some brief starter steps for contributing to your new model repository:
Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository
README.md file to accurately describe your model
If you have any questions, please feel free to open an issue and get support from the community!
Yes, would love to work on it! Thank you for the opportunity
Fork the above repo and let's see if we make it before the end of the contribution period!
I have forked the repository, cloned it to my local system, and am now exploring the code.
Hello @GemmaTuron! Going through the HLM-NCATS model, it should be downloading the model file to the directory './models/hlm/gcnn_model.pt',
but I can't find it there. For other models the file is present; for example, the PAMPA model file is added to the directory "./models/pampa/gcnn_model.pt".
Also, the HLM model doesn't seem to be accepting an input file.
[2023-03-28 06:10:11,993] ERROR in app: Error making a prediction.
[2023-03-28 06:10:11,993] ERROR in app: error type: <class 'NameError'>
[2023-03-28 06:10:11,993] ERROR in app: name 'HLMPredictior' is not defined
UPDATE I was able to finally run the model. Thank you!
Looking at the code, it seems to be importing the rlm_gcnn_model from the rlm as well. Am I supposed to add the source code for the hlm model to framework/<hlm> and make the necessary changes? In that case, shouldn't I add the necessary files from the rlm as well?
def __init__(self, kekule_smiles: array = None, smiles: array = None):
    GcnnBase.__init__(self, kekule_smiles, column_dict_key='Predicted Class (Probability)', columns_dict_order = 1, smiles=smiles)
    # add RLM predictions as additional features along with SMILES
    rlm_predictions, rlm_labels = self.gcnn_predict(rlm_gcnn_model, rlm_gcnn_scaler)
    if rlm_predictions is not None:
        self.additional_features = rlm_predictions.tolist()
    else:
        print(f'No RLM Predictions')
    self._columns_dict['Prediction'] = {
        'order': 2,
        'description': 'class label',
        'isSmilesColumn': False
    }
    self.model_name = 'hlm'
This is the model class for the hlm model. Why do we need the rlm_predictions from the rlm model?
I have added the checkpoints for the model and copied the code to the framework folder. I am going through the code to make the necessary changes!
@GemmaTuron
@masroor07
For the rlm_predictor question: I am inclined to think that the authors simply did not change the name. Because both models have the same name in the checkpoints file (very dangerous!), it will be called anyway. For clarity, I'd change everything from rlm to hlm in our version of the code.
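To illustrate the risk of two models sharing one checkpoint file name, here is a minimal sketch that keys each checkpoint by its model directory and reports any that are missing. The directory layout follows the paths quoted earlier in this thread ('./models/hlm/gcnn_model.pt'); the helper names and the 'rlm' directory are assumptions for illustration, not part of the NCATS code.

```python
from pathlib import Path

# Assumed layout: one sub-directory per model under a shared root.
MODELS_ROOT = Path("models")
CHECKPOINT_NAME = "gcnn_model.pt"  # every model uses this same file name


def checkpoint_path(model_name: str, root: Path = MODELS_ROOT) -> Path:
    """Build the checkpoint path for one model (hypothetical helper)."""
    return root / model_name / CHECKPOINT_NAME


def find_missing_checkpoints(model_names, root: Path = MODELS_ROOT):
    """Return the model names whose checkpoint file does not exist under root."""
    return [name for name in model_names
            if not checkpoint_path(name, root).is_file()]


if __name__ == "__main__":
    # The file names are identical, so only the directory tells the models apart.
    print(checkpoint_path("hlm"))
    print(checkpoint_path("rlm"))
    print("missing:", find_missing_checkpoints(["hlm", "rlm"]))
```

Running a check like this before loading makes it obvious when a checkpoint was never downloaded or was placed under the wrong model directory.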
This isn't the actual function that performs the predictions. I am inclined to think that this piece of code isn't being used anywhere in making predictions and can be removed from the source code. The function that runs the predictions is:
def get_predictions(self) -> DataFrame:
    """
    Function that calculates consensus predictions
    Returns:
        Predictions (DataFrame): DataFrame with all predictions
    """
    if len(self.kekule_smiles) > 0:
        start = time.time()
        gcnn_predictions, gcnn_labels = self.gcnn_predict(hlm_gcnn_model, hlm_gcnn_scaler)
        end = time.time()
        print(f'HLM: {end - start} seconds to predict {len(self.predictions_df.index)} molecules')
        self.predictions_df['Prediction'] = pd.Series(
            pd.Series(np.where(gcnn_predictions>=0.5, 'unstable', 'stable'))
        )
        # if not intrprt_df.empty:
        #     intrprt_df['final_smiles'] = np.where(intrprt_df['rationale_score']>0, intrprt_df['smiles'].astype(str)+'_'+intrprt_df['rationale_smiles'].astype(str), intrprt_df['smiles'].astype(str))
        #     self.predictions_df['mol'] = pd.Series(intrprt_df['final_smiles'].tolist())
    return self.predictions_df
Should I try removing the block of code below?
# add RLM predictions as additional features along with SMILES
rlm_predictions, rlm_labels = self.gcnn_predict(rlm_gcnn_model, rlm_gcnn_scaler)
if rlm_predictions is not None:
    self.additional_features = rlm_predictions.tolist()
else:
    print(f'No RLM Predictions')
Yeah, let's try. If it crashes, we'll know what we need it for. Maybe just comment it out for the moment.
Update:
I have added the necessary code to the framework folder, but I get the following error:
Loading Human Liver Microsomal Stability model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading Human Liver Microsomal Stability model
Loading Human Liver Microsomal Stability model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading Human Liver Microsomal Stability model
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6cced0>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a7f4c90>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6ccbd0>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6cc150>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6ccdb0>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6cc7b0>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a6ccd50>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a3ccdb0>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a3cc150>
<rdkit.Chem.rdchem.Mol object at 0x7f3a6a3cce10>
0%| | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    output_df = predict_df(smiles_list)
  File "main.py", line 39, in predict_df
    pred_df = predictor.get_predictions()
  File "/home/masroorshah/eos31ve/model/framework/code/../predictors/hlm/hlm_predictor.py", line 59, in get_predictions
    gcnn_predictions, gcnn_labels = self.gcnn_predict(hlm_gcnn_model, hlm_gcnn_scaler)
  File "/home/masroorshah/eos31ve/model/framework/predictors/hlm/../base/gcnn.py", line 80, in gcnn_predict
    scaler=scaler
  File "/home/masroorshah/eos31ve/model/framework/predictors/hlm/../chemprop/chemprop/train/predict.py", line 34, in predict
    batch_preds = model(mol_batch, features_batch)
  File "/home/masroorshah/miniconda3/envs/hml/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/masroorshah/eos31ve/model/framework/predictors/hlm/../chemprop/chemprop/models/model.py", line 112, in forward
    output = self.ffn(self.encoder(*input))
  File "/home/masroorshah/miniconda3/envs/hml/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/masroorshah/eos31ve/model/framework/predictors/hlm/../chemprop/chemprop/models/mpn.py", line 177, in forward
    output = self.encoder.forward(batch, features_batch)
  File "/home/masroorshah/eos31ve/model/framework/predictors/hlm/../chemprop/chemprop/models/mpn.py", line 75, in forward
    features_batch = torch.from_numpy(np.stack(features_batch)).float().to(self.device)
  File "<__array_function__ internals>", line 6, in stack
TypeError: dispatcher for __array_function__ did not return an iterable
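One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: np.stack tends to fail this way when it receives something that is not a sequence, such as None. That would happen if the model expects additional features but features_batch was never populated. The sketch below is a hypothetical guard (check_features_batch is not part of chemprop or the NCATS code) that fails earlier with a more readable message.

```python
def check_features_batch(features_batch, n_molecules):
    """Validate extra per-molecule features before they are stacked.

    Hypothetical helper: raises a descriptive error instead of letting
    a None or mis-sized batch reach np.stack further down the call chain.
    """
    if features_batch is None:
        raise ValueError(
            "features_batch is None: the model expects additional features "
            "but none were provided"
        )
    if len(features_batch) != n_molecules:
        raise ValueError(
            f"expected {n_molecules} feature rows, got {len(features_batch)}"
        )
    return features_batch


if __name__ == "__main__":
    # Two molecules, one extra feature each: passes through unchanged.
    print(check_features_batch([[0.12], [0.87]], n_molecules=2))
    # Missing features: fails with a clear message.
    try:
        check_features_batch(None, n_molecules=2)
    except ValueError as err:
        print("error:", err)
```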
Hi @masroor07 !
Did you use one of @pauline-banye's repositories that predict other NCATS models as a template?
Indeed, yes!
Ok,
I'm tagging @pauline-banye to see if she can provide input here. Pauline, did you encounter this error? @masroor07 I can't look at it in depth now, sorry. Please focus on your final application meanwhile, and we'll tackle this in the coming days if possible.
Alright, thank you! I will get started on the final application and, at the same time, keep trying to solve this one.
I have gone through @pauline-banye's other repositories as well, and I can't find any difference there. I cloned the repositories and ran them locally. Do I need to make any changes to chemprop to solve this issue?
Hi @masroor07
What about the rlm predictor code you commented out? This might be causing the issue. Also, I'd need more than just this error: could you print the output of the function that is crashing (to the screen, for example)?
Hi @GemmaTuron @masroor07 Kudos on the progress you made with this repo.
I encountered issues with this model as well. It is dependent on the RLM piece of code and it returned errors when I removed it.
I sent an email regarding this to the authors but never got a response.
@masroor07 I do understand you're finalizing your final applications but if you are done with it, we could try to tackle the HLM model at your convenience, if you are willing.
Yes, I encountered errors with this model. I'm going through the errors @masroor07 reported.
Yes, it uses the RLM model as well. I will try adding the rlm predictor's code back and see what I get!
The model uses chemprop to run predictions, and in the other NCATS models it works fine! I will debug the other models and compare them with this one.
Yeah sure, I would love to assist! I will try to finish writing my final application by tomorrow. We can get started whenever you're up for it!
Hello @GemmaTuron, I have finally been able to run the model! Yes, @pauline-banye, it depends on the rlm piece of code. I added that piece of code back, configured the checkpoints for the rlm model, and copied the rlm model's source code to the repository. I am finally able to run predictions. Here is the csv output file: output.csv
What are the next steps?
Hi @masroor07
If you added the rlm model, are the predictions now coming from the rlm model rather than the hlm?
No, the HLM model runs on top of the RLM model. It doesn't produce the same output as the RLM model. I ran both models on the same SMILES inputs. Here are the results produced:
rlm: ADME_Predictions_2023-04-03-205445.csv
hlm: output.csv
CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1
for the hlm model, output was stable
Conclusion: The model seems to be working fine.
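The "HLM runs on top of RLM" arrangement described above can be sketched as a two-stage pipeline in which the first model's predictions become extra input features for the second. The two toy functions below are stand-ins invented for illustration; they are not the NCATS models and produce meaningless numbers.

```python
from typing import Callable, List, Sequence


def predict_two_stage(
    smiles: Sequence[str],
    first_stage: Callable[[Sequence[str]], List[float]],
    second_stage: Callable[[Sequence[str], List[List[float]]], List[float]],
) -> List[float]:
    """Run the first model, then pass its output as features to the second."""
    first_preds = first_stage(smiles)
    # One feature row per molecule, holding the first-stage prediction.
    extra_features = [[p] for p in first_preds]
    return second_stage(smiles, extra_features)


def toy_rlm(smiles: Sequence[str]) -> List[float]:
    """Toy stand-in for the first-stage (RLM) model."""
    return [min(1.0, len(s) / 40) for s in smiles]


def toy_hlm(smiles: Sequence[str], features: List[List[float]]) -> List[float]:
    """Toy stand-in for the second-stage (HLM) model; needs the extra features."""
    return [0.5 * row[0] for row in features]


if __name__ == "__main__":
    print(predict_two_stage(["CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1"], toy_rlm, toy_hlm))
```

In this arrangement the second stage cannot run without the first, which is consistent with the errors reported when the RLM block was removed.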
Oh that's interesting, thanks @masroor07 ! Can you confirm you get the same result with the HLM model run from the NCATS code directly? And then please create PR for the model to be incorporated! Thanks
Alright! thank you
Hi @GemmaTuron I ran the HLM model from the NCATS code directly and can confirm, the results were the same! Thank you
Output:
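A comparison of this kind can also be scripted. The sketch below loads two prediction files keyed by SMILES and lists the molecules whose predictions differ. The column names "smiles" and "Prediction" are assumptions and would need to match the actual CSV headers.

```python
import csv
import io


def load_predictions(csv_text, key="smiles", value="Prediction"):
    """Parse CSV text into a {smiles: prediction} dictionary.

    The column names are assumed, not taken from the real output files.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row[value] for row in reader}


def diff_predictions(a, b):
    """Return the sorted SMILES whose predictions differ or are missing on one side."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


if __name__ == "__main__":
    ncats = "smiles,Prediction\nCCCSc1ccc2nc(NC(=O)OC)[nH]c2c1,stable\n"
    ersilia = "smiles,Prediction\nCCCSc1ccc2nc(NC(=O)OC)[nH]c2c1,stable\n"
    mismatches = diff_predictions(load_predictions(ncats), load_predictions(ersilia))
    print("mismatches:", mismatches)
```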
Hello @GemmaTuron , Created a PR as well! Could you please review? Thank you
Hi @masroor07 ! I won't be able to merge it until our Git LFS quota has been reset, apologies! It will be next week ;)
alright! thank you :)
Hi @masroor07
I've solved the issue, so I tried to merge your PR, but there are several errors in the metadata file: the fields are pre-coded and should be filled in according to the detailed instructions in our documentation. I've modified it to speed this up. Meanwhile, is the model converting the results to the probability of 1, or are you simply outputting 0 and 1? We prefer to always give the probability of 1.
The model provides predicted class (1 or 0) for a given compound. If the predicted class is '1', it means the compound is predicted as unstable (t1/2 <= 30 min) and if the predicted class is '0', the compound is predicted as stable (t1/2 > 30 min). The model also provides a probability score (between 0 and 1), shown in parentheses next to the predicted class.
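The interpretation described above can be written down as a small function. This is a sketch only: the 0.5 cut-off comes from the get_predictions code quoted earlier in this thread, while the function name and the returned label strings are assumptions for illustration.

```python
def interpret_hlm(probability_unstable, threshold=0.5):
    """Map the probability of class 1 (unstable) to a class and a label.

    Hypothetical helper; the threshold mirrors the >= 0.5 rule in the
    quoted prediction code.
    """
    if not 0.0 <= probability_unstable <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    predicted_class = 1 if probability_unstable >= threshold else 0
    if predicted_class == 1:
        label = "unstable (t1/2 <= 30 min)"
    else:
        label = "stable (t1/2 > 30 min)"
    return predicted_class, label


if __name__ == "__main__":
    for p in (0.12, 0.5, 0.93):
        print(p, interpret_hlm(p))
```

Returning the probability of class 1 and deriving the label from it keeps a single number as the model output, with the class recoverable by thresholding.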
I can go through the metadata file and modify it in accordance with the documentation, if that helps?
Hi @masroor07
It's fine, I've already edited it with some changes in the description and the links. I've changed the interpretation from what was there: "It is a model which provides predicts class (1 or 0) for a given compound. If the predicted class is '1', it means the compound is predicted as unstable (t1/2 <= 30 min) and if the predicted class is '0', the compound is predicted as stable (t1/2 > 30 min). The model also provides a probability score (between 0 and 1), shown in parentheses next to the predicted class." to what we want. Did you modify Pauline's code, or is the output still the probability of 1?
Hi @masroor07
Actually having a closer look at the model, there are a few suggestions before merging it:
- Metadata: please change the output type to one of the accepted values; in this case, it should be "Probability" if the model is giving the probability as output (which it should).
- I've noticed you put the model in another folder (with a small typo, hml instead of hlm). Please add all the checkpoints inside the model folder, not outside it.
If you can make these changes, I'll test the PR again.
I did change the output type to probability. I must have copied some directory hml to the project folder, my bad! Removed it.
Thank you!
Hi @masroor07 Thanks. Can we confirm that the interpretation of the model output is well written? Are we giving the probability of 1 only? And does that mean stable or unstable? Please double check and confirm what the right interpretation is. The automated test is failing at model predict time: it does not find the pandas module, so it is probably an issue with the conda env? https://github.com/ersilia-os/eos31ve/actions/runs/4615351353/jobs/8159147612?pr=1
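A quick way to catch this kind of environment problem before the automated test does is to check that the required packages are importable in the active environment. The sketch below uses only the standard library; the list of package names is an assumption based on the modules that appear in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed requirements, taken from the modules mentioned in this thread.
    required = ["pandas", "numpy", "torch", "rdkit"]
    missing = missing_modules(required)
    if missing:
        print("Missing from this environment:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Running this inside the model's conda environment shows immediately whether a package such as pandas was left out of the environment definition.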
Addressed the environment issue for the model. I will try to review the code once again and confirm the interpretation of the model tomorrow, before making a commit. Thank you for the help
Made the changes to the environment.yml and made some changes to the interpretation of the model.
Committed the changes!
Hi @masroor07
I've been working on the model:
I cannot push via your repo so I have created my own fork to push to the ersilia repository, but thanks for all the work! You can now delete the ersilia forked repos from your user :)
Model Name
Human Liver Microsomal Stability
Model Description
Prediction of human liver microsomal stability is key for the screening of drugs in the early stage of drug discovery. The liver is the main organ for metabolizing drugs in humans and testing its metabolic stability is essential for the early detection of viable drug compounds.
Slug
hlm
Tag
hlm, liver, microsomal
Publication
https://pubmed.ncbi.nlm.nih.gov/17683964/
Source Code
https://github.com/ncats/ncats-adme/tree/master
License
MIT