🦠 Model Request: Prediction of Rat Liver Microsomal Stability

paulinebanye commented 1 year ago

Model Name

RLM Stability

Model Description

Hepatic metabolic stability is a key pharmacokinetic parameter in drug discovery. Low hepatic metabolic stability can prevent a drug from attaining sufficient in vivo exposure, producing short half-lives, poor oral bioavailability and low plasma concentrations.

Slug

rlm-stability

Publication

An Automated High-Throughput Metabolic Stability Assay Using an Integrated High-Resolution Accurate Mass Method and Automated Data Analysis Software
Retrospective assessment of rat liver microsomal stability at NCATS: data and QSAR models
Analyzing Learned Molecular Representations for Property Prediction

Code

https://github.com/ncats/ncats-adme

License

No response

paulinebanye commented 1 year ago

@GemmaTuron @miquelduranfrigola

Observations

I was finally able to create the environment, but I had issues running the model: running python app.py returned the error _pickle.UnpicklingError: invalid load key, '<'.
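The message invalid load key, '<' means the first byte the unpickler read was '<', which usually indicates that a text page such as HTML (for example an error or redirect page) was saved in place of the model file. A minimal sketch for checking downloaded files before loading them; it is a heuristic based on the leading bytes only and does not validate the file.

```python
from pathlib import Path


def sniff_model_file(path, n_bytes=64):
    """Guess a file's format from its first bytes (heuristic, not validation)."""
    with open(Path(path), "rb") as fh:
        head = fh.read(n_bytes)
    if head.lstrip().startswith(b"<"):
        # HTML/XML text, e.g. an error page saved under the model's file name;
        # unpickling this gives "invalid load key, '<'"
        return "html"
    if head.startswith(b"PK\x03\x04"):
        # ZIP archive: the format torch.save() has used by default since PyTorch 1.6
        return "zip"
    if head.startswith(b"\x80"):
        # PROTO opcode: a pickle written with protocol 2 or higher
        return "pickle"
    return "unknown"
```

Running it over every downloaded checkpoint (for example with Path(...).rglob(...), using whatever directory and extension the checkpoints actually have) shows which files need to be fetched again.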

To resolve the error

After spending hours debugging,

[x] I reached out on the GitHub repo and was informed that a security incident had caused them to revert the code on the main branch. I was asked to clone the development branch instead: git clone --recursive -b development https://github.com/ncats/ncats-adme.git
[x] Successfully cloned the development branch
[x] Created the environment
[x] Installed all necessary dependencies
[x] Ran python app.py, which began downloading the models.
[ ] Ran into errors while downloading the models.
[ ] Proceeded to download the models individually
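Downloading the checkpoints one at a time can be scripted so that a single failed download does not abort the rest, and an HTML error page is never saved under a model's file name. A minimal sketch using only the standard library; the URL-to-path mapping passed in is hypothetical, since the real checkpoint URLs live in the ncats-adme code.

```python
import urllib.request
from pathlib import Path


def download_one(url, dest, timeout=60):
    """Download a single file; refuse to keep it if it looks like HTML."""
    dest = Path(dest)
    if dest.exists() and dest.stat().st_size > 0:
        return dest  # already downloaded on a previous run
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    if data.lstrip()[:1] == b"<":
        raise ValueError(f"{url} returned an HTML/XML page, not a model file")
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_suffix(dest.suffix + ".part")
    tmp.write_bytes(data)
    tmp.replace(dest)  # expose the file under its final name only once complete
    return dest


def download_all(urls_to_paths):
    """Download each model separately; return {url: error message} for failures."""
    failures = {}
    for url, dest in urls_to_paths.items():
        try:
            download_one(url, dest)
        except Exception as exc:  # keep going and report everything at the end
            failures[url] = str(exc)
    return failures
```

Writing to a temporary .part file first means an interrupted run never leaves a truncated checkpoint that a later run would skip as "already downloaded".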

paulinebanye commented 1 year ago

@GemmaTuron @miquelduranfrigola please review

Update

I was able to successfully run predictions with the model using the provided CSV file and the eml_canonical file.

[x] Cloned the development branch instead: git clone --recursive -b development https://github.com/ncats/ncats-adme.git
[x] Created the environment
[x] Installed all necessary dependencies
[x] Edited the app.py file in ncats-adme/server to restrict the download to the RLM model.
[x] Ran the code python app.py
[x] Successfully ran the predictions on the local server
[x] Tested with the kekule_smiles.csv provided by the repo and the eml_canonical.csv from Ersilia

Output csv files

RLM_Predictions_kekule.csv RLM_Predictions_eml.csv

GemmaTuron commented 1 year ago

Hi Pauline, this is great news, thanks.

The authors provide a small Flask application to serve the models. We now need to take the models one by one and try to use them outside that application. This means writing a simple version of app.py that does not use Flask, but simply loads the data, calls the model and prints the prediction on the screen. It might seem like a lot, but all the functions we need are already in the file; we just need to simplify it so that it can be used from the service.py file we use in Ersilia. In our case, we will do 1 model = 1 repository instead of the 5 models in the original repository.

So, I'd say:

Download the actual model checkpoints from the repo if you haven't already
Activate the conda environment with all packages
Try to run the model using the minimal code required (I think, in the RLM case, this will be the base, chemprop and rlm predictors)
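A minimal sketch of such a Flask-free script, under stated assumptions: the input is a CSV with a smiles column, and predict_smiles is a hypothetical placeholder that returns 0.0 for every molecule. It would be replaced by the call into the RLM predictor from the repository's server code, whose exact interface is not shown here.

```python
import csv
import sys


def read_smiles(path, column="smiles"):
    """Read one SMILES per row from a CSV file (column name is an assumption)."""
    smiles = []
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells
                smiles.append(value)
    return smiles


def predict_smiles(smiles_list):
    """HYPOTHETICAL placeholder: swap in the real RLM predictor call here."""
    return [0.0 for _ in smiles_list]


def main(argv):
    smiles = read_smiles(argv[1])
    # Print one tab-separated "SMILES<TAB>prediction" line per molecule
    for smi, score in zip(smiles, predict_smiles(smiles)):
        print(f"{smi}\t{score}")


if __name__ == "__main__":
    main(sys.argv)
```

Keeping reading, predicting and printing in separate functions makes it straightforward to later call the same predict step from Ersilia's service.py instead of the command line.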

In addition, I have a question regarding the output of the models: I see that app.py calculates some sort of similarity (line 280: # for all models except cyp450, calculate the nearest neigbors and add additional column to response_df). Do you know whether this is happening? I cannot see it in the output.
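To make it easier to recognise such a column if it does appear, here is what a nearest-neighbor similarity step typically computes: for each query molecule, the similarity to its most similar training compound. This sketch assumes molecules are already represented as sets of "on" fingerprint bit indices and uses Tanimoto similarity; whether app.py uses this fingerprint, this metric, or the hypothetical column name nn_similarity is not confirmed. In practice the bits would come from a cheminformatics toolkit such as RDKit.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbor(query, training):
    """Return (index, similarity) of the most similar training fingerprint."""
    best_idx, best_sim = -1, -1.0
    for idx, fp in enumerate(training):
        sim = tanimoto(query, fp)
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx, best_sim


def add_nn_column(rows, query_fps, training_fps):
    """Add a hypothetical 'nn_similarity' value to each result row."""
    for row, fp in zip(rows, query_fps):
        row["nn_similarity"] = round(nearest_neighbor(fp, training_fps)[1], 3)
    return rows
```

A brute-force scan like this is fine for a few thousand training compounds; larger sets would call for an indexed search.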

paulinebanye commented 1 year ago


Thank you @GemmaTuron ☺️ I have isolated the code to run only the RLM model and I have downloaded the model. I have never worked with Flask before, but I'm working on the functions now.

paulinebanye commented 1 year ago

Regarding the nearest neighbors, I am unsure whether this step actually runs; I don't see any evidence of it in the output files.

GemmaTuron commented 1 year ago

Thanks @pauline-banye

Maybe we can try another model to check whether the nearest neighbors are being computed. Don't worry about Flask: the idea is to remove it altogether, since we will use BentoML and Ersilia's environment. I'd suggest creating a new folder with the minimum code (probably from the /server folder: base, chemprop, features, utilities, rlm) and the RLM model, and trying to run a prediction.
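Assembling that folder can be scripted. A minimal sketch, assuming each of the listed modules is either a subfolder or a .py file sitting directly inside the server directory (the real layout may nest them deeper); it copies only those, skips __pycache__ and compiled files, and reports any name it could not find.

```python
import shutil
from pathlib import Path

# Module names taken from the suggestion above; their exact location is assumed.
MINIMAL_MODULES = ["base", "chemprop", "features", "utilities", "rlm"]


def copy_minimal_code(server_dir, target_dir, names=MINIMAL_MODULES):
    """Copy only the listed modules into target_dir; return names not found."""
    server_dir, target_dir = Path(server_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in names:
        folder, module = server_dir / name, server_dir / f"{name}.py"
        if folder.is_dir():
            shutil.copytree(
                folder,
                target_dir / name,
                dirs_exist_ok=True,  # Python 3.8+
                ignore=shutil.ignore_patterns("__pycache__", "*.pyc"),
            )
        elif module.is_file():
            shutil.copy2(module, target_dir / module.name)
        else:
            missing.append(name)
    return missing
```

Anything reported as missing points to a module that lives elsewhere in the tree and needs to be located by hand before the first test prediction.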

paulinebanye commented 1 year ago


Hi @GemmaTuron thank you, I'm taking a look at it now.

GemmaTuron commented 1 year ago

We are continuing the discussion on #512

GemmaTuron commented 1 year ago

/approve

github-actions[bot] commented 1 year ago

New Model Repository Created! 🎉

@pauline-banye the Ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos5505

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

🍴 Get started by creating a fork of your new model repository - docs
👯 Clone your forked repository - docs
✏️ Make edits to your new forked model repository - docs - Edits might include:
- Updating the README.md file to accurately describe your model
- Adding source code for your model
- Adding documentation for your model
🚀 Open a Pull Request from your forked repository to the original repository. This will allow you to bring your local changes into the new ersilia model repository that was just created! - docs

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

carcablop commented 1 year ago

Hello @pauline-banye. I created a main.py file based on app.py, removing what I considered unnecessary for running the code from the console. The idea is that it reads the input file and writes the output of the predict function to a CSV file. main_example.txt I would like you to test whether this works in your installed conda environment. I could not test it in mine: when I tried to install the dependencies, I got conflicts between them, specifically when installing Keras and NumPy (I use Ubuntu 20.04 on Windows). I don't know if you got these conflict errors too.
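Since Keras and NumPy were the packages that failed to resolve, it may help to record the exact versions from an environment where the model does run and pin those later. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); the package names passed in are up to you, and nothing here is the project's actual requirements list.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def pin_lines(found):
    """Format installed packages as 'name==version' lines for a requirements file."""
    return [f"{name}=={ver}" for name, ver in found.items() if ver is not None]
```

For example, print("\n".join(pin_lines(report_versions(["numpy", "keras"])))) in the working environment gives pins that can be compared against a failing one.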

I hope this file serves as a guide and helps with the implementation. I understand that you will only use the "rlm" model, so I modified the predict_df function: before, it used a for loop to go through a list of models read from a file passed to the application; now this is not necessary, you just pass the model (which I understand is in the model/rlm folder). I also removed several unnecessary Flask imports. When I tried to install each dependency in my conda environment, I realized that some dependencies were only needed to run the application with Flask. I recommend you keep in mind which dependencies you install when configuring the Dockerfile.
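The single-model version of predict_df described here can be sketched as follows. This is not carcablop's main.py: it uses plain lists and the csv module instead of pandas to stay dependency-free, assumes a hypothetical model object with a predict(list_of_smiles) method returning one score per molecule, and the output column name rlm_prediction is made up for illustration.

```python
import csv


def predict_df(smiles_list, model, column_name="rlm_prediction"):
    """Run one model over all SMILES and return one dict per molecule.

    With one model per repository, the loop over a list of models is gone.
    """
    scores = model.predict(smiles_list)  # HYPOTHETICAL interface: list in, list out
    if len(scores) != len(smiles_list):
        raise ValueError("model returned a different number of predictions")
    return [{"smiles": s, column_name: score} for s, score in zip(smiles_list, scores)]


def write_predictions(rows, path):
    """Write prediction rows to a CSV file with a header row."""
    if not rows:
        raise ValueError("nothing to write")
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

The length check guards against silently misaligned output if the model drops molecules it cannot parse.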

GemmaTuron commented 1 year ago

This model is now working! I'll close the issue.
