ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: Prediction of Rat Liver Microsomal Stability #510

Closed paulinebanye closed 1 year ago

paulinebanye commented 1 year ago

Model Name

RLM Stability

Model Description

Predicting hepatic metabolic stability is important in drug discovery, because it is a key pharmacokinetic parameter. Poor hepatic metabolic stability can prevent a drug from attaining sufficient in vivo exposure, producing short half-lives, poor oral bioavailability and low plasma concentrations.

Slug

rlm-stability

Tags

metabolic,stability,adme,drugdiscovery

Publication

Code

https://github.com/ncats/ncats-adme

License

No response

paulinebanye commented 1 year ago

@GemmaTuron @miquelduranfrigola

Observations

I was finally able to create the environment, but I had issues running the model: running `python app.py` returned the error `_pickle.UnpicklingError: invalid load key, '<'.`
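For anyone hitting the same error: `invalid load key, '<'` means the first byte the unpickler read was `<`, which is not a valid pickle opcode. This usually indicates that the file on disk is an HTML page (for example an error or redirect page saved under the checkpoint's name) rather than the actual model file. A minimal sketch of a quick check, assuming you pass the checkpoint paths on the command line; `looks_like_html` and the script name are hypothetical, not part of ncats-adme:

```python
import sys


def looks_like_html(path, probe_size=64):
    """Return True if the file at `path` starts with '<' (ignoring leading whitespace).

    A valid pickle never starts with '<', so a True result suggests a
    web page was saved in place of the model checkpoint.
    """
    with open(path, "rb") as fh:
        head = fh.read(probe_size).lstrip()
    return head.startswith(b"<")


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_checkpoint.py path/to/checkpoint
    for arg in sys.argv[1:]:
        status = "looks like HTML, re-download it" if looks_like_html(arg) else "not HTML"
        print(f"{arg}: {status}")
```

If a file is flagged, downloading it again from the raw or release URL, instead of the GitHub web page URL, is the usual fix.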

To resolve the error

After spending hours debugging,

paulinebanye commented 1 year ago

@GemmaTuron @miquelduranfrigola please review

Update

I was able to successfully run predictions with the model using the provided csv file and the eml canonical file.


Output csv files

RLM_Predictions_kekule.csv RLM_Predictions_eml.csv

GemmaTuron commented 1 year ago

Hi Pauline, this is great news thanks.

The authors provide a small Flask application to serve the models. We now need to take the models one by one and try to use them outside their application. This means writing a simple version of app.py that does not use Flask, but simply loads the data, calls the model and prints the prediction on the screen. It might seem like a lot, but all the functions we need are already in the file; we just need to simplify it so that it fits the service.py file we use in Ersilia. In our case, we will only do 1 model = 1 repository, instead of the 5 models in the authors' repository.

So, I'd say:

  1. Download the actual model checkpoints from the repo if you haven't already
  2. Activate the conda environment with all packages
  3. Try to run the model using the minimal code required (I think, in the RLM case, this will be the base, chemprop and rlm predictors)
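Step 3 could look roughly like the sketch below: it reads SMILES from a CSV, passes them to a predictor in one batch, and prints one prediction per molecule. This is a sketch under assumptions, not the ncats-adme code: `predict_fn` is a hypothetical stand-in for the real RLM predictor (whose class and method names would need to be checked in the repository's server code), and the `smiles` column name is assumed.

```python
import csv
import io
from typing import Callable, List, Sequence


def read_smiles(handle, column="smiles"):
    """Read one SMILES string per row from a CSV that has a header row."""
    reader = csv.DictReader(handle)
    return [row[column].strip() for row in reader if row.get(column)]


def run(handle, predict_fn: Callable[[Sequence[str]], List[float]], column="smiles"):
    """Load SMILES, call the model once on the whole batch, print and return the results."""
    smiles = read_smiles(handle, column)
    predictions = predict_fn(smiles)
    if len(predictions) != len(smiles):
        raise ValueError("predictor returned a different number of outputs than inputs")
    for smi, pred in zip(smiles, predictions):
        print(f"{smi}\t{pred}")
    return list(zip(smiles, predictions))


if __name__ == "__main__":
    # Dummy predictor standing in for the RLM model (hypothetical, not ncats-adme code).
    def dummy_predict(batch):
        return [0.5 for _ in batch]

    demo = io.StringIO("smiles\nCCO\nc1ccccc1\n")
    run(demo, dummy_predict)
```

Keeping the predictor as a plain callable means the same loader can later be wired into Ersilia's service.py without any Flask code.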

In addition, I have a question regarding the output of the models: I see in the app.py they calculate some sort of similarity (line 280: # for all models except cyp450, calculate the nearest neigbors and add additional column to response_df) Do you know if this is happening or not? I cannot see it in the output
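For context on what such a column could contain: a nearest-neighbour check typically compares each query molecule against the training set and reports the most similar training compounds. A minimal sketch, assuming each molecule is already encoded as a set of "on" fingerprint bit indices and using Tanimoto similarity; we have not confirmed which fingerprint, metric or number of neighbours app.py actually uses, and `nearest_neighbors` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbors(
    query: FrozenSet[int], training: Dict[str, FrozenSet[int]], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k training compounds most similar to `query`, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in training.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```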

paulinebanye commented 1 year ago


Thank you @GemmaTuron ☺️ I have isolated the code to run only the RLM model, and I have downloaded the model. I have never worked with Flask before, but I'm starting to work through the functions now.

paulinebanye commented 1 year ago

Regarding the nearest neighbors, I am unclear whether this is actually running; I don't see any evidence of it in the output files.

GemmaTuron commented 1 year ago

Thanks @pauline-banye

Maybe we can try another model to check whether the nearest neighbors are being computed. Don't worry about Flask: the idea is to remove it altogether, since we will use BentoML and Ersilia's environment. I'd suggest creating a new folder with the minimum code (probably from the /server folder: base, chemprop, features, utilities, rlm) plus the RLM model, and trying to run a prediction.

paulinebanye commented 1 year ago

Hi @GemmaTuron thank you, I'm taking a look at it now.

GemmaTuron commented 1 year ago

We are continuing the discussion on #512

GemmaTuron commented 1 year ago

/approve

github-actions[bot] commented 1 year ago

New Model Repository Created! 🎉

@pauline-banye, the Ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos5505

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

carcablop commented 1 year ago

Hello @pauline-banye. I created a main.py file, using app.py as an example and removing what I considered unnecessary for running the code from the console. The idea is that it reads the input file and writes the output of the predict function to a csv file: main_example.txt. Could you test whether this works in your installed Conda environment? I could not test it in mine: installing the dependencies produced conflicts, specifically when installing Keras and NumPy (I use Ubuntu 20.04 on Windows). I don't know if you got these conflict errors too.

I hope this file serves as a guide to help with the implementation. As I understand it, you will only use the "rlm" model, so I modified the predict_df function: previously it used a for loop to iterate over a list of models read from a file passed to the application; now that is no longer necessary, and you just pass the single model, which I understand is in the model/rlm folder. I also removed several imports that were only needed for Flask. When I tried to install each dependency in my conda environment, I realized that some dependencies were only needed to run the application with Flask, so I recommend checking which dependencies you actually install when configuring the Dockerfile.
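The change described above, from looping over a list of models to passing a single model and then writing the predictions to a csv file, might look like the following sketch. `predict_df` mirrors the function name mentioned in the comment, but its signature here is an assumption, as are the `predict` callable, the `write_csv` helper and the column names; the actual code in main_example.txt and ncats-adme is not reproduced.

```python
import csv
from typing import Callable, List, Sequence


def predict_df(smiles: Sequence[str], predict: Callable[[Sequence[str]], List[float]],
               model_name: str = "rlm") -> List[dict]:
    """Run a single model on all SMILES and return one row dict per molecule.

    Instead of looping over several models, exactly one predictor
    callable is passed in (hypothetical signature).
    """
    preds = predict(smiles)
    return [{"smiles": s, model_name: p} for s, p in zip(smiles, preds)]


def write_csv(rows: List[dict], path: str) -> None:
    """Write the prediction rows to a csv file with a header row."""
    if not rows:
        raise ValueError("no predictions to write")
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Using only the standard library for the csv step keeps the Dockerfile dependency list short, in line with the recommendation above.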

GemmaTuron commented 1 year ago

This model is now working! I'll close the issue