New model ready for testing!

github-actions[bot] commented 1 year ago

This model is ready for testing. If you are assigned to this issue, please try it out both on the CLI and Google Colab and let us know if it works!

GemmaTuron commented 1 year ago

@ahmedyusuff and @whoisorioki can you test this model please?

Thanks!

whoisorioki commented 1 year ago

I'm on to it!

AhmedYusuff commented 1 year ago

Hi @GemmaTuron.

EOS2LQB MODEL TEST

I tested the model on my Ubuntu 22.04 system and on Google Colab.

Result

- [x] Model was fetched, served, and ran predictions successfully on Google Colab.
  - Link to Colab notebook: Colab
  - Input file: eml_canonical.csv
  - Colab output file: eos2lqb_output.csv
- [x] Model was fetched, served, and ran predictions successfully on Ubuntu 22.04.
  - Fetch log: fetch.log
  - Input file: eml_canonical.csv
  - Output file: eos2lqb.csv

whoisorioki commented 1 year ago

Hey @GemmaTuron, here are the updates:

I successfully fetched the model with `ersilia fetch eos2lqb`:

(ersilia) whoisorioki@whoisorioki:~$ ersilia fetch eos2lqb
⬇️  Fetching model eos2lqb: human-oral-bioavailability
Checking setup: 1.326s                                                                                          
Preparing model: 6.267391920089722s                                                                             
Getting model: 10.339738845825195s                                                                              
Packing model: 116.72919297218323s                                                                              
Checking if model needs to be integrated to a tool: 0.0020003318786621094s                                      
Getting model card: 0.413377046585083s                                                                          
Checking that autoservice works: 5.895315170288086s                                                             
Sniffing model: 27.162338495254517s                                                                             
100%|█████████████████████████████████████████████████████████████████████████████| 8/8 [02:52<00:00, 21.54s/it]
Fetching eos2lqb done in time: 0:02:52.321658s
👍 Model eos2lqb fetched successfully!

I then deployed the model using `ersilia serve eos2lqb` to make it available as an API. It worked successfully!


(ersilia) whoisorioki@whoisorioki:~$ ersilia serve eos2lqb
🚀 Serving model eos2lqb: human-oral-bioavailability

URL: http://127.0.0.1:36651 PID: 104910 SRV: conda

👉 Available APIs:

run

💁 Information:

info

- I created a virtual environment with `python=3.8` for the model, activated it, and installed the following dependencies: `rdkit-pypi`, `Mordred`, `pandas`, `matplotlib`, `scikit-learn==0.23.2`, `numpy<1.24`, `networkx==2.3`
- I then cloned the repository to my local machine and located the main file at `/home/whoisorioki/Desktop/Ersilia/Models/eos2lqb/model/framework/code/main.py`
- I used the `drug_molecules.tsv` file from the Ersilia gitbook as input.
- I had to modify the code because the input was a TSV file: I passed `delimiter='\t'` to `csv.reader`.
- The file contains 5066 rows. I only used the first 100 rows, using the following code:

    smiles_list = []
    for i, row in enumerate(reader):
        if i >= 100:
            break
        smiles_list.append(row[1])


- I ran the main file with the following arguments: `python main.py drug_molecules.tsv output.csv`
- Here is the output: [output.csv](https://github.com/ersilia-os/eos2lqb/files/11128495/output.csv)
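
For reference, the TSV-reading step described in the list above could be sketched roughly as follows. This is only an illustration, not the code in the repository's `main.py`: the function name `read_first_smiles` and its parameters are hypothetical, and the SMILES column index of 1 is taken from the `row[1]` in the snippet above. Like that snippet, it does not treat a header row specially.

```python
import csv
from itertools import islice


def read_first_smiles(path, limit=100, smiles_column=1):
    """Return SMILES strings from at most the first `limit` rows of a TSV file.

    `read_first_smiles`, `limit` and `smiles_column` are hypothetical names
    used for illustration; they are not part of the eos2lqb code.
    """
    smiles_list = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        # islice stops after `limit` rows instead of reading the whole file.
        for row in islice(reader, limit):
            # Skip rows too short to hold the SMILES column (e.g. blank lines).
            if len(row) > smiles_column:
                smiles_list.append(row[smiles_column])
    return smiles_list
```

Using `islice` avoids the manual counter and `break`, but the behaviour is otherwise meant to match the loop above.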

GemmaTuron commented 1 year ago

Hi @whoisorioki

This model was ready to be used, so testing should only require the three Ersilia commands (fetch, serve and api), with no changes to the code.

whoisorioki commented 1 year ago

Oh okay, thank you for that information @GemmaTuron.

GemmaTuron commented 1 year ago

@HellenNamulinda

Looking at the output provided by @AhmedYusuff, I see that we are only giving high or low for each cut-off, when we would actually like to get the probability of high for each cut-off. The model must compute this probability in order to classify molecules as high or low. Do you think you can identify the piece of code doing that conversion and work on it to provide the probabilities as numbers instead?

HellenNamulinda commented 1 year ago

Hello @GemmaTuron, let me update the code to return the probabilities, i.e. P(high) and P(low).
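
A rough sketch of the relationship between the high/low label and P(high) is given here. It assumes the underlying classifier exposes one probability per class (scikit-learn classifiers typically do this through `predict_proba`, with the column order given by `classes_`); the actual eos2lqb code is not shown in this thread, so the function names and the 0.5 decision threshold are hypothetical. The threshold is a probability threshold and is unrelated to the 20% and 50% bioavailability cut-offs.

```python
def probability_of_high(class_probabilities, class_labels, positive_label="high"):
    """Pick P(high) out of a per-class probability vector.

    `class_labels` gives the class order of `class_probabilities`
    (hypothetical helper, for illustration only).
    """
    return class_probabilities[list(class_labels).index(positive_label)]


def label_from_probability(p_high, threshold=0.5):
    """Reproduce a high/low label from P(high) (assumed threshold of 0.5)."""
    return "high" if p_high >= threshold else "low"


# Example with made-up numbers: class order is ("low", "high").
p_high = probability_of_high([0.78, 0.22], ["low", "high"])
label = label_from_probability(p_high)
```

Returning `p_high` instead of `label` keeps the information the classification step would otherwise discard.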

GemmaTuron commented 1 year ago

Great, many thanks @HellenNamulinda!

I think the probability of high is enough, so there will be only one value for the 20% cutoff and one for the 50% cutoff. What do you think?

HellenNamulinda commented 1 year ago

Oh yeah, that's right. Also, if inference is run on an entire file and the output is a CSV file, the result is easier to understand because the column names are shown. But if someone is using the CLI for one molecule, I think it can be quite hard to understand when the outcome printed is just two values, say:

"output": {
    "outcome": [
        0.22,
        0.37
    ]
}

where 0.22 is P(high) for cutoff 20% and 0.37 is P(high) for cutoff 50%. This is because column names are not printed, and that's why I had prepended HOB(20%): and HOB(50%): to the values saved.

Do you think printing the output like that is okay, with a detailed explanation of the interpretation provided in the README.md (from metadata.json)?
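
To illustrate the point about unlabeled values, here is a small sketch of how a user could attach names to the two numbers after the fact. It assumes the result is a JSON object containing `output.outcome` with the values in the order cutoff 20%, then cutoff 50%; the helper `label_outcome` and the `COLUMN_NAMES` list are hypothetical, and the numbers are the example values from this comment.

```python
import json

# Assumed order of the two values: P(high) at the 20% cutoff, then at 50%.
COLUMN_NAMES = ["HOB(20%)", "HOB(50%)"]


def label_outcome(result, names=COLUMN_NAMES):
    """Map the unlabeled outcome list to a {column name: value} dict."""
    values = result["output"]["outcome"]
    if len(values) != len(names):
        raise ValueError("expected one value per column name")
    return dict(zip(names, values))


result = json.loads('{"output": {"outcome": [0.22, 0.37]}}')
print(label_outcome(result))  # {'HOB(20%)': 0.22, 'HOB(50%)': 0.37}
```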

HellenNamulinda commented 1 year ago

Greetings @GemmaTuron, the changes have been made and merged. I fetched the new model and ran predictions on the eml_canonical.csv dataset. The output file is output.csv.
