Closed github-actions[bot] closed 1 year ago
@AhmedYusuff and @whoisorioki can you test this model please?
Thanks!
I'm on to it!
Hi @GemmaTuron.
I tested the model on my Ubuntu 22.04 system and on Google Colab.
[x] Model was fetched, served, and ran predictions successfully on Google Colab.
Link to Colab notebook: Colab
Input file: eml_canonical.csv
Colab output file: eos2lqb_output.csv
[x] Model was fetched, served, and ran predictions successfully on Ubuntu 22.04.
Fetch log: fetch.log
Input file: eml_canonical.csv
Output file: eos2lqb.csv
Hey @GemmaTuron, here are the updates:
I successfully fetched the model with `ersilia fetch eos2lqb`:
(ersilia) whoisorioki@whoisorioki:~$ ersilia fetch eos2lqb
⬇️ Fetching model eos2lqb: human-oral-bioavailability
Checking setup: 1.326s
Preparing model: 6.267391920089722s
Getting model: 10.339738845825195s
Packing model: 116.72919297218323s
Checking if model needs to be integrated to a tool: 0.0020003318786621094s
Getting model card: 0.413377046585083s
Checking that autoservice works: 5.895315170288086s
Sniffing model: 27.162338495254517s
100%|█████████████████████████████████████████████████████████████████████████████| 8/8 [02:52<00:00, 21.54s/it]
Fetching eos2lqb done in time: 0:02:52.321658s
👍 Model eos2lqb fetched successfully!
I then ran `ersilia serve eos2lqb` to make it available as an API. It worked successfully!
(ersilia) whoisorioki@whoisorioki:~$ ersilia serve eos2lqb
🚀 Serving model eos2lqb: human-oral-bioavailability
URL: http://127.0.0.1:36651 PID: 104910 SRV: conda
👉 Available APIs:
💁 Information:
- I created a virtual environment with `python=3.8` for the model, activated it and installed the following dependencies: `rdkit-pypi`, `Mordred`, `pandas`, `matplotlib`, `scikit-learn==0.23.2`, `numpy<1.24`, `networkx==2.3`
- I then cloned the repository to my local machine and located the main file at `/home/whoisorioki/Desktop/Ersilia/Models/eos2lqb/model/framework/code/main.py`
- I used the `drug_molecules.tsv` file from the Ersilia GitBook as input.
- I had to modify the code because the input was a TSV file: I passed `delimiter='\t'` to `csv.reader`.
- The file contains 5066 rows. I only used the first 100 rows, using the following code:

      smiles_list = []
      for i, row in enumerate(reader):
          if i >= 100:
              break
          smiles_list.append(row[1])

- I ran the main file, passing the following arguments: `python main.py drug_molecules.tsv output.csv`
- Here is the output: [output.csv](https://github.com/ersilia-os/eos2lqb/files/11128495/output.csv)
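
For anyone reproducing the steps above, here is a minimal, self-contained sketch of reading the first N SMILES from a tab-separated file. It is not the code in `main.py`; the function name `read_smiles`, the `has_header` flag and the sample rows are assumptions made for illustration. The column index 1 comes from `row[1]` in the snippet above.

```python
import csv
import io
from itertools import islice


def read_smiles(handle, limit=100, smiles_column=1, has_header=True):
    """Return up to `limit` SMILES strings from a tab-separated file handle."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        # Assumption: the first row holds column names and should be skipped.
        next(reader, None)
    # islice stops after `limit` data rows; rows that are too short are ignored.
    return [row[smiles_column] for row in islice(reader, limit)
            if len(row) > smiles_column]


# Hypothetical sample data standing in for drug_molecules.tsv
sample = io.StringIO(
    "name\tsmiles\n"
    "aspirin\tCC(=O)OC1=CC=CC=C1C(=O)O\n"
    "ethanol\tCCO\n"
    "benzene\tc1ccccc1\n"
)
smiles_list = read_smiles(sample, limit=2)
print(smiles_list)
```

With a real file, the same function could be called as `read_smiles(open("drug_molecules.tsv", newline=""))`. Whether that file actually has a header row is something to check before relying on `has_header=True`.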
Hi @whoisorioki
This model was ready to be used. Testing should need only the three Ersilia commands (fetch, serve and api), with no changes to the code.
Oh okay, thank you for that information @GemmaTuron.
@HellenNamulinda
Looking at the output provided by @AhmedYusuff, I see that we are only giving high or low for each cut-off, when we would actually like to get the probability of high for each cut-off. The model must compute this probability in order to classify molecules as high or low. Do you think you can identify the piece of code doing that conversion and change it to return the probabilities as numbers instead?
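
To illustrate the request, here is a rough sketch of the difference between returning a label and returning P(high). It assumes the underlying classifier exposes per-class probabilities the way scikit-learn classifiers do (`predict_proba` returns one column per entry in `classes_`). The helper `probability_of` and the numbers are hypothetical and not taken from the model's code.

```python
def probability_of(label, classes, proba_rows):
    """Pick the probability column for `label` from rows ordered like `classes`."""
    column = list(classes).index(label)
    return [row[column] for row in proba_rows]


# Hypothetical values standing in for clf.classes_ and clf.predict_proba(X)
classes = ["high", "low"]
proba_rows = [[0.22, 0.78], [0.91, 0.09]]

# What we would like to return: one P(high) per molecule
p_high = probability_of("high", classes, proba_rows)

# What is returned now: a label, shown here with a simple 0.5 threshold
# purely to illustrate how labels relate to the probabilities
labels = ["high" if p >= 0.5 else "low" for p in p_high]

print(p_high)
print(labels)
```

Looking the column up through `classes` rather than hard-coding an index avoids returning P(low) by mistake if the class order differs.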
Hello @GemmaTuron, let me update the code to return the probabilities, i.e. P(high) and P(low).
Great, many thanks @HellenNamulinda!
I think the probability of High is enough, so there will be only one value for the 20% cutoff and one for the 50% cutoff. What do you think?
Oh yeah, that's right. Also, if inference is run on an entire file and the output is a CSV file, the result is easier to understand because the column names are shown. But if someone is using the CLI for one molecule, I think it can be quite hard to understand when the printed outcome is just two values, say:
"output": {
"outcome": [
0.22,
0.37
]
}
where 0.22 is P(high) for the 20% cutoff and 0.37 is P(high) for the 50% cutoff. Column names are not printed in this case, which is why I had prepended HOB(20%): and HOB(50%): to the saved values.
Do you think printing the output like that is okay, with a detailed explanation of how to interpret it in the README.md (generated from metadata.json)?
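
One option on the user side, sketched below, is to attach the names after the fact. This is only an illustration and not part of the model: the helper `label_outcome` is hypothetical, and it assumes the two values always arrive in the order described above (20% cutoff first, then 50%).

```python
import json

# Labels taken from this thread; the order is an assumption (20% first, then 50%)
OUTCOME_NAMES = ["HOB(20%)", "HOB(50%)"]


def label_outcome(result, names=OUTCOME_NAMES):
    """Map the bare values in result["output"]["outcome"] to readable names."""
    values = result["output"]["outcome"]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return dict(zip(names, values))


# The single-molecule output shown above, parsed from JSON
result = json.loads('{"output": {"outcome": [0.22, 0.37]}}')
labeled = label_outcome(result)
print(labeled)
```

The length check is there so that a change in the number of outputs fails loudly instead of silently mislabeling values.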
Greetings @GemmaTuron, the changes have been made and merged. I fetched the new model and ran predictions on the eml_canonical.csv dataset. The output file is output.csv.
This model is ready for testing. If you are assigned to this issue, please try it out both on the CLI and Google Colab and let us know if it works!