Closed leilayesufu closed 10 months ago
@GemmaTuron
Let's start by this one @leilayesufu !
/approve
@leilayesufu ersilia model repository has been successfully created and is available at:
Now that your new model repository has been created, you are ready to start contributing to it!
Here are some brief starter steps for contributing to your new model repository:
Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository
README.md file to accurately describe your model
If you have any questions, please feel free to open an issue and get support from the community!
Hi @leilayesufu this is a fairly simple model and should be relatively straightforward to implement.
Fundamental to machine learning workflows is the requirement to preprocess your test data (or any unseen data) in the same way as the training data was preprocessed.
In the case of this particular model, by design, whenever the model is used to make predictions for new inputs, the model code first fits a StandardScaler preprocessor in the create_training_set function of the MRlogP class. Generally in machine learning workflows it is bad practice to "fit" a preprocessor on the training data every time the model has to make new predictions. This introduces an overhead that is very easy to eliminate.
I would recommend fitting a StandardScaler preprocessor on their training data, ds_descriptors_500K, and saving it as a pickle file before you begin incorporating this model into Ersilia. After that, before you run predictions on any new input, you can preprocess that input using this scaler and then run predictions using the model.
The following tutorial is a very easy walkthrough on fitting and saving preprocessors: https://machinelearningmastery.com/how-to-save-and-load-models-and-data-preparation-in-scikit-learn-for-later-use/
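The fit-once, save, and reuse pattern recommended above can be sketched roughly as follows. This is only an illustration, not the MRlogP code: the random array stands in for the real training descriptors (ds_descriptors_500K), and the file name scaler.pkl is a hypothetical choice.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in for the real training descriptors (e.g. ds_descriptors_500K);
# random values are used only so the sketch runs on its own.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(loc=5.0, scale=2.0, size=(1000, 8))

# One-time step: fit the scaler on the training data and save it to disk.
scaler = StandardScaler().fit(train_descriptors)
scaler_path = Path(tempfile.mkdtemp()) / "scaler.pkl"  # hypothetical file name
with open(scaler_path, "wb") as f:
    pickle.dump(scaler, f)

# At prediction time: load the saved scaler and only transform new inputs,
# without touching the training data again.
with open(scaler_path, "rb") as f:
    loaded_scaler = pickle.load(f)

new_descriptors = rng.normal(loc=5.0, scale=2.0, size=(3, 8))
scaled = loaded_scaler.transform(new_descriptors)
```

In the model repository, the pickle file would be shipped alongside the model checkpoint so that prediction never needs the 500K training set.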
Hi @leilayesufu
Updates on this model?
Hi @GemmaTuron, good morning.
This is the project repo: https://github.com/JustinYKC/MRlogP This is what I've done so far: https://github.com/leilayesufu/eos9ym3
The code first takes a CSV of SMILES as input, converts it into descriptors, and then runs predictions on the descriptors output. I have incorporated the scaler pickle file and it works well. But I'm facing an issue: the repository works well in my local environment, i.e. it creates the temporary descriptors file, runs predictions on it and deletes the temporary descriptors file. While running it with Ersilia, if the temporary descriptors file is already present, it works well and we get an output. But it's meant to generate the file from the SMILES input. If the file is not present, I get this error:
import openbabel
ModuleNotFoundError: No module named 'openbabel'
Traceback (most recent call last):
File "/home/leila/eos/repository/eos9ym3/20231218232341_E2A444/eos9ym3/artifacts/framework/code/main.py", line 42, in <module>
with open(descriptor_output, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/leila/eos/repository/eos9ym3/20231218232341_E2A444/eos9ym3/artifacts/framework/code/descriptors_temp.csv'
After going through the code, I found that the script that generates the descriptors uses openbabel:
obConversion = openbabel.OBConversion()
obConversion.SetInAndOutFormats("smi", "mdl")
ob_mol = openbabel.OBMol()
For some reason openbabel wasn't installing in Ersilia. I tried
RUN pip install openbabel
and
RUN conda install -c conda-forge openbabel
but the openbabel module isn't installing, and therefore the descriptors file needed for prediction cannot be created.
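In the log above, the FileNotFoundError is a downstream symptom: the descriptor step failed on the missing module, so the file was never written. One way to surface the real cause earlier is to check dependencies before running anything. The sketch below is an assumption about how such a check could look, not part of the MRlogP or Ersilia code; the function names are hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_dependencies(names):
    """Raise a clear error up front instead of failing later on a missing file."""
    missing = missing_modules(names)
    if missing:
        raise RuntimeError("Missing required modules: " + ", ".join(missing))


# Hypothetical usage at the top of main.py:
# check_dependencies(["openbabel", "rdkit"])
```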
Hi @leilayesufu
Try ersilia -v fetch eos... --repo_path
and find the piece of the log file that is trying to install openbabel
From @HellenNamulinda:
RUN wget https://anaconda.org/conda-forge/openbabel/3.0.0/download/linux-64/openbabel-3.0.0-py27hdef5451_1.tar.bz2
RUN conda install -n eos2re5-py27 openbabel-3.0.0-py27hdef5451_1.tar.bz2 -y
Hi @leilayesufu
Also please update the model description to be a bit more comprehensive: The authors use a two-step approach to build a model that accurately predicts the lipophilicity (LogP) of small molecules. First, they train the model on a large amount of low accuracy predicted LogP values and then they fine-tune the network using a small, accurate dataset of 244 druglike compounds. The model achieves an average root mean squared error of 0.988 and 0.715 against druglike molecules from Reaxys and PHYSPROP.
Many thanks!
@GemmaTuron Good morning. I tried Hellen's suggestion: I found the model that uses openbabel in the Ersilia Model Hub, https://github.com/ersilia-os/eos2re5,
and I updated my model's Dockerfile accordingly: https://github.com/leilayesufu/eos9ym3
I'm still getting the same error; it only works in Ersilia when the descriptors file is already present.
Hello @leilayesufu, the Dockerfile didn't have to change a lot. The only command that needed modification is the one for installing openbabel.
In your Dockerfile, you are creating an environment (eos2re5), which we shouldn't be doing for your model.
By design, Ersilia automatically creates an environment for each model, named after its model ID.
So keep your previous Dockerfile and only modify how you install openbabel, like this:
FROM bentoml/model-server:0.11.0-py37
MAINTAINER ersilia
RUN pip install rdkit
#RUN conda install -c conda-forge openbabel
RUN wget https://anaconda.org/conda-forge/openbabel/3.0.0/download/linux-64/openbabel-3.0.0-py27hdef5451_1.tar.bz2
RUN conda install openbabel-3.0.0-py27hdef5451_1.tar.bz2 -y
RUN pip install numpy
RUN pip install pandas
RUN pip install scikit-learn
RUN pip install TensorFlow
RUN pip install Keras
Also, use the same package versions that you initially installed to test the source code. So instead of openbabel-3.0.0-py27hdef5451_1.tar.bz2, get the Python 3 build that matches the version you installed initially. You can then check for the file on the website
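The package file name above carries a Python tag (py27), which does not match the py37 base image. As a rough illustration, the helpers below derive the tag for a given interpreter and compare it with the tag found in a file name. They are hypothetical helpers that assume the `-pyXY` naming seen in the file above; not every conda build string follows it.

```python
import re
import sys


def conda_python_tag(version_info=sys.version_info):
    """Return a tag such as 'py37' or 'py310' for the given Python version."""
    return f"py{version_info[0]}{version_info[1]}"


def build_matches_python(filename, version_info=sys.version_info):
    """Check whether the '-pyXY' tag in a package file name matches the interpreter."""
    match = re.search(r"-py(\d+)", filename)
    if match is None:
        return False  # no Python tag found in the file name
    return "py" + match.group(1) == conda_python_tag(version_info)


# The py27 build from the thread does not match a Python 3.7 image.
print(build_matches_python("openbabel-3.0.0-py27hdef5451_1.tar.bz2", (3, 7, 0)))
```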
Okay, thank youuu I’ll try that right now
Update on my model: I have successfully been able to install openbabel. Thank you @HellenNamulinda. This is the current state when I run ersilia fetch eos9ym3: eos9ym3.txt
I get an EmptyOutputError, although the outputs are printed inside the ersilia directory.
Hi @leilayesufu
Good progress! This seems to be an issue with TensorFlow; maybe the version is not working? Might I suggest using a newer Python version overall (you are currently on py37, but maybe py310 would be better)? The Python version is specified in this line of the Dockerfile: FROM bentoml/model-server:0.11.0-py37
Does it produce an output when running run.sh directly?
For main.py: I suggest tweaking the original functions so that they accept a list of SMILES directly as input, and using Ersilia's predefined reading of the input file.
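A minimal sketch of that main.py structure is shown below: read the SMILES column into a list, pass the list to the prediction function, and write one value per row. It assumes a single-column CSV with a header row; predict_logp is a hypothetical placeholder, not the MRlogP predictor, and the output column name is also an assumption.

```python
import csv


def read_smiles(input_file):
    """Read a one-column CSV with a header row into a list of SMILES strings."""
    with open(input_file, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return [row[0] for row in reader if row]


def predict_logp(smiles_list):
    """Hypothetical placeholder: the real model call would go here."""
    return [0.0 for _ in smiles_list]


def write_outputs(output_file, values, header="logp"):
    """Write one prediction per row under a single header column."""
    with open(output_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([header])
        for value in values:
            writer.writerow([value])
```

With this layout no temporary descriptors file has to exist before the script runs, since the descriptors can be computed in memory from the list.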
When I run it directly with bash, it produces an output. I'll implement the changes you suggested and get back to you.
@leilayesufu
I have updated the openbabel package to a version that works with py310 and the model works fine for me. I've just pushed the changes to the repo, can you test them?
I’m on it, I’ll test it right away
Hi @GemmaTuron I have successfully been able to fetch and serve the model and make predictions on it.
Using one SMILES:
🚀 Serving model eos9ym3: mrlogp
URL: http://127.0.0.1:56301
PID: 310660
SRV: conda
👉 To run model:
- run
💁 Information:
- info
(ersilia) leila@Leila:~/leila/eos9ym3$ ersilia run -i "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
{
"input": {
"key": "NQQBNZBOOHHVQP-UHFFFAOYSA-N",
"input": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]",
"text": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
},
"output": {
"outcome": [
1.5512152
]
}
}
Using two SMILES:
(ersilia) leila@Leila:~/leila/eos9ym3$ ersilia api run -i "['C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]','CC(C)CC1=CC=C(C=C1)C(C)C(=O)O']"
{
"input": {
"key": "NQQBNZBOOHHVQP-UHFFFAOYSA-N",
"input": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]",
"text": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
},
"output": {
"outcome": [
1.5512147
]
}
}
{
"input": {
"key": "HEFNNWSXXWATRW-UHFFFAOYSA-N",
"input": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
"text": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
},
"output": {
"outcome": [
1.9770206
]
}
}
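The two-input result above is printed as two JSON objects one after another rather than as a single JSON array, so json.loads on the whole text would fail. If that printed text is captured, a small parser along these lines could collect the predictions. This is a sketch that assumes the output has exactly the shape shown above; the function names are hypothetical.

```python
import json


def parse_concatenated_json(text):
    """Parse several JSON objects written back to back into a list."""
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    return objects


def outcomes_by_smiles(text):
    """Map each input SMILES to its first predicted outcome value."""
    return {
        record["input"]["input"]: record["output"]["outcome"][0]
        for record in parse_concatenated_json(text)
    }
```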
When testing with a SMILES CSV file I have, the output is this.
Therefore, the model works well, but the README file hasn't been updated yet.
This model is incorporated!
@leilayesufu wonderful job! Could you document here what was wrong and how you fixed it in as much detail as possible like so:
The model was running and producing an output locally, but upon testing with Ersilia, I got an empty output error, although the output was being printed.
Problem log: eos9ym3.txt
This was fixed by updating the openbabel package to a version that works with py310.
Model Name
Neural network-based logP prediction for druglike small molecules
Model Description
The model, MRlogP, is a neural network-based predictor designed for accurately estimating the lipophilicity (logP) of small druglike molecules. The primary objective of MRlogP is to improve logP prediction accuracy, allowing for more informed decision-making in drug discovery.
Slug
MRlogP
Tag
Lipophilicity, LogP
Publication
https://www.mdpi.com/2227-9717/9/11/2029/htm
Source Code
https://github.com/JustinYKC/MRlogP
License
MIT