ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: MRlogP #905

Closed leilayesufu closed 10 months ago

leilayesufu commented 10 months ago

Model Name

Neural network-based logP prediction for druglike small molecules

Model Description

MRlogP is a neural network-based predictor designed to accurately estimate the lipophilicity (logP) of small druglike molecules. Its primary objective is to improve logP prediction accuracy, allowing for more informed decision-making in drug discovery.

Slug

MRlogP

Tag

Lipophilicity, LogP

Publication

https://www.mdpi.com/2227-9717/9/11/2029/htm

Source Code

https://github.com/JustinYKC/MRlogP

License

MIT

leilayesufu commented 10 months ago

@GemmaTuron

GemmaTuron commented 10 months ago

Let's start with this one @leilayesufu!

GemmaTuron commented 10 months ago

/approve

github-actions[bot] commented 10 months ago

New Model Repository Created! 🎉

@leilayesufu ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos9ym3

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

DhanshreeA commented 10 months ago

Hi @leilayesufu this is a fairly simple model and should be relatively straightforward to implement.

Fundamental to machine learning workflows is the requirement to preprocess your test data (or any unseen data) in the same way as the training data was preprocessed.

In the case of this particular model, by design, whenever it is used to make predictions for new inputs, the model code first fits a StandardScaler preprocessor in the create_training_set function of the MRlogP class. In machine learning workflows it is generally bad practice to fit a preprocessor on the training data every time the model has to make new predictions. This introduces an overhead that is very easy to eliminate.

I would recommend fitting a StandardScaler preprocessor on their training data, ds_descriptors_500K, and saving it as a pickle file before you begin incorporating this model into Ersilia. Then, before you run predictions on any new input, you can preprocess that input with this scaler and run predictions using the model.

The following tutorial is a very easy walkthrough on fitting and saving preprocessors: https://machinelearningmastery.com/how-to-save-and-load-models-and-data-preparation-in-scikit-learn-for-later-use/
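The fit-once, save, then reuse pattern recommended here can be sketched as follows. This is a minimal, dependency-free illustration and not the MRlogP code: fit_scaler, apply_scaler, the toy rows, and the scaler.pkl file name are all hypothetical stand-ins. A fitted scikit-learn StandardScaler can be pickled and reloaded in the same way.

```python
import pickle
import statistics
import tempfile
from pathlib import Path


def fit_scaler(rows):
    """Compute per-column mean and standard deviation (hypothetical helper)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; use 1.0 for constant columns to avoid /0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return {"means": means, "stds": stds}


def apply_scaler(params, rows):
    """Standardize rows with previously fitted parameters (hypothetical helper)."""
    return [
        [(x - m) / s for x, m, s in zip(row, params["means"], params["stds"])]
        for row in rows
    ]


# Toy stand-in for the training descriptors (the real data is ds_descriptors_500K).
training_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaler_path = Path(tempfile.mkdtemp()) / "scaler.pkl"  # hypothetical file name

# One-off step: fit on the training data and save the fitted parameters.
with open(scaler_path, "wb") as f:
    pickle.dump(fit_scaler(training_rows), f)

# Prediction time: load the saved parameters and only transform the new input.
with open(scaler_path, "rb") as f:
    params = pickle.load(f)

scaled = apply_scaler(params, [[2.0, 40.0]])
print(scaled)
```

The point of the design is that the expensive fit over the full training set happens once, outside the prediction path, and every later prediction only pays for loading a small file.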

GemmaTuron commented 10 months ago

Hi @leilayesufu

Updates on this model?

leilayesufu commented 10 months ago

Hi @GemmaTuron, good morning.

This is the project repo: https://github.com/JustinYKC/MRlogP
This is what I've done so far: https://github.com/leilayesufu/eos9ym3

The code first takes a CSV file of SMILES as input and turns it into descriptors, then runs predictions on the descriptors output. I have incorporated the scaler pickle file and it works well. But I'm facing an issue: the repository works well in my local environment, i.e. it creates the temporary descriptors file, runs predictions on it, and deletes the temporary descriptors file. When running it with Ersilia, it works well and we get an output only if the temporary descriptors file is already present. But it's meant to generate the file from the SMILES input. If the file is not present, I get this error:

    import openbabel
ModuleNotFoundError: No module named 'openbabel'
Traceback (most recent call last):
  File "/home/leila/eos/repository/eos9ym3/20231218232341_E2A444/eos9ym3/artifacts/framework/code/main.py", line 42, in <module>
    with open(descriptor_output, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/leila/eos/repository/eos9ym3/20231218232341_E2A444/eos9ym3/artifacts/framework/code/descriptors_temp.csv'

After going through the code, I found that the script that generates the descriptors uses openbabel:

obConversion = openbabel.OBConversion()
obConversion.SetInAndOutFormats("smi", "mdl")
ob_mol = openbabel.OBMol()

For some reason openbabel wasn't installing in Ersilia. I tried RUN pip install openbabel and RUN conda install -c conda-forge openbabel, but the openbabel module isn't installing, so the code cannot create the descriptors file needed for prediction.
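A quick way to confirm whether a package is visible to the interpreter that actually runs main.py is to ask Python directly from inside that environment. The sketch below uses only the standard library; module_available is a hypothetical helper name, and the list of modules checked is just an example.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


# Shows which Python is running, which helps spot a mismatch between the
# environment where a package was installed and the one running the model.
print("Interpreter:", sys.executable)
for name in ("openbabel", "rdkit"):
    print(name, "importable:", module_available(name))
```

If the interpreter path printed here differs from the environment where the install command ran, the package was installed somewhere the model code cannot see.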

GemmaTuron commented 10 months ago

Hi @leilayesufu, try ersilia -v fetch eos... --repo_path and find the part of the log where it tries to install openbabel.

GemmaTuron commented 10 months ago

From @HellenNamulinda:

RUN wget https://anaconda.org/conda-forge/openbabel/3.0.0/download/linux-64/openbabel-3.0.0-py27hdef5451_1.tar.bz2
RUN conda install -n eos2re5-py27 openbabel-3.0.0-py27hdef5451_1.tar.bz2 -y

GemmaTuron commented 10 months ago

Hi @leilayesufu

Also, please update the model description to be a bit more comprehensive: The authors use a two-step approach to build a model that accurately predicts the lipophilicity (LogP) of small molecules. First, they train the model on a large amount of low-accuracy predicted LogP values, and then they fine-tune the network using a small, accurate dataset of 244 druglike compounds. The model achieves an average root mean squared error of 0.988 and 0.715 against druglike molecules from Reaxys and PHYSPROP, respectively.

Many thanks!

leilayesufu commented 10 months ago

@GemmaTuron Good morning. I tried Hellen's suggestion and found the model in the Ersilia Model Hub that uses openbabel: https://github.com/ersilia-os/eos2re5

I updated my model's Dockerfile accordingly: https://github.com/leilayesufu/eos9ym3

I'm still getting the same error; it only works in Ersilia when the descriptors file is already present.

HellenNamulinda commented 10 months ago

Hello @leilayesufu, the Dockerfile didn't have to change much. The only command that needed modification is the one that installs openbabel.

In your Dockerfile you are creating an environment (eos2re5), which we shouldn't be doing for your model.

By default, Ersilia automatically creates an environment for each model, named after its model ID.

So keep your previous Dockerfile and only modify how you install openbabel, like so:

FROM bentoml/model-server:0.11.0-py37
MAINTAINER ersilia

RUN pip install rdkit
#RUN conda install -c conda-forge openbabel
RUN wget https://anaconda.org/conda-forge/openbabel/3.0.0/download/linux-64/openbabel-3.0.0-py27hdef5451_1.tar.bz2
RUN conda install openbabel-3.0.0-py27hdef5451_1.tar.bz2 -y
RUN pip install numpy
RUN pip install pandas
RUN pip install scikit-learn
RUN pip install TensorFlow
RUN pip install Keras

Also, use the same package versions that you initially installed to test the source code. So instead of openbabel-3.0.0-py27hdef5451_1.tar.bz2, get the Python 3 build that you installed initially. You can then look for the matching file on the website.
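The package file names in this thread embed a Python tag (py27 in openbabel-3.0.0-py27hdef5451_1.tar.bz2), and the base image name carries one too (py37). As a small sketch, the tag to look for can be derived from the interpreter version. conda_python_tag is a hypothetical helper, and it assumes the build you pick follows the same naming pattern.

```python
import sys


def conda_python_tag(major, minor):
    """Build the 'pyXY' tag that appears in file names such as '...-py27h...'."""
    return f"py{major}{minor}"


# Tag for the interpreter running this script; run it inside the model's
# environment to see which build of a package to look for.
print(conda_python_tag(sys.version_info.major, sys.version_info.minor))
```

For example, a py37 base image calls for a file whose name contains py37 rather than py27.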

leilayesufu commented 10 months ago

Okay, thank youuu I’ll try that right now

leilayesufu commented 10 months ago

Update on my model: I have successfully installed openbabel, thank you @HellenNamulinda. This is the current state when I run ersilia fetch eos9ym3: eos9ym3.txt

I get an EmptyOutputError, although the outputs are printed inside the ersilia directory.

[Screenshot: outputempty]

GemmaTuron commented 10 months ago

Hi @leilayesufu

Good progress! This seems to be an issue with TensorFlow; maybe the version is not working? Might I suggest using a newer Python version overall (you are currently on py37, but maybe py310 would be better)? The Python version is specified in this line of the Dockerfile: FROM bentoml/model-server:0.11.0-py37. Does it produce an output when you run run.sh directly?

For main.py: I suggest tweaking the original functions so that they accept a list of SMILES directly as input, and using Ersilia's predefined reading of the input file.
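One possible shape for such a main.py is sketched below. It assumes the input is a one-column CSV with a header row and that the script receives the input and output paths as arguments; predict_logp is a hypothetical placeholder for the MRlogP descriptor and prediction code, and the logp column name is also an assumption.

```python
import csv
import sys


def predict_logp(smiles_list):
    """Hypothetical placeholder: replace with descriptor generation,
    scaling with the saved scaler, and the MRlogP model prediction."""
    return [0.0 for _ in smiles_list]


def read_smiles(input_file):
    """Read SMILES from the first column of a CSV file, skipping the header."""
    with open(input_file, "r") as f:
        reader = csv.reader(f)
        next(reader)  # skip header row
        return [row[0] for row in reader]


def write_predictions(output_file, predictions):
    """Write one prediction per row under a single header column."""
    with open(output_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["logp"])  # assumed column name
        for value in predictions:
            writer.writerow([value])


if __name__ == "__main__" and len(sys.argv) == 3:
    input_file, output_file = sys.argv[1], sys.argv[2]
    smiles_list = read_smiles(input_file)
    predictions = predict_logp(smiles_list)
    # Sanity check: one prediction per input molecule.
    assert len(predictions) == len(smiles_list)
    write_predictions(output_file, predictions)
```

Passing the SMILES list in memory removes the need for a temporary descriptors file, which is the step that was failing earlier in this thread.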

leilayesufu commented 10 months ago

When I run it directly with bash, it produces an output. I'll implement the changes you suggested and get back to you.

GemmaTuron commented 10 months ago

@leilayesufu

I have updated the openbabel package to a version that works with py310 and the model works fine for me. I've just pushed the changes to the repo, can you test them?

leilayesufu commented 10 months ago

I’m on it, I’ll test it right away

leilayesufu commented 10 months ago

Hi @GemmaTuron I have successfully been able to fetch and serve the model and make predictions on it.

Using one SMILES string:

🚀 Serving model eos9ym3: mrlogp

   URL: http://127.0.0.1:56301
   PID: 310660
   SRV: conda

👉 To run model:
   - run

💁 Information:
   - info
(ersilia) leila@Leila:~/leila/eos9ym3$ ersilia run -i "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
{
    "input": {
        "key": "NQQBNZBOOHHVQP-UHFFFAOYSA-N",
        "input": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]",
        "text": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
    },
    "output": {
        "outcome": [
            1.5512152
        ]
    }
}

Using two SMILES strings:

(ersilia) leila@Leila:~/leila/eos9ym3$ ersilia api run -i "['C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]','CC(C)CC1=CC=C(C=C1)C(C)C(=O)O']"
{
    "input": {
        "key": "NQQBNZBOOHHVQP-UHFFFAOYSA-N",
        "input": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]",
        "text": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
    },
    "output": {
        "outcome": [
            1.5512147
        ]
    }
}
{
    "input": {
        "key": "HEFNNWSXXWATRW-UHFFFAOYSA-N",
        "input": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
        "text": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
    },
    "output": {
        "outcome": [
            1.9770206
        ]
    }
}

leilayesufu commented 10 months ago

When testing with a CSV file of SMILES that I have, this is the output:

eos9ym3_output.csv

Therefore, the model works well, but the README file hasn't been updated yet.

GemmaTuron commented 10 months ago

This model is incorporated!

DhanshreeA commented 10 months ago

@leilayesufu wonderful job! Could you document here what was wrong and how you fixed it in as much detail as possible like so:

Problem

Solution

leilayesufu commented 10 months ago

Problem

The model ran and produced an output locally, but when testing with Ersilia I got an EmptyOutputError, even though the output was being printed.

Problem log: eos9ym3.txt

Solution

This was fixed by updating the openbabel package to a version that works with py310.