ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

✍️ Contribution period: Pradnya #628

Closed. Pradnya2203 closed this issue 1 year ago.

Pradnya2203 commented 1 year ago

Week 1 - Get to know the community

Week 2 - Install and run an ML model

Week 3 - Propose new models

Week 4 - Prepare your final application

GemmaTuron commented 1 year ago

Is REDIAL running on a web server, or do you have access to the model checkpoints? If the latter, we could try to incorporate it into the Hub!

Pradnya2203 commented 1 year ago

REDIAL is not running on a web server, though it is hosted on a website: http://drugcentral.org/Redial. We do have access to the model checkpoints.

GemmaTuron commented 1 year ago

Hi @Pradnya2203

That's great. Did you run it through the web server, or did you install the model? If you haven't yet, could you try downloading the checkpoints and running predictions? If you have, I think we could try to incorporate this into the Hub. What do you think?

Pradnya2203 commented 1 year ago

I installed the model, ran predictions on the sample data file available in their repository, and got the results posted above. The checkpoints (.pkl files) were installed along with the model. I think we can try to incorporate this into the Hub.

GemmaTuron commented 1 year ago

Cool, feel free to go ahead and open a model request issue! Outreachy interns from the last round prepared a nice document about the whole process, which you can find in our docs (https://ersilia.gitbook.io/ersilia-book). Make sure to read it.

Pradnya2203 commented 1 year ago

@GemmaTuron, I have opened a model request issue and will read the documentation now. Thank you!

Pradnya2203 commented 1 year ago

I did find some other model suggestions as well.

Model Name:

ATC_CNN

Model Description:

Anatomical Therapeutic Chemical (ATC) classification of compounds and drugs plays an important role in drug development and basic research. However, previous methods depend on interactions extracted from the STITCH dataset, which may make them dependent on lab experiments. ATC_CNN presents a pilot study exploring whether ATC prediction can be carried out based solely on molecular structures. The motivation is to eliminate the reliance on costly lab experiments, so that the characteristics of a drug can be pre-assessed for better decision-making and to save effort before actual development.

Package Dependencies:

torch, numpy, pandas, tensorflow, importlib, time, utils, tensorboardX

Slug:

ATC-CNN

Publication:

https://academic.oup.com/bib/article/23/5/bbac346/6677124

Supplementary Information:

Source Code:

https://github.com/lookwei/ATC_CNN

License:

None

Pradnya2203 commented 1 year ago

Model Name:

Reinvent

Model Description:

The advancements in deep learning and artificial intelligence (AI) have triggered an avalanche of ideas on how to translate such techniques to a variety of domains including the field of drug design. A range of architectures have been devised to find the optimal way of generating chemical compounds by using either graph- or string (SMILES)-based representations. Reinvent aims to offer the community a production-ready tool for de novo design. It can be effectively applied on drug discovery projects that are striving to resolve either exploration or exploitation problems while navigating the chemical space.

Package Dependencies:

requirements.txt

Slug:

reinvent

Publications:

https://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00915#

Source Code:

https://github.com/MolecularAI/Reinvent

License:

Apache License 2.0

GemmaTuron commented 1 year ago

Hi @Pradnya2203 !

Thanks! Can you add the ATC-CNN model to our list? As for Reinvent, we are already using it, though it is not in the Hub due to its complexity. Let's focus on the model incorporation.

Pradnya2203 commented 1 year ago

@GemmaTuron, thanks. I will now focus on model incorporation.

Pradnya2203 commented 1 year ago

Hey @GemmaTuron, I tried to incorporate redial-2020 into the Ersilia Model Hub, but I am facing some issues. Here are the steps I followed:

This is my main.py file (I think it needs some changes): main.txt (I copied it to a .txt file because .py attachments aren't supported here)

And this is the error:

Traceback (most recent call last):
  File "main.py", line 152, in <module>
    get_predictions(temp_dir, results, csv_file)
  File "main.py", line 110, in get_predictions
    features_dictn = automate(temp_dir, csv_file)
  File "main.py", line 72, in automate
    features_rdkit = fg.get_fingerprints(stand_df, k, 'rdkDes', 'dummy_split', 'dummpy_numpy_folder')
  File "/home/pradnya/eos8fth/model/framework/code/get_features.py", line 66, in get_fingerprints
    X = rdkDes_scaler.transform(X)
  File "/home/pradnya/miniconda3/envs/redial-2020/lib/python3.7/site-packages/sklearn/preprocessing/_data.py", line 414, in transform
    X *= self.scale_
ValueError: operands could not be broadcast together with shapes (13,208) (200,) (13,208) 

This is how far the model runs: output.txt
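
The ValueError above says that the feature matrix has 208 columns while the fitted scaler holds 200 scale factors, so the descriptors computed at prediction time do not match those the scaler was fitted on. One possible cause, which is an assumption here and not confirmed in this thread, is that the installed RDKit version computes a different set of descriptors than the one used for training. Below is a minimal sketch of a guard that makes such a mismatch explicit before `transform` is called. The helper name `check_feature_width` is hypothetical and is not part of REDIAL or scikit-learn.

```python
def check_feature_width(X, scale):
    """Raise a readable error if X has a different number of columns
    than the scaler has per-feature scale factors.

    X     : 2-D sequence of rows (list of lists or NumPy array)
    scale : 1-D sequence, e.g. the `scale_` attribute of a fitted scaler
    """
    n_rows = len(X)
    n_cols = len(X[0]) if n_rows else 0
    n_expected = len(scale)
    if n_cols != n_expected:
        raise ValueError(
            f"Feature matrix has {n_cols} columns but the scaler was "
            f"fitted on {n_expected} features; check that the descriptor "
            f"set matches the one used for training."
        )
    return n_rows, n_cols


# Toy data with the shapes from the traceback: 13 rows x 208 columns
# against a scaler fitted on 200 features (values are placeholders).
X = [[0.0] * 208 for _ in range(13)]
scale = [1.0] * 200

try:
    check_feature_width(X, scale)
except ValueError as err:
    message = str(err)
    print(message)
```

In `get_features.py`, a check like this could run just before `rdkDes_scaler.transform(X)`, with `rdkDes_scaler.scale_` passed as `scale`.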

Pradnya2203 commented 1 year ago

I also added the ATC-CNN model to the suggestions list.

Pradnya2203 commented 1 year ago

I was able to solve that error and run the model using main.py. There was an issue with the conda environment. Now I am trying to fetch it. This is the error:

Traceback (most recent call last):
  File "pack.py", line 2, in <module>
    from src.service import load_model
  File "/home/pradnya/eos/dest/eos8fth/src/service.py", line 3, in <module>
    from bentoml import BentoService, api, artifacts
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/bentoml/__init__.py", line 28, in <module>
    from bentoml.service import (  # noqa: E402
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/bentoml/service/__init__.py", line 38, in <module>
    from bentoml.service.inference_api import InferenceAPI
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/bentoml/service/inference_api.py", line 24, in <module>
    import flask
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/flask/__init__.py", line 14, in <module>
    from jinja2 import escape
ImportError: cannot import name 'escape' from 'jinja2' (/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/jinja2/__init__.py)

04:25:02 | DEBUG    | Activation done
04:25:02 | DEBUG    | Previous command successfully run inside eos8fth conda environment
04:25:02 | DEBUG    | Now trying to establish symlinks
04:25:02 | DEBUG    | BentoML location is None
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

expected str, bytes or os.PathLike object, not NoneType
If this error message is not helpful, open an issue at:
 - https://github.com/ersilia-os/ersilia
Or feel free to reach out to us at:
 - hello[at]ersilia.io

If you haven't, try to run your command in verbose mode (-v in the CLI)
 - You will find the console log file in: /home/pradnya/eos/current.log

I tried installing jinja2 and changing the versions of both flask and jinja2, but I am still facing this error.

This is the entire log file: current.log

GemmaTuron commented 1 year ago

Hi @Pradnya2203 !

It seems that there is a versioning issue (see https://stackoverflow.com/questions/71718167/importerror-cannot-import-name-escape-from-jinja2). You can also try bumping the whole model to Python 3.8 or above.
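
For context on the versioning issue: Jinja2 removed `escape` from its top-level namespace in release 3.1.0, while Flask releases before 2.0 still run `from jinja2 import escape`, so an old Flask combined with a new Jinja2 fails exactly as in the traceback. Below is a minimal sketch of a check for this combination. The function names are hypothetical, and the version strings are example values passed in by hand, not read from the actual environment.

```python
import re


def parse_version(version):
    """Turn a version string such as '3.1.2' into a tuple (3, 1, 2).
    Only the leading digits of each dot-separated part are used."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_escape_conflict(flask_version, jinja2_version):
    """True if this Flask imports `escape` from a Jinja2 that no longer has it."""
    flask_imports_from_jinja2 = parse_version(flask_version) < (2, 0)
    jinja2_lacks_escape = parse_version(jinja2_version) >= (3, 1)
    return flask_imports_from_jinja2 and jinja2_lacks_escape


# Example version pairs (illustrative, not taken from the logs).
print(has_escape_conflict("1.1.2", "3.1.2"))  # True: old Flask, new Jinja2
print(has_escape_conflict("1.1.2", "3.0.3"))  # False: Jinja2 still has escape
print(has_escape_conflict("2.0.3", "3.1.2"))  # False: Flask imports from markupsafe
```

Under these assumptions, the two usual ways out are pinning Jinja2 below 3.1 or upgrading Flask to 2.0 or later; which one fits depends on what the BentoML version in the model environment requires.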

Pradnya2203 commented 1 year ago

I tried the solution posted on Stack Overflow and also changed the Python version, but I am still facing exactly the same error.

samuelmaina commented 1 year ago

Hi @Pradnya2203! I have looked at your logs: Jinja2 is not listed in your Dockerfile, so it won't be installed, hence the error. Add it to the Dockerfile and see if the error persists.

Pradnya2203 commented 1 year ago

Hey @samuelmaina, thank you. I added it to my Dockerfile as well, but I'm still facing the same issue.

GemmaTuron commented 1 year ago

Hi @Pradnya2203

Was the model developed in Python 3.7? I would try a newer version if possible.

Pradnya2203 commented 1 year ago

Yes, it was developed in Python 3.7. I'll try that, thanks.

Pradnya2203 commented 1 year ago

Hey @GemmaTuron, I tried quite a few things (the Stack Overflow solution; changing the Python, Jinja2, and Flask versions; and modifying the output CSV file and the Dockerfile), but I get the same error every time. Shall I open a pull request for it so that you can check from your end as well? Also, redial-2020 has 11 model types, and so far I have tried to output results for only one of them. What else can I do?

GemmaTuron commented 1 year ago

Hi @Pradnya2203

Thanks for your work. Let's pause here, as the contribution period is coming to an end! I'll review the work and try to identify a solution.