ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

✍️ Contribution period: Isaak mwangi kamau #835

Closed Isaakkamau closed 9 months ago

Isaakkamau commented 10 months ago

Week 1 - Get to know the community

Week 2 - Install and run an ML model

Week 3 - Propose new models

Week 4 - Prepare your final application

Isaakkamau commented 10 months ago

Hello @DhanshreeA @HellenNamulinda, my name is Isaak Kamau, an aspiring AI developer, and I am happy to contribute to the Ersilia project again. My question: I already installed Ersilia during the previous Outreachy cohort; do I need to re-install it? I have tested the eos3b5e model and it worked fine. Thank you.

HellenNamulinda commented 10 months ago


Hello Isaak, welcome to Ersilia. Please note that a lot of changes have been made to the Ersilia codebase since the last contribution phase. After activating the ersilia env, you can check which version of Ersilia you have with ersilia --version. Since you already have Ersilia installed, just follow the steps below to update it to the latest version:

cd ersilia
git pull --rebase

You can check the new version afterwards (the latest is ersilia-0.1.27).
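When checking whether an existing installation is older than the version mentioned above, version strings should be compared numerically rather than as text (as text, "0.1.9" sorts after "0.1.27"). A minimal sketch, assuming Ersilia is installed as a Python distribution named ersilia so that importlib.metadata can find it; the helper names are hypothetical:

```python
from __future__ import annotations

from importlib import metadata

# Version quoted in this thread as the latest at the time; adjust as needed.
LATEST_KNOWN = "0.1.27"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '0.1.27' or 'ersilia-0.1.27' into (0, 1, 27), stopping at the first non-numeric part."""
    version = version.strip()
    if version.startswith("ersilia-"):
        version = version[len("ersilia-"):]
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # e.g. a 'dev0' suffix
        parts.append(int(piece))
    return tuple(parts)


def installed_ersilia_version() -> str | None:
    """Return the installed ersilia version, or None if the package is not found."""
    try:
        return metadata.version("ersilia")
    except metadata.PackageNotFoundError:
        return None


def is_up_to_date(installed: str, latest: str = LATEST_KNOWN) -> bool:
    # Tuples compare element by element as integers, so (0, 1, 9) < (0, 1, 27).
    return parse_version(installed) >= parse_version(latest)
```

For example, after git pull --rebase, `v = installed_ersilia_version()` followed by `v is not None and is_up_to_date(v)` tells you whether the update took effect.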

Isaakkamau commented 10 months ago

Thank you @HellenNamulinda, I have done the update.

Isaakkamau commented 10 months ago

Motivational Statement to work with Ersilia

Dear Ersilia Mentors,

My name is Isaak Kamau, an aspiring AI developer based in Nairobi, Kenya. I graduated last year from the University of Nairobi with a degree in Mathematics (Statistics), and I have solid knowledge of machine learning engineering and data science.

I come from a community that has been heavily affected by disease. Recently, a very close family member died of tuberculosis after being misdiagnosed and treated for typhoid. I believe that if the very small hospital in my home area had been equipped with the right technology, a life could have been spared. Since I don't have much knowledge of the drug discovery process, I think one way I can make a change is by collaborating with, and learning from, the field experts at Ersilia. I believe that by the end of the internship I will be well equipped with the knowledge needed to contribute to my community. I also plan to donate part of what I earn from the internship to our community healthcare facility.

It is also worth noting that I have been part of the Ersilia community since the previous Outreachy cohort, during which I made some contributions to the Ersilia Model Hub (https://github.com/ersilia-os/ersilia/issues/620). I look forward to making even more substantial contributions toward Ersilia's vision. Together we can!

Best regards,
Isaak Mwangi Kamau

Isaakkamau commented 10 months ago

WEEK 2

Select a model from the suggested list

I have successfully installed the NCATS Rat Liver Microsomal Stability model on my local system.

As a machine learning engineer, I find this project interesting because of how easy it makes the models to use: thanks to the molecule editor user interface (backed by the model endpoints), even users with a limited background in drug discovery, or without technical knowledge of how machine learning models work, can simply sketch a molecule and get a prediction.

I have also tested the STOUT model, which I installed during the previous cohort, and it works fine as well!

Isaakkamau commented 10 months ago

Errors I have encountered

  1. I was previously using WSL on Windows, and the models took too long to load and my WSL kept freezing. I solved this by setting up a dual boot (Windows and Linux) on my machine and switching to Linux to install the project locally, which worked fine.

  2. This was the second error I encountered:

    (/home/hl/ncats-adme/server/env) hl@hl-laptop:~/ncats-adme/server$ python app.py
    Traceback (most recent call last):
      File "app.py", line 20, in <module>
        from predictors.rlm.rlm_predictor import RLMPredictior
    ModuleNotFoundError: No module named 'predictors.rlm'

Solution: I noticed that some of the Python modules in the predictors package had been corrupted during cloning, so I re-installed them manually.
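A quick way to confirm which parts of the predictors package are missing, before and after re-installing them, is to ask Python's import system to locate each module. A minimal sketch using importlib.util.find_spec; the module names in the loop come from the traceback above, check_modules is a hypothetical helper, and the check assumes it runs with ncats-adme/server on sys.path (for example, from a script saved in that directory):

```python
import importlib.util


def check_modules(names):
    """Return {module_name: bool} telling whether each dotted module can be located.

    find_spec imports the parent packages of a dotted name (e.g. 'predictors' and
    'predictors.rlm' for 'predictors.rlm.rlm_predictor'), so a missing parent, or
    one whose __init__ fails to import, is reported here as not found.
    """
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except ImportError:  # includes ModuleNotFoundError
            status[name] = False
    return status


for module, found in check_modules(
    ["predictors", "predictors.rlm", "predictors.rlm.rlm_predictor"]
).items():
    print(f"{module}: {'found' if found else 'MISSING'}")
```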

DhanshreeA commented 10 months ago

Hey @Isaakkamau, can you specify which model you are finally working with, NCATS or STOUT? Could you also share the outputs you obtained on the EML, and compare them with the outputs obtained from the Ersilia implementation?

DhanshreeA commented 10 months ago

Hi @Isaakkamau quickly following up, any updates? Do you need any support from us?

Isaakkamau commented 10 months ago

Hello @DhanshreeA, sorry for the delay. I have not been feeling well, so I took a short break, but I'm much better now.

Regarding the model, I'm working on the NCATS model. I was able to install it using the instructions provided in the official repo:

(base) hl@hl-laptop:~$ cd ncats-adme
(base) hl@hl-laptop:~/ncats-adme$ cd server
(base) hl@hl-laptop:~/ncats-adme/server$ conda activate ncats-adme

EnvironmentNameNotFound: Could not find conda environment: ncats-adme
You can list all discoverable environments with `conda info --envs`.

(base) hl@hl-laptop:~/ncats-adme/server$ ls
app.py           env                  images             package-lock.json
client           environment_mac.yml  kekule_smiles.csv  predictors
default.profraw  environment.yml      models
(base) hl@hl-laptop:~/ncats-adme/server$ conda activate ./env
(/home/hl/ncats-adme/server/env) hl@hl-laptop:~/ncats-adme/server$ python app.py
Loading RLM graph convolutional neural network model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
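The log above shows why the first activation failed: the repository's environment lives in the env folder inside server (it appears in the ls output), so it has no name that conda activate ncats-adme could match and has to be activated by path, as in conda activate ./env. A minimal sketch for checking this from Python, assuming conda env list --json prints a JSON object with an "envs" list of environment paths; find_named_env is a hypothetical helper that only approximates conda's name lookup by comparing the last path component:

```python
from __future__ import annotations

import json
import os
import subprocess


def list_conda_envs() -> list[str]:
    """Return the environment paths reported by `conda env list --json`."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["envs"]


def find_named_env(name: str, env_paths: list[str]) -> str | None:
    """Approximate name lookup: match the last path component (e.g. .../envs/ersilia)."""
    for path in env_paths:
        if os.path.basename(os.path.normpath(path)) == name:
            return path
    return None
```

If find_named_env("ncats-adme", list_conda_envs()) returns None, the environment has to be activated by its path instead, as the log does with conda activate ./env.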

[Screenshot from 2023-10-18 13-59-33]

After successfully starting my server with python app.py, I navigated to the client at http://127.0.0.1:5000, where the machine learning models are served through an interactive user interface.

Next, I downloaded eml_canonical.csv to my local machine and uploaded the CSV file on the NCATS-ADME prediction page.

Here are some of the results I was getting:

1. Solubility model:

Result Explanation

This model has two possible outcomes: 1 for low solubility and 0 for high solubility.

Taking the first prediction as an example, the predicted class is shown as 0(1.0), which means the model is 100% confident that the molecule has high solubility.

[Screenshot: NCATS-CLIENT]

2. Rat Liver Microsomal stability Model:

Result Explanation

This model has two possible outcomes: 1 for unstable and 0 for stable.

Taking the first prediction as an example, the predicted class is shown as 0(0.95), which means the model is 95% confident that the molecule is stable.

[Screenshot: NCATS-CLIENT2]
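Both explanations read a cell of the form class(confidence), such as 0(1.0) or 0(0.95). A minimal sketch that turns such a cell into a labelled prediction, assuming the web client always shows a single-digit class followed by the confidence of that class in parentheses; the label dictionaries follow the explanations above, and the variable and function names are hypothetical:

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Label maps taken from the explanations above (hypothetical variable names).
SOLUBILITY_LABELS = {0: "high solubility", 1: "low solubility"}
RLM_LABELS = {0: "stable", 1: "unstable"}

# A cell such as "0(0.95)": one class digit, then the confidence in parentheses.
_CELL = re.compile(r"^\s*([01])\s*\(\s*(\d*\.?\d+)\s*\)\s*$")


class Prediction(NamedTuple):
    predicted_class: int
    label: str
    confidence: float  # confidence of the predicted class, between 0 and 1


def parse_prediction(cell: str, labels: dict[int, str]) -> Prediction:
    """Parse a 'class(confidence)' cell into a labelled Prediction."""
    match = _CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognised prediction cell: {cell!r}")
    predicted_class = int(match.group(1))
    confidence = float(match.group(2))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Prediction(predicted_class, labels[predicted_class], confidence)


for cell, labels in [("0(1.0)", SOLUBILITY_LABELS), ("0(0.95)", RLM_LABELS)]:
    p = parse_prediction(cell, labels)
    print(f"{cell}: {p.label} ({p.confidence:.0%} confident)")
```

Keeping the label maps outside the parser lets the same function serve both models, since they share the cell format and differ only in what class 0 and class 1 mean.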

Isaakkamau commented 10 months ago

Hello @DhanshreeA and @HellenNamulinda, I'm now trying to serve the NCATS model using Ersilia. I already have Docker installed and pulled the image using docker pull ersiliaos/eos31ve. Is there a step I'm missing? This is the error I encountered:

(ersilia) isaakmwangi@DESKTOP-O9Q8PKD:~/ersilia$ ersilia -v fetch eos31ve
⬇️  Fetching model eos31ve: ncats-hlm
  0%|                                                                                             | 0/8 [00:00<?, ?it/s]18:42:03 | INFO     | GitHub CLI is not installed. Ersilia can work without it, but we highy recommend that you install this tool.
18:42:03 | DEBUG    | Git LFS is installed
Updated Git hooks.
Git LFS initialized.
18:42:03 | DEBUG    | Git LFS has been activated
18:42:04 | DEBUG    | Connected to the internet
18:42:04 | DEBUG    | Conda is installed
18:42:04 | DEBUG    | EOS Home path exists
Checking setup: 0.806s
 12%|██████████▋                                                                          | 1/8 [00:00<00:05,  1.24it/s]18:42:04 | INFO     | Starting delete of model eos31ve
18:42:08 | INFO     | Deleting conda environment eos31ve
/tmp/ersilia-i70rj4_1/script.sh: line 2: /root/miniconda3/etc/profile.d/conda.sh: Permission denied
/tmp/ersilia-i70rj4_1/script.sh: line 3: conda: command not found
18:42:08 | ERROR    | Ersilia exception class:
ModelDeleteError

Detailed error:
Error occured while deleting model eos31ve

Hints:
Check that the model is actually installed in your local device:
$ ersilia serve eos31ve

🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

Ersilia exception class:
ModelDeleteError

Detailed error:
Error occured while deleting model eos31ve

Hints:
Check that the model is actually installed in your local device:
$ ersilia serve eos31ve

If this error message is not helpful, open an issue at:
 - https://github.com/ersilia-os/ersilia
Or feel free to reach out to us at:
 - hello[at]ersilia.io

If you haven't, try to run your command in verbose mode (-v in the CLI)
 - You will find the console log file in: /home/isaakmwangi/eos/current.log
 12%|██████████▋                                                                          | 1/8 [00:06<00:44,  6.29s/it]
(ersilia) isaakmwangi@DESKTOP-O9Q8PKD:~/ersilia$

Thank you

Isaakkamau commented 10 months ago

Hello @DhanshreeA @HellenNamulinda, after several attempts I was able to debug the error: the problem was that I was running the commands as a non-root user in WSL, while the conda installation the script tries to load is under /root. I worked around the error by switching to the root user with su root. After that, the error disappeared, but I got another one: the model fetch progressed to 83% and then the terminal showed "Something went wrong with Ersilia". Finally, since my computer has a dual boot, I decided to fetch and serve the model on Linux, where it ran very well without issues.

For anyone with similar issues with Ersilia, I suggest using Linux. I made a lot of configuration changes to my WSL setup, but the development environment still did not run smoothly.
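The key line in the earlier log is the Permission denied on /root/miniconda3/etc/profile.d/conda.sh: the prompt shows the user isaakmwangi, but the temporary script created by Ersilia tries to load a conda installation under /root, which a normal user usually cannot read. A minimal sketch to confirm this kind of problem from Python by trying to open that file as the current user; check_conda_profile is a hypothetical helper and the path is the one from the log:

```python
import os


def check_conda_profile(conda_root: str) -> str:
    """Report whether conda_root/etc/profile.d/conda.sh is readable by the current user."""
    script = os.path.join(conda_root, "etc", "profile.d", "conda.sh")
    try:
        with open(script, "r"):
            pass
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        # e.g. /root is not accessible to a non-root user
        return "permission denied"
    return "ok"


print(check_conda_profile("/root/miniconda3"))
```

A result of "permission denied" for a non-root user points to the same ownership mismatch seen in the log; "missing" means conda is not installed at that location at all.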

Isaakkamau commented 10 months ago

Here is the output I got after running:

(ersilia) hl@hl-laptop:~/ersilia$ ersilia -v serve eos31ve

[Screenshot: ersilia ncats-hlm]

Isaakkamau commented 10 months ago

After successfully serving the model eos31ve (ncats-hlm), I tried running inference through the localhost web page at http://127.0.0.1:53063, but this did not seem possible, since our input file is a CSV and the web page only accepts JSON input.

I finally decided to call the API from the terminal using the run command, as suggested in the serve output:

🚀 Serving model eos31ve: ncats-hlm

   URL: http://127.0.0.1:53063
   PID: 2372
   SRV: conda

👉 Available APIs:
   - run

💁 Information:
   - info

This is the command I have used to make prediction:

(ersilia) hl@hl-laptop:~/ersilia$ ersilia -v api run -i ~/Downloads/eml_canonical.csv -o ersilia_eml_output.csv -b 100

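In the above command, -i points to the input file (the EML list in CSV format), -o sets the file where the predictions are written, and -b 100 sets the batch size, i.e. how many molecules are sent to the model at a time.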
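When the same prediction has to be repeated, for example for several input files, the command can be driven from Python. A minimal sketch using subprocess; it only uses the flags that appear in the command above, assumes the model is already being served, and build_ersilia_run_command and run_ersilia are hypothetical helpers:

```python
from __future__ import annotations

import os
import shlex
import subprocess


def build_ersilia_run_command(
    input_csv: str, output_csv: str, batch_size: int = 100, verbose: bool = True
) -> list[str]:
    """Build the argument list for `ersilia [-v] api run -i IN -o OUT -b N`."""
    cmd = ["ersilia"]
    if verbose:
        cmd.append("-v")
    cmd += [
        "api", "run",
        "-i", os.path.expanduser(input_csv),  # subprocess does not expand '~' itself
        "-o", output_csv,
        "-b", str(batch_size),
    ]
    return cmd


def run_ersilia(cmd: list[str]) -> None:
    """Run the command; assumes the model is already served (ersilia serve eos31ve)."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, run_ersilia(build_ersilia_run_command("~/Downloads/eml_canonical.csv", "ersilia_eml_output.csv")) reproduces the command above. Passing a list rather than a shell string avoids quoting problems with file paths.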

Isaakkamau commented 10 months ago

This is my file: ersilia_eml_output.csv
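To compare these outputs with the ones from the NCATS web client, as requested earlier in the thread, both sets of predictions first need to come from the same endpoint (the web client runs above were for solubility and RLM stability, while eos31ve serves ncats-hlm) and be expressed the same way. A minimal sketch, assuming each side has already been reduced to a dictionary mapping SMILES to the probability of class 1 (for a web-client cell like 0(0.95), that would be 1 - 0.95 = 0.05); compare_predictions and the 0.5 threshold are illustrative choices, not part of either tool:

```python
from __future__ import annotations


def compare_predictions(
    reference: dict[str, float], candidate: dict[str, float], threshold: float = 0.5
) -> dict[str, float]:
    """Summarise agreement between two {smiles: probability_of_class_1} mappings."""
    shared = sorted(set(reference) & set(candidate))
    if not shared:
        raise ValueError("no molecules in common")
    abs_diffs = [abs(reference[s] - candidate[s]) for s in shared]
    # Same class on both sides after thresholding the probabilities.
    agree = sum((reference[s] >= threshold) == (candidate[s] >= threshold) for s in shared)
    return {
        "n_shared": len(shared),
        "mean_abs_diff": sum(abs_diffs) / len(shared),
        "max_abs_diff": max(abs_diffs),
        "class_agreement": agree / len(shared),
    }
```

Reporting both the probability difference and the thresholded class agreement shows whether the two implementations disagree only slightly in score or actually flip classes.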

Isaakkamau commented 10 months ago

Hello @DhanshreeA @HellenNamulinda, the Human Liver Microsomal Stability model is listed in the https://github.com/ncats/ncats-adme project, so why are these the only models available for prediction when I run python app.py?

[Screenshot from 2023-10-18 14-05-28]

Isaakkamau commented 10 months ago

WEEK 3:

Suggest a new model and document it (1);

Model Name:

DeepChem

Publication:

https://deepchem.readthedocs.io/en/latest/ https://www.amazon.com/Deep-Learning-Life-Sciences-Microscopy/dp/1492039837

Source Code:

https://github.com/deepchem/deepchem

Description:

DeepChem aims to provide a high quality open-source toolchain that democratizes the use of deep-learning in drug discovery, materials science, quantum chemistry, and biology.

Below are some examples of what you can do with DeepChem, according to its official publication:

Predict the solubility of small drug-like molecules

Predict binding affinity of small molecules to protein targets

Predict physical properties of simple materials

Analyze protein structures and extract useful descriptors

Count the number of cells in a microscopy image

License:

MIT License

Isaakkamau commented 10 months ago

WEEK 3:

Suggest a new model and document it (2);

Model Name:

opqua

Model Documentation:

https://github.com/pablocarderam/opqua#model-documentation

Source Code:

https://github.com/pablocarderam/opqua

Description:

Opqua is an epidemiological modeling framework for pathogen population genetics and evolution. The framework can be used to:

  1. Make predictions about the relationship between pathogen evolution and epidemiology.
  2. Study pathogen evolution through mutation, recombination, and/or reassortment.
  3. Study the influence of pathogen genome sequences on transmission and evolution, as well as on host demographic dynamics.

Its models are composed of populations containing hosts and/or vectors, which may themselves be infected by a number of pathogens with different genomes. Each population may have its own unique parameters dictating the events that happen inside it, including how pathogens are spread between its hosts and vectors.
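The nesting described above (populations containing hosts and vectors, each of which may carry several pathogen genomes, with per-population parameters) can be pictured with a small data model. This is a hypothetical illustration written with Python dataclasses, not Opqua's actual API; the class names, field names and the "transmission_rate" parameter are invented for the example:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Individual:
    """A host or a vector; each infecting pathogen is represented by its genome string."""
    pathogens: set[str] = field(default_factory=set)

    @property
    def infected(self) -> bool:
        return bool(self.pathogens)


@dataclass
class Population:
    name: str
    params: dict[str, float] = field(default_factory=dict)  # population-specific event parameters
    hosts: list[Individual] = field(default_factory=list)
    vectors: list[Individual] = field(default_factory=list)

    def infected_counts(self) -> tuple[int, int]:
        """Return (infected hosts, infected vectors)."""
        return (
            sum(h.infected for h in self.hosts),
            sum(v.infected for v in self.vectors),
        )


village = Population(
    "village",
    params={"transmission_rate": 0.1},  # hypothetical parameter name
    hosts=[Individual({"AAAA", "AAAT"}), Individual()],
    vectors=[Individual({"AAAA"})],
)
print(village.infected_counts())
```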

License:

MIT License

Isaakkamau commented 10 months ago

WEEK 3:

Suggest a new model and document it (3);

Model Name:

MolGAN

Model Publication:

https://arxiv.org/abs/1805.11973

Source Code:

https://github.com/nicola-decao/MolGAN

Description:

According to the above publication, MolGAN is a deep generative model for graph-structured data in the context of chemical synthesis. It generates molecular graphs directly, bypassing the need for costly graph matching techniques, and produces nearly 100% valid compounds.

License:

MIT License

GemmaTuron commented 9 months ago

Hello,

Thanks for your work during the Outreachy contribution period, we hope you enjoyed it! We will now close this issue while we work on the selection of interns. Thanks again!