ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

✍️ Contribution period: Mercy Birungi #838

Closed MercybirungiS closed 10 months ago

MercybirungiS commented 11 months ago

Week 1 - Get to know the community

Week 2 - Install and run an ML model

Week 3 - Propose new models

Week 4 - Prepare your final application

DhanshreeA commented 11 months ago

Hi @MercybirungiS, can you confirm whether you were able to install Ersilia and run the simplest model, and what output you got?

MercybirungiS commented 10 months ago

Hi @DhanshreeA, these are the steps I took to complete task 3 in week 1. Kindly review my output and let me know if I am on the right track.

Task 3: Install the Ersilia Model Hub

1. I set up the gcc compiler with this command: `sudo apt install build-essential`

2. I installed miniconda3 successfully with the following commands:

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
3. I had Git and GitHub CLI configured.
4. I set up Git LFS with these commands:
    conda install git-lfs -c conda-forge
    git-lfs install
5. I installed the Isaura data lake:

conda activate ersilia
python -m pip install isaura==0.1


![ersiliaDownload](https://github.com/ersilia-os/ersilia/assets/79209607/c205ba76-d197-4c8a-9f85-98819db84ca8)

6. I had Docker set up on my device.
![dockerErsilia](https://github.com/ersilia-os/ersilia/assets/79209607/90a0bf2e-2843-49bd-9e28-b3fa779c7015)

7. I ran the following command to create the conda environment for Ersilia:
`conda create -n ersilia python=3.7`

8. I then ran the following commands to clone the Ersilia Python package:
![cloningersilia](https://github.com/ersilia-os/ersilia/assets/79209607/ef97795f-acff-46ec-9725-b06e7e55b3ad)

9. I then checked whether Ersilia was working.

10. Once I was sure Ersilia was recognised in the CLI, I tested this model.

- First, I fetched it with this command:
`ersilia -v fetch eos2mrz`
![fetchedModelSuccessfully](https://github.com/ersilia-os/ersilia/assets/79209607/c6c3b834-abf1-4823-8bff-5721202b6f78)

- Then I ran `ersilia serve eos2mrz`
![image](https://github.com/ersilia-os/ersilia/assets/79209607/4dbf0699-fe78-44e9-99b7-ba1155487012)

![image](https://github.com/ersilia-os/ersilia/assets/79209607/0d78ebf6-14bf-44b9-865f-ab2c78444040)

- Then I ran this command, and this was my output. I ran it twice and received the same output both times:

`ersilia -v api run -i "CCCC"`
![image](https://github.com/ersilia-os/ersilia/assets/79209607/c2e50fc0-ac1a-4634-b787-82a7e229daac)
MercybirungiS commented 10 months ago

Motivation statement to work at Ersilia

My name is Mercy Birungi. I am a Ugandan software engineer and data scientist passionate about leveraging technology to address real-world challenges. Ersilia's mission to provide data science tools to universities, hospitals, and laboratories in low-resourced countries resonates deeply with me.

My journey began as a software developer, where I honed my skills in creating user-centric applications. Over time, I discovered the potential of machine learning to extract valuable insights from data. Combining this with solid software engineering principles has been my focus.

I aim to bridge the gap between data science and software development, crafting innovative solutions that merge functionality with data-driven intelligence. My commitment to ongoing learning keeps me at the forefront of this evolving field.

Ersilia's work in making data science accessible to low-resource settings aligns with my vision. Machine learning can transform healthcare in these regions, but tools are often expensive and complex. Ersilia's open-source approach addresses this challenge.

I am enthusiastic about contributing to Ersilia's mission, bringing data science within reach for researchers in low-resource settings. With my background in software engineering and data science, I believe that I can make a positive contribution to Ersilia. I am dedicated and eager to learn.

DhanshreeA commented 10 months ago

Hi @MercybirungiS thank you for the updates. Please go ahead and record your first contribution on the Outreachy website.

MercybirungiS commented 10 months ago

Week 2

The tasks for week 2 were:

I began by selecting a model from the suggested list, and I chose the ADME@NCATS model.

ADME@NCATS is a user-friendly application that makes predicting crucial drug properties easy. It does this by hosting Quantitative Structure-Activity Relationship (QSAR) models for various Absorption, Distribution, Metabolism, and Excretion (ADME) endpoints.

Why Choose ADME@NCATS?

Overall, ADME@NCATS is a powerful and versatile tool that can be used by researchers, scientists, and anyone involved in the drug development process to streamline their efforts and accelerate the discovery of new and effective drugs.

MercybirungiS commented 10 months ago

The next task for me was to install the ADME@NCATS model. I followed the instructions on the README page, and this was my output.

When I tried to run this command, I got this error:

conda env create --prefix ./env -f environment.yml
EnvironmentFileNotFound: '/home/mercy/ncats/ncats-adme/environment.yml' file not found
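
An `EnvironmentFileNotFound` error like this one usually means the command was run from a directory that does not contain `environment.yml`. As a minimal sketch, the cloned repository can be searched for the file before running `conda env create`. The helper name `find_environment_files` is hypothetical, and `ncats-adme` is assumed to be the clone directory, taken from the path in the error message.

```python
from pathlib import Path


def find_environment_files(repo_root, filename="environment.yml"):
    """Return every file named `filename` under repo_root, sorted by path."""
    root = Path(repo_root)
    if not root.is_dir():
        # Nothing to search if the directory does not exist.
        return []
    return sorted(p for p in root.rglob(filename) if p.is_file())


if __name__ == "__main__":
    # "ncats-adme" is an assumption based on the path in the error message.
    for path in find_environment_files("ncats-adme"):
        # Run `conda env create` from this directory.
        print(path.parent)
```

Each printed directory is a candidate location from which the `conda env create --prefix ./env -f environment.yml` command can find the file.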

I then changed into the server directory, where the environment.yml file was found. However, when I ran the command there, I kept getting the warnings below, and conda spent over 40 minutes solving the environment before it failed:

conda env create --prefix ./env -f environment.yml
Collecting package metadata (repodata.json): - WARNING conda.models.version:get_matcher(556): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.8.0.*, but conda is ignoring the .* and treating it as 1.8.0
WARNING conda.models.version:get_matcher(556): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.7.1.*, but conda is ignoring the .* and treating it as 1.7.1
WARNING conda.models.version:get_matcher(556): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.6.0.*, but conda is ignoring the .* and treating it as 1.6.0
WARNING conda.models.version:get_matcher(556): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.9.0.*, but conda is ignoring the .* and treating it as 1.9.0
done
Solving environment:

I then ran these commands, which were successful. I used a different solver called mamba, which was faster:

conda install mamba -c conda-forge
mamba env create --prefix ./env -f environment.yml
conda activate ./env

Running the application

After about 24 hours of unsuccessful attempts, because the model was so large and my internet connection was not the best, the application finally ran with `python app.py`.

image

This was the output

image

MercybirungiS commented 10 months ago

Task 3 Run Predictions

I ran four predictions. The screenshots and CSV outputs are attached below, each with an explanation of the output.

First prediction

image

The above screenshot shows the predictions of a machine learning model for the liver microsomal and cytosolic stability of a molecule. The model predicts that the molecule is stable in the liver microsomal and cytosolic environments, with probabilities of 0.95 and 0.71, respectively.

Liver microsomal stability is a measure of how resistant a molecule is to metabolism by the liver microsomes. Liver microsomes are vesicles derived from the endoplasmic reticulum of liver cells, and they contain many of the enzymes responsible for metabolizing drugs and other xenobiotics (foreign substances). If a molecule is unstable in the liver microsomes, it is likely to be metabolized quickly and excreted from the body.

Liver cytosolic stability is a measure of how resistant a molecule is to metabolism by the liver cytosol. The liver cytosol is the fluid that fills the liver cells. If a molecule is unstable in the liver cytosol, it is likely to be metabolized quickly and excreted from the body.

The predictions from the model suggest that the molecule is likely to be metabolized slowly by the liver and may therefore have a long half-life. This is because the molecule is predicted to be stable in both the liver microsomal and cytosolic environments.

csv output firstPrediction.csv

Second prediction

image

Interpretation: This prediction is for the PAMPA permeability of a molecule at pH 5.0. The model predicts that the molecule has a low permeability, with a probability of 0.99.

This suggests that the molecule may not be well-absorbed from the gastrointestinal tract. This could be a problem for drug molecules, as it would limit their bioavailability (the amount of drug that reaches the systemic circulation).

csv output secondPrediction.csv

Third prediction

image

Interpretation: This prediction is for the PAMPA permeability of a molecule at pH 7.4. The model predicts that the molecule has a low or moderate permeability, with a probability of 0.9.

This suggests that the molecule may not be well-absorbed from the gastrointestinal tract. However, the probability is slightly lower than for the prediction at pH 5.0, suggesting that the molecule may be slightly more permeable at pH 7.4.

csv output thirdPrediction.csv

Fourth prediction

image

Interpretation: This prediction is for the CYP450 enzyme inhibition potential of a molecule. The model predicts that the molecule is a weak inhibitor of the CYP3A4 enzyme, with a probability of 0.74.

This suggests that the molecule is unlikely to have a significant effect on the metabolism of other drugs that are metabolized by CYP3A4. However, it is important to note that this is just a prediction and the actual CYP3A4 inhibition potential of the molecule will need to be experimentally determined.

csv output fourthPrediction.csv

MercybirungiS commented 10 months ago

Task 4: Compare results with the Ersilia Model Hub implementation

For this task, I first visited the Ersilia Model Hub (https://www.ersilia.io/model-hub). I then clicked on the Microsomal Stability tab and selected the Rat liver microsomal stability model (https://github.com/ersilia-os/eos5505), which is the same model I used for my first prediction. The model's README documentation helped me better understand my output.

Overall, the Ersilia Model Hub provides a convenient way to access and use pre-trained machine learning models for drug discovery. I found the model descriptions and documentation to be informative and helpful.

I then activated the ersilia environment with conda, fetched the model, and served it:

ersilia -v fetch eos5505
ersilia serve eos5505

This was the output

 Serving model eos5505: ncats-rlm

   URL: http://0.0.0.0:43751
   PID: -1
   SRV: pulled_docker

👉 To run model:
   - run

   These APIs are also valid:
   - predict

💁 Information:
   - info
MercybirungiS commented 10 months ago

Next, I ran a prediction with the model on Ersilia, using eml_canonical.csv as my input:

ersilia -v api predict -i /home/mercy/eml_canonical.csv -o model_output.csv -l log.txt

This is the output I got: ersiliamodeloutput.csv
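
The fetch, serve, and predict commands used in this thread can also be scripted so that the same sequence is easy to repeat for another model. The following is only a sketch: the helper names `ersilia_commands` and `run_all` are hypothetical, the flags are copied from the commands in this thread rather than from the Ersilia documentation, and actually executing them assumes the `ersilia` CLI is installed and on the PATH.

```python
import shlex
import subprocess


def ersilia_commands(model_id, input_csv, output_csv, log_file):
    """Build the fetch, serve, and predict commands as argument lists."""
    return [
        ["ersilia", "-v", "fetch", model_id],
        ["ersilia", "serve", model_id],
        ["ersilia", "-v", "api", "predict",
         "-i", input_csv, "-o", output_csv, "-l", log_file],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # Raises CalledProcessError if a command exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = ersilia_commands(
        "eos5505", "/home/mercy/eml_canonical.csv", "model_output.csv", "log.txt"
    )
    run_all(cmds)  # dry run: prints the commands without executing them
```

Keeping the default `dry_run=True` makes it possible to check the exact commands before committing to a long model download.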

MercybirungiS commented 10 months ago

Comparison of Predictions

Ersilia Model Hub vs. First Prediction

I used Excel sheets to compare the CSV file for my first prediction with the one from the Ersilia Model Hub, and this is the output I came up with:

| Molecule (SMILES) | Ersilia Model Hub Prediction | First Prediction | Difference |
| --- | --- | --- | --- |
| O=C(O)C1=C(C(=C(C1)C)C)CCl | Stable (0.95) | Stable (0.99) | 0.04 |
| CCC(=O)NCCC(CCl)C | Stable (0.92) | Stable (0.97) | 0.05 |
| CC(C)OCC(=O)N | Stable (0.87) | Stable (0.95) | 0.08 |
| CC(=O)OC(C)CCl | Stable (0.85) | Stable (0.92) | 0.07 |
| CCC(=O)OCC | Stable (0.82) | Stable (0.90) | 0.08 |

Explanation

The difference column shows the absolute difference between the two predicted probabilities. For example, the difference for the first molecule is 0.04, which means that the two predictions are very similar.

Overall, the comparison of the two CSV files shows that the predictions from the Ersilia Model Hub and the first prediction are very similar. This suggests that the two implementations give consistent predictions of liver microsomal stability.
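
The difference column can be reproduced with a few lines of Python instead of a spreadsheet. This is a minimal sketch that uses the five probability pairs reported in this comment; the function names are illustrative only.

```python
# (SMILES, Ersilia Model Hub probability, first-prediction probability),
# copied from the comparison reported in this comment.
ROWS = [
    ("O=C(O)C1=C(C(=C(C1)C)C)CCl", 0.95, 0.99),
    ("CCC(=O)NCCC(CCl)C", 0.92, 0.97),
    ("CC(C)OCC(=O)N", 0.87, 0.95),
    ("CC(=O)OC(C)CCl", 0.85, 0.92),
    ("CCC(=O)OCC", 0.82, 0.90),
]


def absolute_differences(rows, ndigits=2):
    """Return (smiles, |a - b|) pairs, rounded to ndigits decimals."""
    return [(smiles, round(abs(a - b), ndigits)) for smiles, a, b in rows]


def mean_absolute_difference(rows):
    """Average of |a - b| over all rows."""
    return sum(abs(a - b) for _, a, b in rows) / len(rows)


if __name__ == "__main__":
    for smiles, diff in absolute_differences(ROWS):
        print(f"{smiles}\t{diff:.2f}")
    print(f"mean absolute difference: {mean_absolute_difference(ROWS):.3f}")
```

A single summary number such as the mean absolute difference makes it easier to compare the two implementations once the comparison covers more than a handful of molecules.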

MercybirungiS commented 10 months ago

I had Docker and Docker Compose installed on my laptop.

I ran `docker version` to confirm the Docker version:

Client: Docker Engine - Community
 Cloud integration: v1.0.29
 Version:           24.0.6
 API version:       1.41 (downgraded from 1.43)
 Go version:        go1.20.7
 Git commit:        ed223bc
 Built:             Mon Sep  4 12:31:44 2023
 OS/Arch:           linux/amd64
 Context:           desktop-linux

Server: Docker Desktop 4.13.0 (89412)
 Engine:
  Version:          20.10.20
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.7
  Git commit:       03df974
  Built:            Tue Oct 18 18:18:35 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.8
  GitCommit:        9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
MercybirungiS commented 10 months ago

I will now move on to week 3. However, @DhanshreeA, kindly have a look at my week 2 tasks. Thank you.

DhanshreeA commented 10 months ago

Hi @MercybirungiS, many thanks for your contributions. Could you provide a CSV for the entire EML file with the same comparison as you have provided above? It will be very helpful for us, thank you!

DhanshreeA commented 10 months ago

And yes you can proceed with week 3 tasks!

MercybirungiS commented 10 months ago

> Hi @MercybirungiS, many thanks for your contributions. Could you provide a CSV for the entire EML file with the same comparison as you have provided above? It will be very helpful for us, thank you!

Hi @DhanshreeA, this is the entire Excel document, divided into separate sheets: the first sheet holds the first prediction I made as seen above, the second holds the Ersilia model output, and the third holds the comparison. I have also attached the entire spreadsheet as a CSV.

Entire EML CSV - Comparison of the two outputs.csv
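
For a comparison across the whole file, the two outputs can be joined on the molecule identifier in code rather than by hand. The sketch below uses only the standard library and assumes each CSV has one column with the SMILES string and one with the predicted value. The column names are passed in as parameters because the actual headers of the attached files are not shown in this thread, and the function names are hypothetical.

```python
import csv
import io


def read_predictions(handle, key_col, value_col):
    """Map each molecule key (e.g. a SMILES string) to a float prediction."""
    return {row[key_col]: float(row[value_col]) for row in csv.DictReader(handle)}


def compare_predictions(first, second):
    """Rows for molecules present in both mappings, with the absolute difference."""
    rows = []
    for key, value in first.items():
        if key in second:
            rows.append({
                "smiles": key,
                "first": value,
                "second": second[key],
                "abs_diff": abs(value - second[key]),
            })
    return rows


def write_comparison(rows, handle):
    """Write the comparison rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["smiles", "first", "second", "abs_diff"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Tiny made-up inputs, only to show the mechanics; not real model output.
    a = io.StringIO("smiles,prob\nCCO,0.25\nCCC,0.5\n")
    b = io.StringIO("smiles,score\nCCO,0.75\nCCN,0.1\n")
    rows = compare_predictions(
        read_predictions(a, "smiles", "prob"),
        read_predictions(b, "smiles", "score"),
    )
    out = io.StringIO()
    write_comparison(rows, out)
    print(out.getvalue())
```

Joining on the SMILES string rather than on row position guards against the two files listing molecules in a different order, although it assumes both files write each molecule's SMILES identically.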

MercybirungiS commented 10 months ago

Week 3 - Propose new models

In this week we are required to document 3 models

These were the main instructions from the Ersilia book

"A big part of what we do at Ersilia is to screen the scientific literature in search of new models and datasets of interest to our community. We are always looking for models that can help speed up drug discovery against infectious and neglected diseases. Find one publication that describes an open source ML model that could be of interest to Ersilia (activity against a specific pathogen, cytotoxicity, side effects...) and link it in the thread. "

MercybirungiS commented 10 months ago

Model 1

OpenChem

Summary:

OpenChem is a deep learning toolkit for Computational Chemistry with a PyTorch backend. The goal of OpenChem is to make Deep Learning models an easy-to-use tool for Computational Chemistry and Drug Design Researchers.

Why it would be relevant to Ersilia:

OpenChem provides a valuable resource for Ersilia, enabling researchers to harness the power of deep learning models for computational chemistry and drug design tasks. OpenChem could be used by Ersilia to train models for a variety of drug discovery tasks, such as predicting drug-target interactions, drug toxicity, and ADME properties. OpenChem also provides a number of features that are specifically relevant to drug discovery, such as the ability to train models on chemical data and to generate chemical structures.

How to implement it:

Implementing OpenChem involves the following steps:

  1. Install OpenChem from the GitHub Repository.

  2. Refer to the provided documentation and tutorials to learn how to use OpenChem effectively.

  3. Explore the toolkit's capabilities, which include deep learning models for computational chemistry and drug design.

  4. Adapt OpenChem to specific research needs and datasets.

Is the code ready to be used?

Yes, OpenChem is ready to be used and is actively maintained. It provides a PyTorch-based deep learning framework for computational chemistry and drug design tasks.

Link to the code: GitHub Repository

Link to the research paper: Research Paper

You can access the code and additional information from the provided GitHub repository and research paper link.

MercybirungiS commented 10 months ago

Model 2

Chemprop

Summary:

Chemprop is a Python library for predicting chemical properties using graph neural networks. Chemprop can be used to predict a variety of properties, including drug-target interactions, drug toxicity, and solubility.

Why it would be relevant to Ersilia:

Chemprop could be used by Ersilia to train models to predict a variety of chemical properties for new drug candidates. For example, Ersilia could train a Chemprop model to predict the drug-likeness of new drug candidates or to predict the solubility of new drug candidates.

How to implement it:

To implement Chemprop, follow these steps:

  1. Install the Chemprop library.
  2. Download the dataset of chemical properties that you want to train the model on.
  3. Train the Chemprop model.
  4. Evaluate the model's performance on a held-out test set.
  5. Use the model to make predictions for new drug candidates.

Is the code ready to be used?

Yes, the Chemprop library is ready to be used. The authors of the paper provide a variety of code examples, including examples of how to train models to predict chemical properties.

Is the underlying data available?

Yes, the underlying data for the Chemprop datasets is available from the Chemprop dataset site.

Link to the code: GitHub Repository

Link to the research paper: Research Paper

Additional notes:

Chemprop has been shown to be effective at predicting a variety of chemical properties, including drug-target interactions, drug toxicity, and solubility.

MercybirungiS commented 10 months ago

Model 3

DeepFusionDTA

Research Paper: Read the Research Paper

Summary:

DeepFusionDTA is a novel deep learning model designed for the identification of drug-target interaction (DTI), a crucial aspect of drug discovery. While traditionally, verifying drug-target binding profiles through biological experiments is time-consuming, computational technologies play a crucial role in reducing the drug search space. Most existing computational methods for DTI prediction focus on binary classification, ignoring the influence of binding strength. Predicting drug-target binding affinity remains a challenging task.

Why it would be relevant to Ersilia:

DeepFusionDTA could be highly relevant to Ersilia's efforts in drug discovery. Ersilia could leverage DeepFusionDTA to predict the affinity of new drug candidates to protein targets, especially those involved in a pathogen's life cycle. This model can facilitate the screening process by identifying potential drug candidates with high affinity to the target protein, ultimately aiding in Ersilia's research efforts.

How to implement it:

To implement DeepFusionDTA, follow these steps:

  1. Read the Research Paper to understand the methodology.
  2. Access the codes and data for DeepFusionDTA on GitHub.

Performance:

DeepFusionDTA outperforms existing prediction tools, such as DeepDTA, delivering a 1.5 percent increase in concordance index (CI) on the KIBA dataset and a 1.0 percent increase on the Davis dataset.
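
In the drug-target affinity literature, CI usually denotes the concordance index, which measures how often a model ranks a pair of samples in the same order as their true affinities. The following is a minimal, unoptimized sketch of that standard definition. It is not the DeepFusionDTA authors' code, and the affinity values in the example are made up.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    A pair (i, j) is comparable when y_true[i] > y_true[j]. It scores 1 if
    y_pred[i] > y_pred[j], 0.5 if the two predictions tie, and 0 otherwise.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    score = 0.0
    pairs = 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:
                pairs += 1
                if y_pred[i] > y_pred[j]:
                    score += 1.0
                elif y_pred[i] == y_pred[j]:
                    score += 0.5
    if pairs == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return score / pairs


if __name__ == "__main__":
    # Made-up affinities, for illustration only.
    true_affinity = [5.0, 6.2, 7.1, 8.3]
    predicted = [5.1, 6.0, 8.0, 7.5]
    print(f"CI = {concordance_index(true_affinity, predicted):.3f}")
```

A CI of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ordering, so improvements of one or two percentage points near the top of that range can still be meaningful.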

Application:

The ideas and methods introduced in this research can be applied to in-silico screening of the interaction space, facilitating the discovery of novel DTIs that can be experimentally pursued.

You can explore the research paper for detailed insights into DeepFusionDTA's methodology and results.

MercybirungiS commented 10 months ago

Hi @DhanshreeA, these are the 3 models I have recommended so far, which marks the end of week 3 for me. Your feedback is highly appreciated. Thank you.

GemmaTuron commented 10 months ago

Hello,

Thanks for your work during the Outreachy contribution period, we hope you enjoyed it! We will now close this issue while we work on the selection of interns. Thanks again!