Closed TsionZerihun closed 7 months ago
@DESKTOP-92JJ0KD:~$ gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
@DESKTOP-92JJ0KD:~$ git version
git version 2.17.1
@DESKTOP-92JJ0KD:~$ git-lfs install
Git LFS initialized.
@DESKTOP-92JJ0KD:~$ git lfs --version
git-lfs/3.4.0 (GitHub; linux amd64; go 1.20.6; git d06d6e9e)
(ersilia) @DESKTOP-92JJ0KD:~$ conda list isaura
isaura 0.1 pypi_0 pypi
@DESKTOP-92JJ0KD:~$ docker --version
Docker version 20.10.21, build 20.10.21-0ubuntu1~18.04.3
- ### *Installed ersilia successfully without error*
```console
@DESKTOP-92JJ0KD:~/ersilia$ ersilia --help
Usage: ersilia [OPTIONS] COMMAND [ARGS]...
---
...🦠 Welcome to Ersilia! 💊
@DESKTOP-92JJ0KD:~/ersilia$ ersilia catalog...
[
{
"Identifier": "eos1086"
} ...
(ersilia) @DESKTOP-92JJ0KD:~$ ersilia -v fetch eos3b5e
⬇️ Fetching model eos3b5e: molecular-weight...
---
...👍 Model eos3b5e fetched successfully!
...💁 Information:
```
- ### *However, I faced an error when running prediction*
@DESKTOP-92JJ0KD:~$ ersilia -v run -i "CCC" > my.log 2>&1
Check attached error log file for more detail.
[model_eos3b5e_error.log](https://github.com/ersilia-os/ersilia/files/13115534/model_eos3b5e_error.log)
- ### *I tried removing isaura based on this [discussion](https://github.com/ersilia-os/ersilia/issues/839)*
@DESKTOP-92JJ0KD:~$ python -m pip uninstall isaura
Found existing installation: isaura 0.1
Uninstalling isaura-0.1:
...
Successfully uninstalled isaura-0.1
- ### *I was able to successfully run prediction after uninstalling isaura and rerunning*
@DESKTOP-92JJ0KD:~$ ersilia -v api run -i "CCCC"
{
  "input": {
    "key": "IJDNQMDRQITEOD-UHFFFAOYSA-N",
    "input": "CCCC",
    "text": "CCCC"
  },
  "output": {
    "mw": 58.123999999999995
  }
}
Check attached prediction success log file for more detail.
[model_eos3b5e_sucess.log](https://github.com/ersilia-os/ersilia/files/13118376/model_eos3b5e_sucess.log)
- Lesson Learned
Isaura: after running into issues with the model while isaura was installed, I looked into what this Python package does. Its purpose is to cache previously calculated properties (it is located under the Ersilia repository).
I will make sure to reinstall isaura if caching is necessary when running future models.
"CCCC": "The IUPAC name for C-C-C-C is butane; it contains only a carbon chain."
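The caching idea noted in this lesson can be illustrated with a minimal sketch. This is not isaura's actual API; `make_cached` and `slow_property` are hypothetical names used only to show how storing previously calculated values avoids recomputing them.

```python
from typing import Callable, Dict, List


def make_cached(compute: Callable[[str], float]) -> Callable[[str], float]:
    """Wrap `compute` so each distinct input is only calculated once."""
    store: Dict[str, float] = {}

    def cached(smiles: str) -> float:
        # Only call the expensive function when the input is new.
        if smiles not in store:
            store[smiles] = compute(smiles)
        return store[smiles]

    return cached


calls: List[str] = []


def slow_property(smiles: str) -> float:
    # Hypothetical stand-in for an expensive model call:
    # it simply counts the "C" characters in the SMILES string.
    calls.append(smiles)
    return float(smiles.count("C"))


cached_property = make_cached(slow_property)
first = cached_property("CCCC")
second = cached_property("CCCC")  # served from the cache, no second call
print(first, second, len(calls))
```

The second lookup returns the stored value, so `slow_property` runs only once for the same input.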
- Summary
The above task runs the model eos3b5e on the molecule we provide, "CCCC", and determines its molecular weight in g/mol.
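As a sanity check on the reported value, the molecular weight can be reproduced by hand. The SMILES "CCCC" encodes a four-carbon chain with implicit hydrogens, i.e. C4H10. The sketch below assumes standard atomic weights rounded to three decimals; the function name is illustrative and is not part of Ersilia.

```python
# Standard atomic weights in g/mol, rounded to three decimals (assumption).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008}


def alkane_molecular_weight(n_carbons: int) -> float:
    """Molecular weight of a straight-chain alkane, formula CnH(2n+2)."""
    n_hydrogens = 2 * n_carbons + 2
    return n_carbons * ATOMIC_WEIGHT["C"] + n_hydrogens * ATOMIC_WEIGHT["H"]


butane_mw = alkane_molecular_weight(4)
print(f"{butane_mw:.3f} g/mol")  # 4 * 12.011 + 10 * 1.008 = 58.124
```

This agrees with the `mw` value of about 58.124 returned by the model.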
I'd like to introduce myself as a person who is enthusiastic about automation and its impacts. In addition to studying computer science, I completed a 1 year intensive software engineering program with a focus on backend development at ALX. Over the past few years, I have worked in various sectors, including full-stack development and data analysis, utilizing my research and analytical skills to improve smallholder farmers' lives.
I was first introduced to ML when working on a project on developing a REST API for a Y-Maze test. It was based on a pretrained model aiming to automate laboratory researchers' tasks. The project initially seemed intimidating, but with intensive research, I learned a lot and am grateful for the exposure.
It was fascinating to see that Ersilia has already begun using currently available technologies to help the world. I think that technology has the potential to change people's lives in many sectors, but that potential is not fully utilized, and much is yet to be discovered. Research centers and labs are not as fully automated as they could be. It was quite exciting to run the previous (Ersilia week 1) task with minimal commands and setup. I was able to run a test that might otherwise have taken a significant amount of time, energy, and skill. Ersilia, in my opinion, is achieving what most software engineers and I hope to achieve: using various technologies to improve the world and serve as an inspiration for future innovations. I believe we are at the perfect time, when initiatives and ideas can easily be brought to light, but many people still lack knowledge and tools. Growing open-source projects like Ersilia can help individuals learn and contribute to the community.
It would be a great pleasure to join a nonprofit organization that aims to assist experts in discovering new drugs for treating infectious and neglected diseases using the latest technologies, making it convenient and time-saving. My long-term goal is to work on a project that improves people's lives. I have the vision to enhance my country and the world by coming up with new solutions to various problems. I would love to be a part of the team so that I can give back to the community. I'm eager to contribute to and learn from Ersilia!
Best, Tsion Zeleke
- Microsomal stability: the metabolism of a new chemical entity (drug), measured as its time-dependent decrease in incubation mixtures containing liver microsomes.
The model uses rat liver microsomes to test microsomal stability, which helps screen drug candidates in the early stages of drug development.
(base) robel@DESKTOP-92JJ0KD:~$ git clone --recursive https://github.com/ncats/ncats-adme.git
Cloning into 'ncats-adme'...
remote: Enumerating objects: 3640, done....
...
(base) robel@DESKTOP-92JJ0KD:~$ ls
bentoml eos ersilia miniconda3 model_eos3b5e_error.log model_eos3b5e_sucess.log ncats-adme
---
#from predictors.hlm.hlm_predictor import HLMPredictior
#from predictors.pampa.pampa_predictor import PAMPAPredictior
#from predictors.pampa50.pampa_predictor import PAMPA50Predictior
#from predictors.pampabbb.pampa_predictor import PAMPABBBPredictior...
---
...def predict(): ...
---
for model in models:
    response[model] = {}
    error_messages = []
    # if model.lower() == 'hlm':
    #     predictor = HLMPredictior(kekule_smiles = working_df['kekule_smiles'].values, smiles=working_df[smi_column_name].values)
    if model.lower() == 'rlm':
        predictor = RLMPredictior(kekule_smiles = working_df['kekule_smiles'].values, smiles=working_df[smi_column_name].values)
    # elif model.lower() == 'pampa':
    #     predictor = PAMPAPredictior(kekule_smiles = working_df['kekule_smiles'].values, smiles=working_df[smi_column_name].values)
    # elif model.lower() == 'pampa50':
    #     predictor = PAMPA50Predictior(kekule_smiles = working_df['kekule_smiles'].values, smiles=working_df[smi_column_name].values)
    # elif model.lower() == 'pampabbb':
    #     predictor = PAMPABBBPredictior(kekule_smiles = working_df['kekule_smiles'].values, smiles=working_df[smi_column_name].values)
    elif model.lower() == 'solubility':
        predictor = SolubilityPredictior(kekule_smiles = working_df['kekule_smiles'].values, smiles=working_df[smi_column_name].values)
    # elif model.lower() == 'hlc':
    #     predictor = LCPredictor(kekule_smiles = working_df['kekule_smiles'].values, smiles=working_df[smi_column_name].values)
    # elif model.lower() == 'cyp450':
    #     predictor = CYP450Predictor(kekule_mols = working_df['mols'].values, smiles=working_df[smi_column_name].values)
    else:
        break...
- ### *I faced an error when trying to create the environment based on the installation guide in the Repo*
* I modified the command from the official repo from `conda env create --prefix ./env -f environment.yml` to `conda env create --prefix ./env -f server/environment.yml`, since they have moved the environment.yml file to the server directory
```console
@DESKTOP-92JJ0KD:~/ncats-adme$ cd server
@DESKTOP-92JJ0KD:~/ncats-adme/server$ conda env create --prefix ./env -f environment.yml
Collecting package metadata (repodata.json): - Killed
(base) @DESKTOP-92JJ0KD:~/ncats-adme$ conda activate "./env"
(/home/robel/ncats-adme/env) robel@DESKTOP-92JJ0KD:~/ncats-adme$
#Environment successfully changed to env
```
typed-argument-parser, request, HealthCheck, flask-swagger-ui
(/home/robel/ncats-adme/env) robel@DESKTOP-92JJ0KD:~/ncats-adme/server$ python app.py
Loading RLM graph convolutional neural network model...
---
...WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
![server_url](https://github.com/ersilia-os/ersilia/assets/101357449/7d87d516-7a72-43f2-9609-a3cd0d0be079)
Smiles | Prediction | Probability |
---|---|---|
Chemical notation of the drug in a form that can be used by the computer | Unstable: the drug undergoes biotransformation in less than 30 min. Stable: more than 30 min | The odds of the prediction (probability of stability or instability of the drug based on the prediction) |
Here is the csv file of the prediction. ADME_Predictions_2023-10-27-164557.csv
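The 30-minute cutoff described in the Prediction column can be written as a small rule. This is only a sketch of the labeling convention, not the NCATS model itself. It assumes the time refers to a measured half-life in minutes, and it treats a value of exactly 30 minutes as stable, which the table does not specify.

```python
def classify_stability(half_life_min: float, cutoff_min: float = 30.0) -> str:
    """Label a compound using the 30-minute cutoff from the table.

    Assumption: a value exactly equal to the cutoff counts as "Stable".
    """
    return "Unstable" if half_life_min < cutoff_min else "Stable"


for minutes in (5.0, 29.9, 45.0):
    print(minutes, classify_stability(minutes))
```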
`cd ersilia`, activated the ersilia env with `conda activate ersilia`, and fetched the RLM model from Ersilia's repo using `ersilia -v fetch eos5505`
(ersilia) robel@DESKTOP-92JJ0KD:~/ersilia$ ersilia -v fetch eos5505
⬇️ Fetching model eos5505: ncats-rlm
17:37:59 | DEBUG | Initialized with URL: None
17:37:59 | DEBUG | Trying to find an available URL where the model is hosted
17:38:07 | DEBUG | Git LFS is installed
Updated Git hooks.
Git LFS initialized.
17:38:07 | DEBUG | Git LFS has been activated
17:38:08 | DEBUG | Conda is installed...
abacavir, and ran predictions: 0.049 for abacavir
(ersilia) robel@DESKTOP-92JJ0KD:~/ersilia$ ersilia -v api predict -i "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"
11:26:10 | DEBUG | Getting session from /home/robel/eos/session.json...
-----------------------------------
{
"input": {
"key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",
"input": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1",
"text": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"
},
"output": {
"outcome": [
0.049
]
}
}
(ersilia) robel@DESKTOP-92JJ0KD:~/ersilia$ ersilia -v api predict -i ./assets/eml_canonical.csv -o ersilia_prediction.csv
11:38:22 | DEBUG | Getting session from /home/robel/eos/session.json...
-----------------
12:11:05 | DEBUG | Status code: 200
12:11:05 | DEBUG | Done with unique posting
12:11:17 | DEBUG | Data: outcome
12:11:17 | DEBUG | Values: [0.049]
12:11:17 | DEBUG | Datatype: numeric_array
ersilia_prediction.csv
(ersilia) robel@DESKTOP-92JJ0KD:~/ersilia$
Bring only the prediction (probability) column from the two outputs
From the ADME prediction cells, I removed the parentheses and the "1" (the stability value ADME predicted) using the Excel =Remove() formula
Check the diff between the two models
Computed the diffs by subtracting the Ersilia value from the ADME value
Checked using =COUNTIF() how many of the diffs are > 0.1, which turned out to be 8 out of 189
The above comparison shows that the two sets of predictions agree to within 0.1 for almost 96% of the compounds (181 of 189), with about 4% of the diffs being > 0.1, so the two models are similar in terms of predicting a drug's instability probability.
I have attached the comparison file below: ADME_Ersilia_Comparsion.xlsx
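The spreadsheet comparison can also be sketched in a few lines of Python. The function name and the sample values below are made up for illustration; only the 8-out-of-189 count comes from the comparison described here. The sketch uses the absolute difference, which is an assumption, since the spreadsheet steps describe a plain subtraction.

```python
from typing import Sequence, Tuple


def compare_predictions(
    a: Sequence[float], b: Sequence[float], threshold: float = 0.1
) -> Tuple[int, float]:
    """Return (count of pairs differing by more than `threshold`,
    fraction of pairs within `threshold`)."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    n_large = sum(d > threshold for d in diffs)
    return n_large, 1 - n_large / len(diffs)


# Made-up example values, not the real predictions.
adme = [0.05, 0.90, 0.40, 0.75]
ersilia = [0.049, 0.60, 0.45, 0.76]
n_large, share_within = compare_predictions(adme, ersilia)
print(n_large, share_within)

# Agreement rate implied by the reported count of 8 large diffs out of 189.
share_reported = 1 - 8 / 189
print(f"{share_reported:.1%}")
```

With 8 large differences out of 189 compounds, the agreement rate works out to roughly 95.8%.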
- Lesson Learned
in vitro: ""in vitro" is Latin for "in glass"; it refers to medical procedures or tests performed outside of a living organism, such as in a test tube or petri dish"
in vivo: "research done in a living organism"
- Summary
The above task runs the NCATS Rat Liver Microsomal Stability (RLM) model and the corresponding Ersilia model (eos5505) and compares their outputs.
@DESKTOP-92JJ0KD:/home/robel# docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
719385e32844: Pull complete
Digest: sha256:88ec0acaa3ec199d3b7eaf73588f4518c25f9d34f58ce9a0df68429c5af48e8d
Status: Downloaded newer image for hello-world:latest
For more examples and ideas, visit: https://docs.docker.com/get-started/
@DESKTOP-92JJ0KD:/home/robel#
`python app.py` to check if it still works, which it does. Here is a preview of the file:
from predictors.hlm.hlm_predictor import HLMPredictior
from predictors.pampa.pampa_predictor import PAMPAPredictior
from predictors.pampa50.pampa_predictor import PAMPA50Predictior
from predictors.pampabbb.pampa_predictor import PAMPABBBPredictior
from predictors.solubility.solubility_predictor import SolubilityPredictior
from predictors.liver_cytosol.lc_predictor import LCPredictor
from predictors.cyp450.cyp450_predictor import CYP450Predictor
```diff
if model.lower() == 'rlm':
predictor = RLMPredictior(kekule_smiles = working_df['kekule_smiles'].values, smiles=working_df[smi_column_name].values)
elif model.lower() == 'pampa':
predictor = PAMPAPredictior(kekule_smiles = working_df['kekule_smiles'].values, -smiles=working_df[smi_column_name].values)
elif model.lower() == 'pampa50':
predictor = PAMPA50Predictior(kekule_smiles = working_df['kekule_smiles'].values, smiles=working_df[smi_column_name].values)
elif model.lower() == 'pampabbb':
predictor = PAMPABBBPredictior(kekule_smiles = working_df['kekule_smiles'].values, -smiles=working_df[smi_column_name].values)
elif model.lower() == 'solubility':
predictor = SolubilityPredictior(kekule_smiles = working_df['kekule_smiles'].values, -smiles=working_df[smi_column_name].values)
elif model.lower() == 'hlc':
predictor = LCPredictor(kekule_smiles = working_df['kekule_smiles'].values, -smiles=working_df[smi_column_name].values)
elif model.lower() == 'cyp450':
predictor = CYP450Predictor(kekule_mols = working_df['mols'].values, smiles=working_df[smi_column_name].values)
```
```diff
if model.lower() != 'cyp450':
    # for all models except cyp450, calculate the nearest neigbors and add additional column to response_df
    try:
        sim_vals = get_similar_mols(response_df[smi_column_name].values, model.lower())
        sim_series = pd.Series(sim_vals).round(2).astype(str)
        response_df['Tanimoto Similarity'] = sim_series.values
        columns_dict['Tanimoto Similarity'] = { 'order': 3, 'description': 'similarity towards nearest neighbor in training data', 'isSmilesColumn': False }
    except Exception as e:
        app.logger.error('Error calculating similarity')
        app.logger.error(f'error type: {type(e)}')
        app.logger.error(e)
else:
    try:
        sim_vals = get_similar_mols(response_df[smi_column_name].values, model.lower())
        sim_series = pd.Series(sim_vals).round(2).astype(str)
        response_df['Tanimoto Similarity'] = sim_series.values
        columns_dict['Tanimoto Similarity'] = { 'order': 7, 'description': 'similarity towards nearest neighbor in training data that was obtained by combining the compounds from all six individual datasets', 'isSmilesColumn': False }
    except Exception as e:
        app.logger.error('Error calculating similarity')
        app.logger.error(f'error type: {type(e)}')
        app.logger.error(e)
```
@DESKTOP-92JJ0KD:~/ncats-adme/server/models$ ls
rlm
(base) robel@DESKTOP-92JJ0KD:~/ncats-adme/server/models$
I decide to hold the task because
Name: Drug Combination (Graph Set) Generation - Deep Generative Models for given Hierarchical Disease Network Embedding
Source code: https://github.com/Shen-Lab/Drug-Combo-Generator/tree/master
Publication: https://pubmed.ncbi.nlm.nih.gov/32657357/
Description: Drug-combo-generator is a deep generative model for drug combination design, by jointly embedding graph-structured domain knowledge and iteratively training a reinforcement learning-based chemical graph-set designer.
rdkit, mpi4py, networkx, OpenAI baseline dependencies, customized molecule gym environment
~$ conda create -c rdkit -n my-rdkit-env rdkit
~$ conda install mpi4py
~$ pip install networkx==1.11
#OpenAI baseline dependencies
~$ cd rl-baselines
~$ pip install -e .
#customized molecule gym environment
~$ cd gym-molecule
~$ pip install -e .
~$ python run_drug_comb_generator.py --disease_id=42
Name: DeepDTA: deep drug-target binding affinity
Source code: https://github.com/XuanLin1991/DeepGS
Publication: https://arxiv.org/abs/2003.13902
Description: Modeling of protein sequences and compound 1D representations (SMILES) with convolutional neural networks (CNNs) to predict the binding affinity value of drug-target pairs.
~$ conda create -n deepgs python=3.7.6
~$ source activate deepgs
- Clone repository and install requirements.
```console
~$ git clone https://github.com/jacklin18/DeepGS.git
~$ cd DeepGS
~$ pip install -r requirements.txt
```
Name: DrugChat
Source code: https://github.com/ucsd-ai4h/drugchat
Publication: https://www.techrxiv.org/articles/preprint/DrugChat_Towards_Enabling_ChatGPT-Like_Capabilities_on_Drug_Molecule_Graphs/22945922
Description: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
~$ git clone https://github.com/UCSD-AI4H/drugchat
~$ cd drugchat
~$ conda env create -f environment.yml
~$ conda activate drugchat
I also found some interesting additional models.
Name: toxAIcity
Source code: https://github.com/subhasishgoswami/toxAIcity
Description: A long short-term memory (LSTM) based classifier that predicts whether new drug candidates are toxic, using Simplified Molecular-Input Line-Entry System (SMILES) notation.

Name: ChemCPA
Source code: https://github.com/theislab/chemCPA
Description: Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution

Name: Chagas detection
Source code: https://github.com/csbl-br/chagas_detection
Description: Chagas detection is a tool for the detection of T. cruzi trypomastigote forms in blood smear images. In particular, it provides a machine learning based method for the detection of parasites in images acquired using a mobile phone camera.

Name: lepto-classifier
Source code: https://github.com/sf-deng/lepto-classifier
Description: An SVM-based binary classifier to detect leptospirosis in dogs
Hello,
Thanks for your work during the Outreachy contribution period, we hope you enjoyed it! We will now close this issue while we work on the selection of interns. Thanks again!
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application