Closed: Zainab-ik closed this issue 1 year ago.
Systems Specification
Hi @Zainab-ik
Welcome to Ersilia, great to have you here. Please let us know which system you are using. Whenever possible, please refrain from pasting screenshots, as they are more difficult for mentors to review! Thanks
Thank you @GemmaTuron
It's great to be here too. All comments noted; I'll update my last comment.
Hi everyone, my name is Zainab, and I was a graduate intern at the Nigerian Institute of Medical Research. I also hold a BSc in Pharmacology, Therapeutics, and Toxicology. I'm interested in Ersilia because it sits at the intersection of machine learning and drug discovery. I came across Ersilia while browsing the available Outreachy projects.
What drew me to Ersilia was the intersection of artificial intelligence (AI) and my academic background in pharmacology. I developed a passion for computational pharmacology, and my thesis focused on predicting the antiviral properties of a natural plant using computational tools. I am also a member of CaresAI, a cancer drug discovery research group that focuses on AI methodology.
I also resonate with Ersilia's mission of tailoring its models to researchers in low- and middle-income countries (LMICs), thereby improving the quality and outcomes of their research. I love open science, and given my background and skill set, Ersilia is the right open-science project for me.
Ersilia's projects span malaria and other infectious diseases, which are my core research interests. My skill set includes ML, Python, Linux, and bioinformatics tools, among others. Contributing to Ersilia would be a way of giving back to the community, which is what open source is about.
Learning while giving back
I intend to take on more projects applying AI to drug discovery, and working with Ersilia will improve my skills and give me confidence. Furthering my studies in computational pharmacology is the next step in my career, and I believe Ersilia will be a great propeller toward achieving it.
Thank you for reading my motivation letter. I'm excited to contribute to Ersilia while collaborating with my peers to make science accessible to all.
Hi @GemmaTuron, I've completed the Week 1 tasks. Kindly review. Thanks.
Thanks @Zainab-ik,
Welcome to the contribution period!
Thanks, @GemmaTuron.
Model selected: SARS-CoV-2 activity (ImageMol)
The ImageMol model takes molecular images as its input data. Having worked in a molecular lab, I find it interesting to see images used to train a model, so I chose this model. I'm interested in applying it to unlabeled images of drug-like molecules, since annotating compound data is tedious. A drug's therapeutic potential can be evaluated in several ways, with molecular properties, ease of synthesis, and low toxicity being important factors. The ImageMol model could also support the molecular docking stage of drug discovery, since the binding can be visualized. Lastly, I have previously worked with molecular images of SARS-CoV-2 target proteins and their corresponding compounds of interest, so this model is a good way to consolidate that knowledge. I also get to learn about computer vision. You can read more about the model here. Thank you.
@GemmaTuron
Here are the detailed steps I followed to install the model on my system.
I ran them on Linux (WSL with Ubuntu).
After setting up the package repository, I ran
sudo apt install cuda-10-1
and got this error message:
E: unable to locate package cuda-10-1
I then ran sudo apt install nvidia-cuda-toolkit
and successfully installed the required CUDA GPU environment.
Next, I created a conda environment with
conda create -n imagemol python=3.7.3
and activated it using conda activate imagemol. Then I installed RDKit with
conda install -c rdkit rdkit
Then I installed the PyTorch Geometric extensions with
pip install torch-cluster torch-scatter torch-sparse torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.4.0%2Bcu101.html
However, I got an error while running this:
Collecting torch-cluster
Downloading torch_cluster-1.6.0.tar.gz (43 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.4/43.4 kB 218.2 kB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 36, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-6k22__7f/torch-cluster_4930bb5f17ce4e118a9da372c9ef78c0/setup.py", line 8, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Based on ModuleNotFoundError: No module named 'torch', I installed PyTorch manually using conda install pytorch-cpu torchvision-cpu -c pytorch.
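The traceback shows that torch-cluster's setup.py runs import torch while pip is still preparing the package metadata, so PyTorch has to be importable in the active environment before the PyG extensions can be installed. A minimal pre-flight check could look like the sketch below, assuming it is run with the environment's own python; the function name missing_build_deps is hypothetical.

```python
import importlib.util


def missing_build_deps(modules):
    """Return the top-level module names in `modules` that this interpreter cannot find.

    importlib.util.find_spec locates a top-level module without importing it,
    so the check is cheap and has no side effects.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # torch must be present before `pip install torch-cluster ...` is attempted,
    # because the extensions' setup.py files import it at build time.
    missing = missing_build_deps(["torch"])
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("Build-time dependencies found.")
```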
However, I then ran into another error:
Using cached torch_cluster-1.6.0.tar.gz (43 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 36, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-m6s5wd8n/torch-cluster_c5c4961158204cfc86d0996e50312235/setup.py", line 10, in <module>
from torch.__config__ import parallel_info
ImportError: cannot import name 'parallel_info' from 'torch.__config__' (/home/zainab_ik/miniconda3/envs/imagemol/lib/python3.7/site-packages/torch/__config__.py)
[end of output]
I'm currently working on resolving the error.
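An ImportError like this one usually means that the extension source expects a newer PyTorch than the one installed, so the PyG extension wheels need to match the installed PyTorch version. The -f URL in the install command encodes both the PyTorch version and the CUDA tag (torch-1.4.0%2Bcu101). The sketch below, a hypothetical helper, rebuilds that URL for a given version and tag. It assumes the same host and naming pattern as the command above; newer PyG installation docs use data.pyg.org/whl with the same torch-VERSION+TAG pattern, and "cpu" as the tag for CPU-only builds.

```python
from urllib.parse import quote


def pyg_wheel_index(torch_version, cuda_tag, base="https://pytorch-geometric.com/whl"):
    """Build the pip -f (find-links) URL for PyG extension wheels.

    torch_version: e.g. "1.4.0" or "1.4.0+cu101"; a local "+..." suffix is dropped.
    cuda_tag:      e.g. "cu101", or "cpu" for a CPU-only PyTorch build.
    """
    public_version = torch_version.split("+", 1)[0]
    # '+' is percent-encoded as %2B inside the URL path, as in the original command.
    return f"{base}/torch-{public_version}{quote('+')}{cuda_tag}.html"
```

Inside the environment, passing torch.__version__ and "cpu" should point pip at wheels built for the PyTorch you actually have. Wheel pages are generally published per minor release, so if a page does not exist for your exact patch version, try the .0 release of the same minor version.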
@GemmaTuron @DhanshreeA
Hi @Zainab-ik !
This model has two versions: one for TRAINING the model, and one for simply running predictions with pretrained models (the Finetuning section of the README). Training requires a CUDA GPU, but fine-tuning does not. Do not install the CUDA-GPU environment, because it won't work on most systems; follow the instructions to run predictions only. I hope this helps! I think Ahmed was working on this model as well!
Thank you @GemmaTuron, I'll post an update.
As stated above, I ran into a couple of errors while installing the PyTorch packages.
While trying to install the other PyTorch packages (torch-cluster, torch-scatter, torch-sparse, and torch-spline-conv),
I encountered these errors:
ERROR: Failed building wheel for torch-sparse
ERROR: Failed building wheel for torch-cluster
ERROR: Failed building wheel for torch-scatter
ERROR: Failed building wheel for torch-spline-conv
These errors were caused by package version clashes. As I understand it, the PyTorch extension packages are sensitive to the installed PyTorch version: to install a given package, you may need to downgrade or upgrade a package it depends on.
How to resolve the error
Since these are PyTorch Geometric (PyG) packages, I read the installation guide on the PyG website here.
I installed the packages directly from the pyg conda channel
using conda install pyg -c pyg
since pip install
kept throwing this error:
(imagemol) zainab_ik@DESKTOP-E8NO3DG:~$ pip install torch-spline-conv
Collecting torch-spline-conv
Using cached torch_spline_conv-1.2.1.tar.gz (13 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 36, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-vgddfgna/torch-spline-conv_6514989605ba443d8a0107c604ef6a92/setup.py", line 8, in <module>
from torch.utils.cpp_extension import BuildExtension
ModuleNotFoundError: No module named 'torch.utils.cpp_extension'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
With conda, I was able to install all the packages together.
I then ran the required installation command again:
pip install torch-cluster torch-scatter torch-sparse torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.4.0%2Bcu101.html
and the output was:
Requirement already satisfied: torch-cluster in ./miniconda3/envs/imagemol/lib/python3.7/site-packages (1.6.0)
Requirement already satisfied: torch-scatter in ./miniconda3/envs/imagemol/lib/python3.7/site-packages (2.1.0)
Requirement already satisfied: torch-sparse in ./miniconda3/envs/imagemol/lib/python3.7/site-packages (0.6.16)
Requirement already satisfied: torch-spline-conv in ./miniconda3/envs/imagemol/lib/python3.7/site-packages (1.2.0)
Requirement already satisfied: scipy in ./miniconda3/envs/imagemol/lib/python3.7/site-packages (from torch-sparse) (1.7.3)
Requirement already satisfied: numpy<1.23.0,>=1.16.5 in ./miniconda3/envs/imagemol/lib/python3.7/site-packages (from scipy->torch-sparse) (1.21.5)
I confirmed the installed packages using grep torch.
Finally, I was able to install all the PyTorch packages.
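For a check that does not depend on grep, a minimal Python sketch can list installed distributions whose names contain "torch". It uses importlib.metadata, which ships with Python 3.8+; the imagemol environment runs Python 3.7, where the importlib_metadata backport provides the same API (an assumption about your setup). The helper name installed_matching is hypothetical.

```python
from importlib.metadata import distributions  # Python 3.8+; use importlib_metadata on 3.7


def installed_matching(substring):
    """Return sorted (name, version) pairs for installed distributions whose name contains substring."""
    found = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        # Some broken installs have no Name field; skip them.
        if name and substring.lower() in name.lower():
            found.add((name, dist.version))
    return sorted(found)


if __name__ == "__main__":
    for name, version in installed_matching("torch"):
        print(f"{name}=={version}")
```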
conda activate imagemol
git clone https://github.com/HongxinXiang/ImageMol.git
cd ImageMol
pip install -r requirements.txt
With that, the model was installed.
@GemmaTuron, I'd like to ask whether it matters if packages are installed via conda or pip, since the model's installation instructions specify pip.
I'm currently working on running predictions with the pretrained models, as suggested.
Running predictions of SARS-CoV-2 activity with ImageMol
The SARS-CoV-2 dataset comprises several activity datasets, namely:
With the model installed and the environment activated, you can fine-tune the model by running:
python finetune.py --gpu ${gpu_no} \
--save_finetune_ckpt ${save_finetune_ckpt} \
--log_dir ${log_dir} \
--dataroot ${dataroot} \
--dataset ${dataset} \
--task_type ${task_type} \
--resume ${resume} \
--image_aug \
--lr ${lr} \
--batch ${batch} \
--epochs ${epoch}
Edit the command to fit the specific dataset you want to use, for example:
python finetune.py --gpu 0 \
--save_finetune_ckpt 1 \
--log_dir ./logs/toxcast \
--dataroot ./datasets/finetuning/SARS-CoV-2 \
--dataset ACE2_enzymatic_activity \
--task_type classification \
--resume ./ckpts/ImageMol.pth.tar \
--image_aug \
--lr 0.5 \
--batch 64 \
--epochs 20
However, I encountered several errors, including:
finetune.py : command not found
--image-aug: command not found
lr: command not found
python: can't open file 'finetune.py': No such file or directory
Assertion error: a particular path is not a directory
Trial:
For every "command not found" error, I tried removing the offending argument, but the same error was then thrown for another argument.
For the assertion error, I copied the path from my local system and modified it, but the same error was still thrown.
I also tried the original SARS-CoV-2 assay that came with the repository, 3CL_enzymatic_activity, and encountered the same assertion error.
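Errors such as "--image-aug: command not found" and "lr: command not found" are what bash typically prints when a backslash line continuation breaks (for example, a space after the trailing backslash or Windows line endings), so each following line runs as its own command. One way to sidestep shell continuations is to build the argument list in Python, as in the sketch below. The flag names are taken from the command above; the helper finetune_command is hypothetical, and the command must still be run from the ImageMol repository root so that finetune.py is found.

```python
import sys


def finetune_command(options, flags=(), script="finetune.py"):
    """Build an argv list for finetune.py from a dict of options.

    options: mapping of flag name (without leading dashes) to value.
    flags:   boolean switches such as "image_aug" that take no value.
    """
    argv = [sys.executable, script]
    for key, value in options.items():
        argv += [f"--{key}", str(value)]
    argv += [f"--{flag}" for flag in flags]
    return argv


if __name__ == "__main__":
    cmd = finetune_command(
        {
            "gpu": 0,
            "save_finetune_ckpt": 1,
            "log_dir": "./logs/toxcast",
            "dataroot": "./datasets/finetuning/SARS-CoV-2",
            "dataset": "ACE2_enzymatic_activity",
            "task_type": "classification",
            "resume": "./ckpts/ImageMol.pth.tar",
            "lr": 0.5,
            "batch": 64,
            "epochs": 20,
        },
        flags=["image_aug"],
    )
    print(" ".join(cmd))
    # To launch it from the repository root: subprocess.run(cmd, check=True)
```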
I'm now reinstalling, cloning the repository again, and retrying the fine-tuning.
Update
After reinstalling, I found that the error was caused by the paths. Following the instructions again, I placed the pretrained model in the ckpts/
directory and moved my downloaded dataset into the finetuning/
directory rather than the toy/
directory where it had been placed in the repository.
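Since the assertion error came from a path that was not a directory, a quick check before launching can catch this early. The sketch below assumes, hypothetically, that the dataset is a folder named after --dataset directly under --dataroot (the layout described above) and that --resume points to a checkpoint file; check_finetune_paths is not part of ImageMol.

```python
from pathlib import Path


def check_finetune_paths(dataroot, dataset, resume):
    """Return human-readable problems with the paths passed to finetune.py (empty if none).

    Assumes the dataset is a folder named `dataset` directly under `dataroot`,
    and that `resume` is a checkpoint file.
    """
    problems = []
    data_dir = Path(dataroot) / dataset
    if not data_dir.is_dir():
        problems.append(f"dataset directory not found: {data_dir}")
    if not Path(resume).is_file():
        problems.append(f"checkpoint file not found: {resume}")
    return problems


if __name__ == "__main__":
    issues = check_finetune_paths(
        "./datasets/finetuning/SARS-CoV-2",
        "ACE2_enzymatic_activity",
        "./ckpts/ImageMol.pth.tar",
    )
    print("\n".join(issues) if issues else "All paths look fine.")
```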
I then fine-tuned the SARS-CoV-2 model on the ACE2_enzymatic_activity dataset again.
Result
The result of fine-tuning on ACE2_enzymatic_activity is:
final results: highest_valid: 0.742, final_train: 0.461, final_test: 0.361
Model evaluation
This was done by running:
python evaluate.py --dataroot ./datasets/finetuning/SARS-CoV-2 \
--dataset ACE2_enzymatic_activity \
--task_type classification \
--resume ./ckpts/ImageMol.pth.tar \
--batch 128
Error encountered
While running this, I encountered RuntimeError: Error(s) in loading state_dict for ResNet: Unexpected key(s) in state_dict,
which I attributed to a version and dependency clash. I edited evaluate.py to tolerate the mismatch by adding ['state_dict'], strict=False)
to the model-loading
code, and then ran the evaluation again.
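Setting strict=False makes the load succeed, but PyTorch then silently skips mismatched entries: any parameter the model expects but the checkpoint lacks keeps its initial values, which may contribute to poor evaluation scores. In PyTorch, model.load_state_dict(state_dict, strict=False) returns an object with missing_keys and unexpected_keys, so printing those is a quick check. The plain-Python sketch below mirrors that comparison on key names; the helper names and example keys are made up for illustration. A common cause of unexpected keys is a "module." prefix, which is added when a model is saved from torch.nn.DataParallel.

```python
def state_dict_key_report(model_keys, checkpoint_keys):
    """Compare parameter names the way load_state_dict(strict=False) reports them.

    Returns (missing, unexpected):
      missing    - keys the model expects but the checkpoint lacks (left at their initial values)
      unexpected - keys in the checkpoint that the model has no slot for (ignored)
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return sorted(model_keys - checkpoint_keys), sorted(checkpoint_keys - model_keys)


def strip_prefix(keys, prefix="module."):
    """Drop a leading prefix (e.g. from DataParallel) from each key that has it."""
    return [k[len(prefix):] if k.startswith(prefix) else k for k in keys]


if __name__ == "__main__":
    model = ["conv1.weight", "fc.weight", "fc.bias"]
    ckpt = ["module.conv1.weight", "module.fc.weight", "module.fc.bias"]
    print(state_dict_key_report(model, ckpt))                # every key mismatched
    print(state_dict_key_report(model, strip_prefix(ckpt)))  # ([], [])
```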
Result
[test] rocauc: 41.4%
This is a very low value, below the 50% expected from random guessing, and it indicates poor model performance.
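ROC AUC can be read as the probability that the model scores a randomly chosen active above a randomly chosen inactive. A score of 50% is what random guessing gives, so 41.4% ranks slightly worse than chance. The O(n²) pure-Python sketch below computes AUC that way for illustration. It is not how evaluate.py computes it; judging by the error message quoted below, that script appears to use scikit-learn's roc_auc_score. The same definition explains the "Only one class is present in y_true" error: with a single class there are no active/inactive pairs to compare.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Raises ValueError when only one class is present,
    because the AUC is undefined without both positives and negatives.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ranked correctly
```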
Kindly review, @DhanshreeA @GemmaTuron.
Hi @Zainab-ik
Great, I was going to point out that the path might be incorrect; good catch! Thanks for trying the fine-tuning on the dataset, fantastic work. Since the ACE2 model is not implemented in the Hub, can you simply try to run predictions for a few molecules with the model you fine-tuned? With that, tick off the tasks from week 2 and focus on week 3.
Good job!
Thanks @GemmaTuron, I'll work on that.
For the ImageMol model, running predictions is the same as evaluating the model on a few molecules. The model takes as input a list of SMILES with index numbers and labels (0 or 1), accompanied by the corresponding images, since it is an image-based model.
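A minimal sketch of assembling such an input table with Python's csv module is shown below. The column names index, smiles, and label are assumptions based on the description above, so check them against a CSV shipped with the ImageMol datasets before use. Because evaluation computes ROC AUC, the labels need to include both 0 and 1. The helper prediction_csv_text is hypothetical.

```python
import csv
import io


def prediction_csv_text(records):
    """Return CSV text with columns index, smiles, label for (smiles, label) records.

    Column names are assumptions; match them to the ImageMol dataset CSVs.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["index", "smiles", "label"])
    for i, (smiles, label) in enumerate(records):
        writer.writerow([i, smiles, label])
    return buffer.getvalue()


if __name__ == "__main__":
    # Aspirin and caffeine; the labels here are illustrative, not measured activities.
    text = prediction_csv_text([
        ("CC(=O)OC1=CC=CC=C1C(=O)O", 0),
        ("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", 1),
    ])
    print(text, end="")
```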
While working with the Essential Medicine List (eml_canonical.csv), I had to download the corresponding images for a few SMILES to run predictions. However, I encountered
ValueError: Only one class is present in y_true. ROC AUC score is not defined in that case. This was due to several reasons:
- The default batch size for prediction is 128, and I was running just five molecules.
- The model did not process the corresponding images.
- I had to copy the molecules into a folder structured like the rest of the bioassays, because otherwise it threw a directory error.
I tried the following to bypass the errors and make predictions on a few molecules.
I changed the default batch size in the evaluate.py script from 128 to a smaller value to enable predictions on a few molecules, because the default batch size of 128 prevents predictions on fewer than 128 molecules. Using the evaluation command above, I ran predictions on 128 molecules while debugging,
and on 50 or fewer molecules while tuning hyperparameters such as the batch size.
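One plausible reason a batch size of 128 blocks predictions on a handful of molecules is a data loader created with drop_last=True, which discards an incomplete final batch. Whether evaluate.py does this is an assumption not checked here. The arithmetic below mirrors how a PyTorch-style batch sampler counts batches; num_batches is a hypothetical helper.

```python
import math


def num_batches(n_samples, batch_size, drop_last):
    """Number of batches a PyTorch-style batch sampler yields for n_samples items."""
    if drop_last:
        # Only full batches are kept; a partial final batch is discarded.
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


if __name__ == "__main__":
    print(num_batches(5, 128, drop_last=True))   # five molecules yield no batch at all
    print(num_batches(5, 128, drop_last=False))  # one partial batch
```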
@GemmaTuron Kindly review. Also, while going through the Ersilia Model Hub, I noticed that the model eos4cxk
has an implementation similar to the SARS-CoV-2 ImageMol model.
Hi @Zainab-ik !
Good job on working that model out! With that I think you can tick off all week 2 tasks as completed (since the model is not yet implemented in the EMH for comparison) and we can move onto week 3 tasks, looking forward to seeing your model suggestions!
Hi @GemmaTuron, thank you very much. I'm looking forward to reviewing the literature and suggesting models.
hERG blocker: Virtual screening of DrugBank database for hERG blockers using topological Laplacian-assisted AI models
This model integrates molecular embeddings with deep neural network and gradient-boosting tree algorithms to predict potential hERG blockers. Blockade of the hERG channel causes cardiotoxicity and can lead to drugs being withdrawn from clinical trials. It is therefore important to screen therapeutic compounds for cardiotoxicity in the drug development pipeline. The model was trained on 8641 compounds from the DrugBank database, mostly FDA-approved, experimental, or investigational drugs. When further tested on large datasets, it delivered an accuracy of 0.981. The molecular features were generated using transformer-based NLP techniques.
Name: hERG blocker
Input: compound (SMILES format)
Task: classification
Tags: Toxicity, hERG, cardiotoxicity
Output: Score (0, 1), probability
The model contained within this package is licensed under an MIT license.
DeepDrugCoder (DDC): Heteroencoder for molecular encoding and de novo generation
DeepDrugCoder is a generative model that uses a conditional recurrent neural network (cRNN) to generate active molecules with specified properties for a given condition. The model can generate large sets of novel molecules for further assessment, such as ADME and toxicity profiling. It combines selected molecular descriptors with a bioactivity label (0, 1) and generates SMILES strings focused on the targeted properties. Featurization was performed using molvecgen.
Name: DeepDrugCoder (DDC)
Input: compound
Task: Generative
Tags: molecule generator, neural network
Output: compound
The model contained within this package is licensed under an MIT license.
DRKG model - Drug Repurposing Knowledge Graph for Covid-19
DRKG is a comprehensive knowledge graph that connects different entities, such as genes, compounds, diseases, biological processes, side effects, and symptoms, through relations between pairs of entities. It uses knowledge graph embedding (KGE) machine learning methods for evaluation and analysis. Its main focus is compound-disease interactions for COVID-19 drug repurposing. The pretrained DRKG model for COVID-19 drug repurposing uses KGE models to predict whether existing drugs inhibit certain pathways related to COVID-19 host proteins.
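To illustrate how a KGE model turns embeddings into a score, the sketch below implements TransE-style scoring, in which a relation is modeled as a translation so that head + relation lands near tail for true triples. It assumes a TransE (L2) model; TransE is only one KGE family, and the margin gamma and the toy vectors are made up. Ranking candidate compounds against a disease entity in this way is the general idea behind KGE-based repurposing.

```python
import math


def transe_score(head, relation, tail, gamma=12.0):
    """TransE (L2) plausibility score: gamma - ||head + relation - tail||_2.

    Higher scores mean the (head, relation, tail) triple is more plausible.
    gamma is a margin hyperparameter; 12.0 is only an illustrative value.
    """
    dist = math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))
    return gamma - dist


def rank_candidates(candidates, relation, tail, gamma=12.0):
    """Sort (name, embedding) candidates by score against a tail entity, best first."""
    scored = [(name, transe_score(emb, relation, tail, gamma)) for name, emb in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    treats = [1.0, 0.0]   # toy relation embedding
    disease = [1.0, 1.0]  # toy disease embedding
    drugs = [("drug_b", [2.0, 2.0]), ("drug_a", [0.0, 1.0])]
    print(rank_candidates(drugs, treats, disease))  # drug_a first: it translates exactly onto the disease
```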
Name: DRKG
Input: compound
Task: classification
Tags: knowledge graph, drug repurposing, drug-target interaction, DTI
Output: score
The model contained within this package is licensed under the Apache License, Version 2.0.
Hi @Zainab-ik !
Good model suggestion to start with! Can you add it to our list? Looking forward to the next models!
Thank you very much @GemmaTuron. I've updated the list with the model and added one more model suggestion.
Hi @GemmaTuron!
I've completed the model suggestions. Kindly review. Also, I really appreciate your mentorship. Thank you for the exposure to the world of AI in drug discovery; it has been very insightful, and it's a career path I want to pursue as a research scientist.
Hi @Zainab-ik !
Thanks for the model suggestions! The generative model is interesting but would be probably difficult to incorporate as is to the Hub, so we'll leave it for the moment! Can you add the COVID19 model to the list? Thanks!
Hi @GemmaTuron,
Thank you. I've added it to the list.
@Zainab-ik
Would you be able to tackle this issue to test the model just incorporated by @emmakodes? Thanks!
I'll do that.
@GemmaTuron, this issue has been completed.
Hi @GemmaTuron!
I have submitted my final application. I look forward to making more meaningful contributions to this field. It's been a wonderful experience, and I really appreciate your mentorship throughout the contribution period. Many thanks.
I'll close this issue now.
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application