Closed NaliH closed 8 months ago
Hello @NaliH, welcome to Ersilia. Be sure to complete the installation steps and run a model. The complete guide can be found here: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/installation. Please report your progress here. Thanks.
Hello, @carcablop!
I have successfully installed Ersilia Model Hub and all prerequisites within WSL. The installation process went smoothly - I was able to successfully test the eos3b5e model.
My name is Nalita Hinds, and I am a recent computer science graduate with a keen interest in AI/ML.
I decided to join Outreachy due to:
When I was first notified about being selected for the current contribution period, I saw that there was a long list of available projects – it was honestly a bit overwhelming to see them all! However, when I came across Ersilia, I immediately knew that I wanted to be a part of this project. This was solidified upon further reading of what Ersilia stands to achieve - supporting science, research, and medicine in low-income countries.
On top of that, I am highly interested in AI/ML and its implementations in healthcare and medicine. From surgery, to diagnostics, to drug discovery, to administrative workflows - there are so many applications for AI/ML! As multi-faceted as AI/ML is, I believe that there’s so much potential that still remains untapped in the field of medicine, and that the applications that already exist should be cultivated. I have aspirations to become an ML engineer with healthcare as a focus, and contributing to Ersilia is a huge step in that direction for me.
I am extremely excited to be able to learn more about Ersilia, AI/ML, and contribute to this project – to make a difference.
Thank you for the updates, @NaliH. If you'd like, you can get started with the tasks from week 2.
Hello @NaliH, You are yet to start on week 2 tasks. Is there any way we can support you?
Hello, @HellenNamulinda I am currently in the process of completing the week 2 tasks. I will definitely reach out if I encounter any problems! I decided to work with the STOUT (SMILES to IUPAC) model and I'm setting up my environment now to be able to run predictions.
The model that I decided to work with is the STOUT (SMILES to IUPAC) model.
My reasons for choosing the STOUT model are:
While the STOUT repo details how to install it, I still ran into some issues with WSL and installing STOUT.
I encountered an issue where WSL would not start. Whenever I entered the wsl command into my terminal, I'd get a message that my WSL instance had been terminated.
I attempted to:
• Run wsl.exe directly
• Run wsl --shutdown
To resolve this issue I had to (in order):
1. Run the wsl --update command
2. Run the wsl --shutdown command
3. Run the wsl command
This updated and force restarted WSL, which was a simple fix!
I created a conda environment to install STOUT along with some other tools to run my predictions, such as Jupyter Lab. However, when attempting to use the conda command (conda install -c decimer stout-pypi) to install STOUT, it would always fail.
Image of the error received (the exact conflicts varied based on what I tried):
I tried to get it to work via conda by:
Eventually, I used pip to install the STOUT package. It installed this way without any issues.
Hi @NaliH, thank you for the updates. Let us know how it goes with trying out the STOUT model with the EML file, and then testing Ersilia's implementation on the same file. Let us know here if you get stuck and need help.
Upon first attempting to import STOUT into my notebook, I received an error regarding the JVM.
After some troubleshooting, I realized that I needed to install Java to successfully run the model and its functions.
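A check like the one below could surface this problem before the import fails. It is only a minimal sketch: the helper name java_available is hypothetical, and it assumes that finding a java executable on PATH is a reasonable proxy for a usable Java installation (a setup that relies on JAVA_HOME instead would not be detected this way).

```python
import shutil


def java_available(executable: str = "java") -> bool:
    """Return True if `executable` can be found on PATH.

    Hypothetical helper: it only checks that the binary is discoverable,
    not that the installed Java version is the one STOUT needs.
    """
    return shutil.which(executable) is not None


if not java_available():
    # Installing Java first avoids hitting the JVM error at import time.
    print("Java was not found on PATH; install it before importing STOUT.")
```

Running this in the first notebook cell gives a readable message instead of a JVM traceback.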
Tools used: • WSL • Conda environment • STOUT • Jupyter Lab • Pandas
I created a Jupyter Notebook for using STOUT with the EML data. The full notebook is available here.
In order to run the model on the EML data, I did the following:
translate_forward vs retrieving from the DataFrame
I had difficulty getting the Ersilia model to consistently perform predictions. With each method I used, I ended up receiving null for various predictions.
Methods used to run the model:
Imports and functions used:
from ersilia import ErsiliaModel
model = ErsiliaModel("smiles2iupac")
model.serve()
model.run(input=SMILES_Data, output="json")
model.close()
I imported Ersilia into my program and was able to successfully fetch and serve the model. I initially ran the model on a single SMILES label and this ran without issue. I increased the amount of data to 5 labels and this also ran without issue. Seeing this, I attempted the full dataset.
However, when I ran the model on the full EML data (all 400+ entries), I received null as the output for every prediction after 7 hours.
Commands used:
ersilia fetch smiles2iupac
ersilia fetch smiles2iupac --from_github
ersilia serve smiles2iupac
ersilia run -i 'smile_data_here' -o 'save_file_here'
ersilia close
I attempted to fetch and serve Ersilia two ways - from GitHub and from Docker. I used the slug each time. I was able to do this successfully with both options. However, each had its own issues with running the model.
From Docker (pulled from container)
As mentioned, the model ran inconsistently. As with using Ersilia as a Python package, I initially ran it with a single SMILES label and then increased the amount of input data. I was unable to get results with larger inputs. Curiously, after some time, I also began running into issues with inputs as small as 5 SMILES labels.
From GitHub
I received a connection refused error (Errno 111) when running the model. I attempted this method briefly before going back to using Docker.
Tools used:
I created a Jupyter Notebook for using Ersilia with the EML data. The notebook is available here.
I was able to successfully run the model on 10 SMILES labels (broken into chunks of 5) and compared the results to the STOUT model.
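The chunked run described above could be sketched roughly as follows. This is not the notebook's actual code: chunked and run_in_chunks are hypothetical helpers, and predict_batch stands in for whatever call performs the prediction (for example, a small wrapper around model.run). The sketch assumes that wrapper returns one name per input, with None where the model produced null.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_in_chunks(
    smiles: Sequence[str],
    predict_batch: Callable[[List[str]], List[Optional[str]]],
    size: int = 5,
) -> Tuple[Dict[str, str], List[str]]:
    """Run `predict_batch` on small batches and separate null outputs.

    Returns (results, failed): `results` maps each SMILES to its predicted
    name, and `failed` lists the SMILES whose prediction came back as None.
    """
    results: Dict[str, str] = {}
    failed: List[str] = []
    for batch in chunked(smiles, size):
        outputs = predict_batch(batch)
        for smi, name in zip(batch, outputs):
            if name is None:
                failed.append(smi)  # keep track of inputs to retry later
            else:
                results[smi] = name
    return results, failed
```

Keeping the failed inputs in their own list makes it possible to retry only those, rather than rerunning the whole dataset after a batch returns null.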
In order to run the model on the EML data for comparisons, I did the following:
Imported Pandas to read the data and create DataFrames to perform the translations and comparisons
Saved the EML data as a CSV file and imported it into my notebook with Pandas as a DataFrame
Fetched and served Ersilia
Created sets of data to be run through the model
Ran the model on the sets of data
Created a DataFrame for comparisons and compared the prediction results
Created a function to retrieve the comparison results
Examples of comparing the predictions
From the amount of data compared (10 SMILES labels), the results were split down the middle - 5 results were similar and 5 were different.
For example, the SMILES label CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1 had the following results:
STOUT:
(E)-N-[4-(3-chloro-4-fluoroanilino)-7-[(3S)-oxolan-3-yl]oxyquinazolin-6-yl]-4-(dimethylamino)but-2-enamide
Ersilia:
(E)-N-[6-[[(3-chloro-4-fluorocyclohexa-1,4-dien-1-yl)amino]methylidene]-3-[(3S)-oxolan-3-yl]oxycyclopenta[d]pyrimidin-2-yl]-4-(dimethylamino)but-2-enamide
The full comparison results, with the SMILES labels, IUPAC names, and comparison status, are here.
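A comparison of this kind could be sketched in plain Python as below. It is an illustration, not the notebook's function: the names normalize and compare_predictions are hypothetical, and the criterion (an exact match after collapsing whitespace) is an assumption, since the rule used to decide that two names are "similar" is not stated above.

```python
from typing import Dict, List, Optional


def normalize(name: Optional[str]) -> str:
    """Collapse whitespace so formatting differences do not count as mismatches."""
    return "" if name is None else " ".join(name.split())


def compare_predictions(
    stout: Dict[str, Optional[str]],
    ersilia: Dict[str, Optional[str]],
) -> List[Dict[str, str]]:
    """Build one comparison row per SMILES label found in `stout`."""
    rows: List[Dict[str, str]] = []
    for smiles, stout_name in stout.items():
        ersilia_name = ersilia.get(smiles)
        if stout_name is None or ersilia_name is None:
            status = "missing"  # one side returned null or has no entry
        elif normalize(stout_name) == normalize(ersilia_name):
            status = "same"
        else:
            status = "different"
        rows.append({
            "smiles": smiles,
            "stout": normalize(stout_name),
            "ersilia": normalize(ersilia_name),
            "status": status,
        })
    return rows


# Example using the pair of predictions quoted above.
smi = r"CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1"
rows = compare_predictions(
    {smi: "(E)-N-[4-(3-chloro-4-fluoroanilino)-7-[(3S)-oxolan-3-yl]oxyquinazolin-6-yl]-4-(dimethylamino)but-2-enamide"},
    {smi: "(E)-N-[6-[[(3-chloro-4-fluorocyclohexa-1,4-dien-1-yl)amino]methylidene]-3-[(3S)-oxolan-3-yl]oxycyclopenta[d]pyrimidin-2-yl]-4-(dimethylamino)but-2-enamide"},
)
print(rows[0]["status"])
```

The resulting list of dictionaries can be passed directly to the pandas DataFrame constructor if a tabular view is preferred.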
Paper is available here.
What the model does
The elEmBERT model is a neural network (NN) model that predicts chemical properties using structural information. The atomic positions of chemical compounds are input and converted into tokens to be run through the model for chemical analysis.
Why is it relevant to Ersilia
The model is relevant to Ersilia because it can be used to predict chemical properties, which aids in drug discovery. The model can be adapted and applied to various datasets, which makes it flexible. For instance, it has been benchmarked on datasets such as:
BBBP: A dataset that contains annotated data for the ability of a chemical compound to penetrate the blood-brain barrier.
Clintox: A dataset that provides the toxicity profile of chemical compounds.
SIDER: A dataset that contains structured information on drug-associated side effects.
How would you implement it?
The code is available on GitHub here. The model is written in Python and uses TensorFlow. Pre-trained dictionaries are available for the model. Example notebooks are provided that show how to use the model. The datasets used for the benchmarks are also provided.
Paper is available here
What the model does
The helmpy model uses a stochastic individual-based approach to forecast the transmission and control of helminth infections in humans. The model considers the control of soil-transmitted helminths by mass drug administration, but it can be applied to other helminth species.
Why is it relevant to Ersilia
According to WHO, soil-transmitted helminth infections (such as hookworm) are among the most common infections in the world. It's estimated that 1.5 billion people, or 24% of the world's population, have been affected by them. These infections are among the many neglected tropical diseases (NTDs) that are common in low-income tropical/subtropical regions. As part of Ersilia's mission to support research of NTDs, this model can serve as a means to help combat helminth infections.
How would you implement it
The code is available to be forked on GitHub here. The model provides interactive notebooks for implementation of the code, with documentation on how to use the notebook/model.
GitHub: Paper is available here Docs are available here
What the model does
The PaddleHelix model uses deep neural networks (such as GNNs) to search the chemical space for drug discovery. A molecule is scored based on properties such as its bioactivity against a target protein, druggability, and synthetic accessibility. A generative method is then used on the score to predict similarities between the original and potential molecules.
Why is it relevant to Ersilia
The model is relevant to Ersilia because it uses an ML approach to drug discovery, vaccine design, and precision medicine. Drug discovery often entails lab experiments that can be expensive and time consuming. According to WHO, the average cost to develop a new drug ranges from US$43.4 million to US$4.2 billion. As one of Ersilia's goals is to facilitate drug discovery in low/middle income countries, incorporating a model to aid in the development process would be highly beneficial, as it would reduce some of the costs of the drug discovery process.
How would you implement it
The code is available on GitHub here. It is written primarily in Python, but requires some C++ for development. The model provides:
Hello,
Thanks for your work during the Outreachy contribution period, we hope you enjoyed it! We will now close this issue while we work on the selection of interns. Thanks again!
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application