Motivation Letter
Hi, I am Ishita Pathak, currently a first-year student pursuing a Master of Computer Applications at Indira Gandhi Delhi Technical University for Women, Delhi, India. I am writing to express my genuine excitement about the opportunity to contribute to Ersilia's goal of ensuring that laboratories in less affluent countries have access to cutting-edge AI and ML tools for discovering drugs to treat infectious and neglected diseases.
As a computer science student, I have worked across various tech stacks. However, my current aspiration is to delve deeper into AI/ML, which is also part of my coursework, and Ersilia's project gives me a chance to apply my skills and knowledge to real-world challenges. Being a quick learner, I'm ready to dedicate the time and effort needed to achieve these goals and learn new things along the way.
Six years back, I went through a tough time when someone very close to me passed away because they couldn't get the medical help they needed in time. It really affected me and sparked a strong desire to make a difference in healthcare. I believe that contributing to Ersilia with my technical skills is the best way for me to do that. I am confident that I can contribute positively to advancing healthcare solutions and ultimately saving lives.
Why me? My passion for open source and my never-give-up attitude set me apart from others. I've always felt that working in open source and helping others is my way of doing good for society, and through this project I'll be able not only to give back to the community but also, potentially, to help save lives. I am excited about the opportunity to work on this project and will work as hard as I have to in order to make it a grand success.
Thanks and regards, Ishita Pathak
After installing the Ersilia Model Hub, I tested it with a simple model:
ersilia -v fetch eos3b5e
ersilia serve eos3b5e
ersilia -v api run -i "CCCC"
docker pull ersiliaos/eos4wt0:latest
ersilia serve eos4wt0
ersilia -v api run -i "CCCC"
While completing the task, I got stuck while testing the Ersilia model eos3b5e: the container was always in the exited status. I asked about this in the Slack channel, where a mentor helped me resolve the issue.
I truly appreciate the supportive environment within the community, where both mentors and peers are always ready to lend a helping hand.
Hi @IshitaPathak, please update here the Week 2 tasks that you have marked as done, so we can provide feedback.
So far, I've learned valuable skills to contribute to Ersilia. It's been an exciting journey.
I have a strong foundation in Python, but my exposure to libraries was somewhat limited. To address this, I've invested some time in learning libraries such as Pandas and NumPy (GitHub repo here). By today, I aim to complete my understanding of Matplotlib and other libraries essential for my current task. After that, I will move forward with the next part of the Week 2 tasks.
The hERG channel is responsible for regulating the electrical signals in the heart. When certain drugs block this channel, it can cause a condition known as long QT syndrome, which can lead to dangerous heart rhythm abnormalities.
To identify which drugs might have this effect, the Ersilia Model Hub offers a computer-based model called deephERG (eos30gr). This model uses a type of artificial intelligence called deep neural networks, trained on large datasets containing information on thousands of chemicals. From the chemical structures and properties of these compounds, deephERG predicts their likelihood of blocking the hERG channel.
ersilia -v fetch eos30gr
ersilia serve eos30gr
ersilia -v api run -i "CCCC"
After fetching the eos30gr model, I consistently got null output for the SMILES prediction. Since the models are regularly updated, I tried the command `ersilia -v fetch eos30gr --from_github` to fetch the latest code from GitHub, which resolved the issue.
Hi @IshitaPathak
Thanks for the explanation. I suggest the following timeline:
As the application period is coming to an end and we want to ensure applicants have time to prepare strong applications, please do not tackle Week 3 tasks and focus on the final application instead. Thanks!
Thank you so much, @GemmaTuron, for the guidance and timeline. I'm committed to finishing the Week 2 tasks and starting work on my final application right away.
I selected a list of 1000 molecules, reference_library.csv, shared in Slack (data channel). To make sure the data was consistent, I standardized the SMILES representations using the function from src. RDKit could not parse three of the SMILES, which resulted in NaN values, so I removed those invalid entries from the dataset.
Next, I obtained the InChIKey representation for all the standardized SMILES and used it to create a DataFrame containing the processed SMILES and their corresponding InChIKeys. This DataFrame has two columns, "smiles" and "InChI_key". I then saved the processed data as a CSV file named processed_input.csv.
After cleaning the data and obtaining the corresponding InChIKeys, I ran the model on the processed dataset using the following commands:
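The cleaning steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the notebook's actual code: `clean_smiles`, `write_processed` and the two converter callables are hypothetical names, and the standardization function from src is not reproduced here. Each converter is assumed to return `None` when a SMILES string cannot be parsed.

```python
import csv
from typing import Callable, Iterable, List, Optional, Tuple

# Placeholders for the real chemistry functions (assumption): each takes a
# SMILES string and returns a string, or None if the input cannot be parsed.
Converter = Callable[[str], Optional[str]]


def clean_smiles(
    smiles_list: Iterable[str],
    standardize: Converter,
    to_inchikey: Converter,
) -> List[Tuple[str, str]]:
    """Return (standardized SMILES, InChIKey) pairs, skipping invalid entries."""
    rows = []
    for raw in smiles_list:
        std = standardize(raw)
        if not std:  # invalid SMILES: dropped, like the NaN rows in the text
            continue
        key = to_inchikey(std)
        if not key:  # no InChIKey could be generated: also dropped
            continue
        rows.append((std, key))
    return rows


def write_processed(
    rows: Iterable[Tuple[str, str]], path: str = "processed_input.csv"
) -> None:
    """Save the cleaned pairs using the two column names from the text."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles", "InChI_key"])
        writer.writerows(rows)
```

With RDKit installed, `to_inchikey` could plausibly be built from `Chem.MolFromSmiles` and `Chem.MolToInchiKey`, returning `None` whenever `MolFromSmiles` fails; that wiring is left out so the sketch runs without extra dependencies.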
ersilia -v fetch eos30gr --from_github
ersilia serve eos30gr
ersilia -v api run -i processed_input.csv -o output.csv
The output generated by the model is saved in the file output.csv
The scatter plot shows significant overlap between the two classes, which makes them hard to distinguish. This overlap suggests that the features used for classification may not be distinct enough. Without a clear separation between the classes, the model may struggle to differentiate hERG blockers from non-blockers and to make accurate predictions.
I completed Week 2 Task 1; the notebook for this task is 00_model_bias.ipynb.
I selected Table6 from this repo, provided in the publication on page 32, where the authors took 1,824 FDA-approved small-molecule drugs from the DrugBank database. After standardising the SMILES and removing null and duplicate values, I ran the model on the dataset using the following commands:
ersilia -v fetch eos30gr
ersilia serve eos30gr
ersilia -v api run -i input_week2_task2.csv -o output_week2_task2.csv
Then I compared the results from the publication with those generated by the eos30gr model. The objective was to determine whether both sources produce similar results.
<div style="display: flex; justify-content: space-around;">
<div>
<img src="https://github.com/ersilia-os/ersilia/assets/75848598/ae950928-eb9d-413e-a0be-f757c03dbac5" alt="LineChart_-vePredictiveProbability" width="358" />
<img src="https://github.com/ersilia-os/ersilia/assets/75848598/42f76c5a-4ebb-4bfc-88a5-bb9e73f148a4" alt="BarChart_-vePredictiveProbability" width="402" />
</div>
<div>
<img src="https://github.com/ersilia-os/ersilia/assets/75848598/d43a9847-c70a-4819-abf3-09b2e7bd6295" alt="LineChart_+vePredictiveProbability" width="358" />
<img src="https://github.com/ersilia-os/ersilia/assets/75848598/cfd3d6e1-9ad5-46b0-aa13-17b49e832260" alt="BarChart_+vePredictiveProbability" width="402" />
</div>
</div>
The graphs above show a clear difference between the results reported in the publication and those obtained from the Ersilia Model Hub. This inconsistency suggests that the eos30gr model may not reproduce the published results.
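The graphs compare the two sources in aggregate; a per-molecule cross-tabulation would make the size of the disagreement more concrete. The sketch below assumes both sources have already been reduced to binary labels (1 = blocker, 0 = non-blocker) listed in the same molecule order. `compare_labels` is a hypothetical helper and is not taken from the notebook.

```python
from collections import Counter
from typing import Dict, Sequence


def compare_labels(
    published: Sequence[int], predicted: Sequence[int]
) -> Dict[str, float]:
    """Cross-tabulate two binary label lists (1 = blocker, 0 = non-blocker)."""
    if len(published) != len(predicted):
        raise ValueError("both label lists must cover the same molecules")
    if not published:
        raise ValueError("label lists must not be empty")
    # Count how often each (published, predicted) combination occurs.
    pairs = Counter(zip(published, predicted))
    agree = pairs[(1, 1)] + pairs[(0, 0)]
    return {
        "both_blocker": pairs[(1, 1)],
        "both_non_blocker": pairs[(0, 0)],
        "only_published_blocker": pairs[(1, 0)],
        "only_predicted_blocker": pairs[(0, 1)],
        "agreement": agree / len(published),
    }
```

Two sources can report similar class percentages while disagreeing on many individual molecules, so the `agreement` fraction is a useful complement to the class totals.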
Percentage of hERG blockers and non-blockers in the publication results:
| Blockers            | Number | Percentage |
|---------------------|--------|------------|
| Yes (hERG blockers) | 513    | 29.79%     |
| No (non-blockers)   | 1209   | 70.21%     |
Percentage of hERG blockers and non-blockers predicted by the model:
| Blockers            | Number | Percentage |
|---------------------|--------|------------|
| Yes (hERG blockers) | 411    | 23.87%     |
| No (non-blockers)   | 1311   | 76.13%     |
These percentages also show a discrepancy: the publication reports 29.79% hERG blockers, whereas the model predicts 23.87%, a gap of about six percentage points over the same 1,722 molecules. On this evidence, I conclude that the published results for model `eos30gr` are not reproduced.
Here is the link to the [GitHub repository](https://github.com/IshitaPathak/model-validation-eos30gr/tree/master).
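The percentages in the two tables can be re-derived from the raw counts with a few lines of Python. The counts come from the tables above; `class_percentages` is a hypothetical helper written for this illustration.

```python
from typing import Tuple


def class_percentages(blockers: int, non_blockers: int) -> Tuple[float, float]:
    """Return (% blockers, % non-blockers), each rounded to two decimals."""
    total = blockers + non_blockers
    if total == 0:
        raise ValueError("at least one molecule is required")
    return (
        round(100 * blockers / total, 2),
        round(100 * non_blockers / total, 2),
    )


# Counts copied from the two tables above.
publication = class_percentages(513, 1209)
model = class_percentages(411, 1311)

# Difference in the share of predicted blockers, in percentage points.
gap = round(publication[0] - model[0], 2)

print("publication:", publication)
print("model:", model)
print("gap in blocker share (percentage points):", gap)
```

Both tables sum to the same 1,722 molecules, so the gap reflects different class assignments rather than a difference in dataset size.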
## WEEK3 TASK
I selected a suitable dataset with sufficient experimental results, named [external_dataset_Xiao_Li.csv](https://github.com/IshitaPathak/model-validation-eos30gr/blob/master/data/external_dataset_Xiao_Li.csv), in the data folder.
Here is the [reference for the data](https://weilab.math.msu.edu/DataLibrary/2D/#ref9); I used the Li 1092 test data.
![Screenshot 2024-04-03 003723](https://github.com/IshitaPathak/model-validation-eos30gr/assets/75848598/84078cdd-2df6-47ba-9448-3b8e1d163952)
Hi @IshitaPathak
Thanks for the explanations, much clearer now, and good job on doing a PCA as well! Please move on to preparing your final application, many thanks!
Thank you so much, @GemmaTuron. I really appreciate your time and feedback. I have started working on my final application.
Week 1 - Get to know the community
Week 2 - Get Familiar with Machine Learning for Chemistry
Week 3 - Validate a Model in the Wild
Week 4 - Prepare your final application