ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

✍️ Contribution period: Ishita Pathak #1044

Closed IshitaPathak closed 3 months ago

IshitaPathak commented 3 months ago

Week 1 - Get to know the community

Week 2 - Get Familiar with Machine Learning for Chemistry

Week 3 - Validate a Model in the Wild

Week 4 - Prepare your final application

IshitaPathak commented 3 months ago

Motivation Letter

Hi, I am Ishita Pathak, currently a first-year student pursuing a Master of Computer Applications at Indira Gandhi Delhi Technical University for Women, Delhi, India. I am writing to express my genuine excitement about the opportunity to contribute to Ersilia's goal of ensuring that laboratories in less affluent countries have access to cutting-edge AI and ML tools for discovering drugs to treat infectious and neglected diseases.

As a computer science student, I have worked across various tech stacks. However, my current aspiration is to delve deeper into AI/ML, which is also part of my coursework, and Ersilia's project provides a chance to apply my skills and knowledge to real-world challenges. Being a quick learner, I'm ready to dedicate the time and effort needed to achieve these goals and learn new things along the way.

Six years back, I went through a tough time when someone very close to me passed away because they couldn't get the medical help they needed in time. It really affected me and sparked a strong desire to make a difference in healthcare. I believe that contributing to Ersilia with my technical skills is the best way for me to do that. I am confident that I can contribute positively to advancing healthcare solutions and ultimately saving lives.

Why me? My passion for open source and never-give-up attitude set me apart from others. I’ve always felt that working in open source and helping others is my way of doing good for society, but through this project I’ll not only be able to give back to the community but also potentially save lives. I am excited about the opportunity to work on this project and will work as hard as I have to in order to make it a grand success.

Thanks and Regards Ishita Pathak

IshitaPathak commented 3 months ago

Week 1 TASK ✅

After installing the Ersilia Model Hub, I tested it with a simple model:

ersilia -v fetch eos3b5e
ersilia serve eos3b5e
ersilia -v api run -i "CCCC"

Output

Screenshot 2024-03-12 213405
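The three commands above can also be scripted. The sketch below is only an illustration and not part of the Ersilia codebase: `build_commands` and `run_model` are hypothetical helper names, and nothing is executed unless an `ersilia` executable is found on the PATH.

```python
import shutil
import subprocess


def build_commands(model_id, smiles):
    """Return the fetch/serve/run commands used above as argument lists."""
    return [
        ["ersilia", "-v", "fetch", model_id],
        ["ersilia", "serve", model_id],
        ["ersilia", "-v", "api", "run", "-i", smiles],
    ]


def run_model(model_id, smiles):
    """Run the commands in order; a failing command raises CalledProcessError.

    Returns False without running anything if the ersilia CLI is not installed.
    """
    if shutil.which("ersilia") is None:
        return False
    for command in build_commands(model_id, smiles):
        # check=True raises if the command exits with a non-zero status
        subprocess.run(command, check=True)
    return True


if __name__ == "__main__":
    # Only print the commands here; call run_model(...) to actually execute them.
    for command in build_commands("eos3b5e", "CCCC"):
        print(" ".join(command))
```

Keeping the commands as argument lists avoids shell quoting problems when the SMILES string contains characters such as parentheses or `#`.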

Testing Ersilia with Docker

docker pull ersiliaos/eos4wt0:latest
ersilia serve eos4wt0 
ersilia -v api run -i "CCCC"

Output

Screenshot 2024-03-22 154436



While completing the task, I got stuck while testing the Ersilia model eos3b5e: the container always ended up in the exited status. I asked about this in the Slack channel, where a mentor helped me resolve the issue.

Screenshot 2024-03-12 210909



I truly appreciate the supportive environment within the community, where both mentors and peers are always ready to lend a helping hand.

GemmaTuron commented 3 months ago

Hi @IshitaPathak, please post here the Week 2 tasks that you have marked as done, so we can provide feedback.

IshitaPathak commented 3 months ago

MY PROGRESS AND LEARNINGS

So far, I've learned valuable skills to contribute to Ersilia. It's been an exciting journey.

I have a strong foundation in Python, but my exposure to libraries was somewhat limited. To address this, I've invested some time in learning libraries such as Pandas and NumPy (GitHub repo here). By today, I aim to complete my understanding of Matplotlib and other libraries essential for my current task. Following this, I will move forward with the next part of the Week 2 tasks.

IshitaPathak commented 3 months ago

Week 2 TASK ✅

To identify which drugs might block the hERG channel, the Ersilia Model Hub provides a computer-based model called deephERG (eos30gr). This model uses a type of artificial intelligence called deep neural networks, trained on large datasets containing information on thousands of chemicals. By learning from the chemical structures and properties of these compounds, deephERG can predict their likelihood of blocking the hERG channel.


ersilia -v fetch eos30gr
ersilia serve eos30gr
ersilia -v api run -i "CCCC" 

Upon fetching the eos30gr model, I consistently got null output for the SMILES prediction. Since the models are regularly updated, I tried the command ersilia -v fetch eos30gr --from_github to fetch the latest code from GitHub, which resolved the issue.
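A null result like the one described above is easy to miss in a larger run, so it can help to check the output file programmatically. The sketch below assumes a CSV whose first columns identify the molecule (here `key` and `input`, which are assumptions about the layout) and treats every remaining empty or null-like cell as a missing prediction. `count_null_rows` is a hypothetical helper name, and the example data is made up.

```python
import csv
import io

NULL_TOKENS = {"", "null", "none", "nan"}


def count_null_rows(csv_file, id_columns=("key", "input")):
    """Count rows in which every prediction column is empty or null-like.

    Returns a (null_rows, total_rows) tuple.
    """
    reader = csv.DictReader(csv_file)
    null_rows = 0
    total_rows = 0
    for row in reader:
        total_rows += 1
        # Everything that is not an identifier column is treated as a prediction
        values = [v for k, v in row.items() if k not in id_columns]
        if all((v or "").strip().lower() in NULL_TOKENS for v in values):
            null_rows += 1
    return null_rows, total_rows


# Made-up example data, only to show the assumed shape of the file.
example = io.StringIO(
    "key,input,activity\n"
    "KEY-A,CCCC,0.12\n"
    "KEY-B,CCO,\n"
    "KEY-C,c1ccccc1,null\n"
)
print(count_null_rows(example))  # (2, 3)
```

If the first number equals the second, every prediction is missing, which is the situation that refetching with --from_github resolved here.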

Output

Screenshot 2024-03-22 140337


GemmaTuron commented 3 months ago

Hi @IshitaPathak

Thanks for the explanation. I suggest the following timeline:

As the application period is coming to an end and we want to ensure applicants have time to prepare strong applications please do not tackle Week 3 tasks and focus on the final application instead. Thanks!

IshitaPathak commented 3 months ago

Thank you so much @GemmaTuron for the guidance and timeline. I'm committed to finishing the Week 2 tasks and starting work on my final application right away.

IshitaPathak commented 3 months ago

After cleaning the data and obtaining the corresponding InChIKeys, I ran the model on the processed dataset using the following commands:

ersilia -v fetch eos30gr --from_github
ersilia serve eos30gr
ersilia -v api run -i processed_input.csv -o output.csv

The output generated by the model is saved in the file output.csv
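The cleaning step itself is not shown in this comment. As a rough illustration of the kind of preprocessing meant, the sketch below drops empty and duplicate SMILES strings before writing an input file. The column names `smiles` and `label`, the helper name `clean_smiles_rows`, and the rows are all assumptions made for the example. It does not compute InChIKeys, which would require a cheminformatics library such as RDKit.

```python
import csv
import io


def clean_smiles_rows(rows, smiles_column="smiles"):
    """Drop rows with an empty SMILES and exact duplicate SMILES strings.

    Note: this compares raw strings only; two different SMILES strings that
    encode the same molecule are NOT merged here.
    """
    seen = set()
    cleaned = []
    for row in rows:
        smiles = (row.get(smiles_column) or "").strip()
        if not smiles or smiles in seen:
            continue
        seen.add(smiles)
        cleaned.append({**row, smiles_column: smiles})
    return cleaned


# Made-up rows standing in for the raw dataset.
raw = io.StringIO("smiles,label\nCCCC,0\n,1\nCCCC,0\n CCO ,1\n")
cleaned = clean_smiles_rows(csv.DictReader(raw))

# Write the cleaned rows in CSV form (to a string here, to a file in practice).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["smiles", "label"])
writer.writeheader()
writer.writerows(cleaned)
print(out.getvalue())
```

Deduplicating on a canonical identifier such as the InChIKey, as described above, is more robust than comparing raw strings, because it also catches the same molecule written in two different ways.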

histogram scatter plot

The scatter plot shows significant overlap between the two classes, which makes them hard to distinguish. This overlap suggests that the features used for classification may not be distinct enough. Without a clear separation between the classes, the model may struggle to differentiate between hERG blockers and non-blockers, which limits the accuracy of its predictions.
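The scatter plot shows the overlap visually. A complementary check, when experimental labels are available, is to measure how well the predicted score separates the two classes. One common choice, not necessarily the one used in the notebook, is the area under the ROC curve (AUROC): 1.0 means the score separates the classes perfectly, while 0.5 means it carries no ranking information. The sketch below computes it from pairwise comparisons; the scores and labels are made up for illustration.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. labels: 1 = hERG blocker, 0 = non-blocker.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predicted probabilities and experimental labels.
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))
```

The pairwise loop is quadratic in the number of molecules, which is acceptable for datasets of a few thousand compounds; a rank-based implementation would be preferable for much larger sets.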


I have completed Week 2 Task 1. Here is the link to the notebook for this task: 00_model_bias.ipynb

WEEK2 TASK2


* I then compared the results of the publication with those generated by the eos30gr model. The objective was to determine whether both sources produce similar results.

<div style="display: flex; justify-content: space-around;">
    <div>
        <img src="https://github.com/ersilia-os/ersilia/assets/75848598/ae950928-eb9d-413e-a0be-f757c03dbac5" alt="LineChart_-vePredictiveProbability" width="358" />
        <img src="https://github.com/ersilia-os/ersilia/assets/75848598/42f76c5a-4ebb-4bfc-88a5-bb9e73f148a4" alt="BarChart_-vePredictiveProbability" width="402" />
    </div>
    <div>
        <img src="https://github.com/ersilia-os/ersilia/assets/75848598/d43a9847-c70a-4819-abf3-09b2e7bd6295" alt="LineChart_+vePredictiveProbability" width="358" />
        <img src="https://github.com/ersilia-os/ersilia/assets/75848598/cfd3d6e1-9ad5-46b0-aa13-17b49e832260" alt="BarChart_+vePredictiveProbability" width="402" />
    </div>
</div>

From the above graphs, it's very clear that there's a difference between the results obtained from the publication and those from the Ersilia Model Hub. This inconsistency suggests that the eos30gr model may not be reproducible.

Percentage of hERG Blockers and Non-Blockers in Publication Result:

| Blockers               | Number | Percentage |
|------------------------|--------|------------|
| Yes (hERG Blockers)    | 513    | 29.79%     |
| No (Non-Blockers)      | 1209   | 70.21%     |

Percentage of hERG Blockers and Non-Blockers After Testing from the Model:

| Blockers               | Number | Percentage |
|------------------------|--------|------------|
| Yes (hERG Blockers)    | 411    | 23.87%     |
| No (Non-Blockers)      | 1311   | 76.13%     |

These percentages also show a discrepancy between the publication results and the model predictions: 29.79% of the compounds are hERG blockers in the publication, compared with 23.87% when testing the model. This suggests potential issues with the reproducibility of the model. Based on this comparison, model `eos30gr` does not appear to be reproducible.
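The percentages in both tables follow directly from the counts. The short check below recomputes them and the difference in the number of blockers between the two sources. Only the four counts come from the tables above; the function and variable names are illustrative.

```python
def share(count, total):
    """Percentage of `count` in `total`, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Counts taken from the two tables above.
publication = {"blockers": 513, "non_blockers": 1209}
model = {"blockers": 411, "non_blockers": 1311}

total_pub = sum(publication.values())    # 1722 molecules
total_model = sum(model.values())        # 1722 molecules

pub_pct = share(publication["blockers"], total_pub)
model_pct = share(model["blockers"], total_model)

print(pub_pct, model_pct)                            # 29.79 23.87
print(publication["blockers"] - model["blockers"])   # 102
```

Note that even identical totals would not prove agreement on individual molecules; that requires a row-by-row comparison of the two sets of labels.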

Here is the link to the [GitHub repository](https://github.com/IshitaPathak/model-validation-eos30gr/tree/master).

## WEEK3 TASK
I selected a suitable dataset with sufficient experimental results, named [external_dataset_Xiao_Li.csv](https://github.com/IshitaPathak/model-validation-eos30gr/blob/master/data/external_dataset_Xiao_Li.csv), in the data folder.

Here is the [reference for the data](https://weilab.math.msu.edu/DataLibrary/2D/#ref9). I used the Li 1092 test data.
![Screenshot 2024-04-03 003723](https://github.com/IshitaPathak/model-validation-eos30gr/assets/75848598/84078cdd-2df6-47ba-9448-3b8e1d163952)

GemmaTuron commented 3 months ago

Hi @IshitaPathak

Thanks for the explanations, much clearer now, and good job on doing a PCA as well! Please move on to preparing your final application, many thanks!

IshitaPathak commented 3 months ago

Thank you so much @GemmaTuron. I really appreciate your time and feedback. I have started working on the final application.

IshitaPathak commented 3 months ago

WEEK 4 TASK ✅