ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

✍️ Contribution period: Iqra Akram #997

Closed: Iqra350 closed this issue 6 months ago

Iqra350 commented 6 months ago

Week 1 - Get to know the community

Week 2 - Get Familiar with Machine Learning for Chemistry

Week 3 - Validate a Model in the Wild

Week 4 - Prepare your final application

Iqra350 commented 6 months ago

As an AI engineer, I am excited by the opportunities Ersilia offers in data science. Coming from a region where opportunities to work on machine learning projects are scarce, I have often felt constrained by the limited avenues available for building and applying predictive models. My initial role as a data analyst left me wanting deeper involvement in this dynamic, fast-evolving field, and the scarcity of such roles often left me frustrated.

However, Ersilia stands out as a beacon of opportunity, offering hands-on experience and a collaborative environment that fosters growth. Working alongside esteemed scientists and fellow enthusiasts would reignite my passion for data science and strengthen my programming skills in handling large datasets.

Throughout my academic journey in computer science, I gravitated towards data science, reading research papers and expanding my expertise in machine learning. Despite the well-documented gender disparities in AI/ML professions, where only 26% of professionals are women, I remain determined to excel and contribute meaningfully. Adapting to the challenges women face in the field, I have explored diverse roles, including frontend engineering. Even after relocating, finding job opportunities presented its own set of hurdles.

The ethos and objectives of Ersilia resonate deeply with me. Its commitment to democratizing AI/ML models and ensuring inclusive access to new advances aligns with my personal values. Being part of such a visionary environment promises not only to let me make a meaningful impact but also to propel me towards new heights of influence.

I am profoundly grateful for the opportunity to collaborate with esteemed scientists and benefit from the mentorship of experienced PhD professionals. This nurturing environment not only strengthens my foundational knowledge but also fosters a culture of compassion that enriches the learning experience.

In conclusion, the Ersilia Model Hub internship embodies a pivotal opportunity for me to fuel my passion for data science and contribute to transformative projects that hold the potential to shape the future.

Iqra350 commented 6 months ago

[Screenshots: 2024-03-06 12-51-43 and 2024-03-06 12-52-48]

Iqra350 commented 6 months ago

I have not found any issues so far.

Ajoke23 commented 6 months ago


Well done, you've done great work so far. You have completed the Week 1 tasks. I suggest going through the Ersilia Model Hub (https://www.ersilia.io/model-hub) to familiarize yourself with the models while we wait for the team to finalize the Week 2 and Week 3 tasks.

Iqra350 commented 6 months ago


Yes, sure, I am working on it. :)

Ajoke23 commented 6 months ago

Hi @Iqra350. I noticed you haven't started the Week 2 tasks yet. Are you facing any challenges? If so, please describe them here and I will be glad to help you through them.

DhanshreeA commented 6 months ago

Hi @Iqra350, please let us know if you want to continue your work during this contribution period. Otherwise, we can close this issue and focus on other applicants.

Iqra350 commented 6 months ago

@DhanshreeA Sorry for the inconvenience. I will update all the tasks for Weeks 2 and 3 within this week.

DhanshreeA commented 6 months ago

Hi @Iqra350, I do not see any updates yet. I am closing this issue because it will be too late to catch up with the other applicants at this point. Thank you for your time and efforts.

Iqra350 commented 6 months ago

I am working on the notebook and will upload all the tasks together.

Iqra350 commented 6 months ago

[Screenshot 2024-03-22 155008] This is the proof.

Iqra350 commented 6 months ago

Sorry for the late response; I thought I had to upload all the tasks at once.

I have uploaded the project for Week 2, and I will upload Week 3 in one or two days, before the deadline.

You can access my progress via the link.

Iqra350 commented 6 months ago

Task completed: Week 2, Task 1. Task completed: Week 3, Task 3.

Task 2 of Week 2 is also complete, but I am having an issue loading the data into the original model from GitHub. [Screenshot 2024-03-23 150014]

I am working on resolving the issue.

Iqra350 commented 6 months ago

README for Machine Learning Model Validation Project

This repository contains the code and documentation for validating the "eos74bo" model from the Ersilia Model Hub, focusing on the internship project tasks, which include checking the model for bias.

Project Overview

The goal of this project is to validate the accuracy and reproducibility of the "eos74bo" model, which predicts ADME properties of small molecules. Task 1 checks for model bias by running predictions on a list of 1,000 diverse molecules and plotting the results in a scatter plot. Task 2 checks the reproducibility of the published results, and Task 3 performs external data validation based on the selected paper.

About the Selected Model

Kinetic aqueous solubility (μg/mL) was determined experimentally using the same standard operating procedure (SOP) in over 200 NCATS drug discovery projects. A final dataset of 11,780 non-redundant molecules and their associated solubility values was used to train an SVM classifier. Approximately half of the dataset has poor solubility (< 10 μg/mL), and two-thirds of these poorly soluble molecules have reported values below 1 μg/mL. A subset of the data is available in PubChem (AID 1645848) at https://pubchem.ncbi.nlm.nih.gov/bioassay/1645848#section=Result-Definitions
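The class definition above can be made concrete with a small sketch. It only relies on the 10 μg/mL cutoff for poor solubility and the 1 μg/mL sub-threshold stated in the description; the helper names and the example values are hypothetical and do not come from the NCATS data or the Ersilia implementation.

```python
# Minimal sketch: turn kinetic solubility measurements (ug/mL) into class labels.
# Assumption: < 10 ug/mL means "poor" solubility, as in the model description.
# Helper names and example values are hypothetical, for illustration only.
from collections import Counter

POOR_SOLUBILITY_CUTOFF = 10.0  # ug/mL
VERY_LOW_CUTOFF = 1.0  # ug/mL


def solubility_class(value_ug_per_ml: float) -> str:
    """Return 'poor' for values below 10 ug/mL, otherwise 'good'."""
    return "poor" if value_ug_per_ml < POOR_SOLUBILITY_CUTOFF else "good"


def summarize(values):
    """Count classes and the fraction of poorly soluble values below 1 ug/mL."""
    counts = Counter(solubility_class(v) for v in values)
    poor = [v for v in values if v < POOR_SOLUBILITY_CUTOFF]
    very_low_fraction = (
        sum(v < VERY_LOW_CUTOFF for v in poor) / len(poor) if poor else 0.0
    )
    return counts, very_low_fraction


if __name__ == "__main__":
    # Hypothetical measurements, not taken from the real dataset
    example = [0.3, 0.8, 5.0, 12.0, 45.0, 150.0]
    counts, frac = summarize(example)
    print(counts["poor"], counts["good"], round(frac, 2))  # prints: 3 3 0.67
```

The same summary applied to the full dataset would show whether the split described above (about half poorly soluble, two-thirds of those below 1 μg/mL) holds.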

Characteristics

Repository Structure

Getting Started

Task 1 - Week 2

Evaluating model bias
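A bias check like the one in Task 1 can also be summarized numerically by looking at how the scores for a set of molecules are distributed. This is a minimal sketch under stated assumptions: the predictions are read from a CSV with a caller-supplied score column holding probabilities in [0, 1], and the 0.5 decision threshold and 0.95 dominance cutoff are illustrative choices, not values from Ersilia.

```python
# Minimal sketch of a bias check on model outputs for a set of molecules.
# Assumptions (hypothetical, not from the Ersilia docs): scores are probabilities
# in [0, 1] stored in a CSV column named by the caller; the 0.5 threshold and
# 0.95 dominance cutoff are illustrative.
import csv
import io


def read_scores(csv_file, column):
    """Read one numeric column from a CSV file object, skipping empty cells."""
    reader = csv.DictReader(csv_file)
    return [float(row[column]) for row in reader if row[column].strip()]


def histogram(scores, bins=10):
    """Count scores in equal-width bins over [0, 1], clamping out-of-range values."""
    counts = [0] * bins
    for s in scores:
        idx = max(0, min(int(s * bins), bins - 1))
        counts[idx] += 1
    return counts


def bias_report(scores, threshold=0.5, dominance=0.95):
    """Summarize the class split of predictions and flag a collapsed output."""
    n = len(scores)
    positive = sum(s >= threshold for s in scores)
    positive_fraction = positive / n if n else 0.0
    # Flag outputs where nearly all molecules land in the same class
    skewed = positive_fraction >= dominance or positive_fraction <= 1 - dominance
    return {
        "n": n,
        "positive_fraction": positive_fraction,
        "histogram": histogram(scores),
        "skewed": skewed,
    }


if __name__ == "__main__":
    # Hypothetical prediction file contents, for illustration only
    sample = io.StringIO(
        "key,input,score\nm1,CCO,0.12\nm2,c1ccccc1,0.81\nm3,CC(=O)O,0.40\n"
    )
    print(bias_report(read_scores(sample, "score")))
```

The histogram counts (or the raw scores) are what would be drawn in the scatter plot; a report with `skewed` set to `True` suggests the model assigns almost every molecule to the same class.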

Task 2 - Week 2

Reproducibility of Results

Select a paper, obtain the dataset used in that paper, and make predictions on it.

The paper followed is:

https://slas-discovery.org/action/showPdf?pii=S2472-5552%2822%2906765-X which used the solubility dataset.
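The reproducibility step comes down to comparing two sets of labels for the same molecules: experimental classes from the paper's dataset (or predictions from the original GitHub code) against the Ersilia model's predictions. The sketch below assumes binary labels keyed by SMILES, with 1 meaning poorly soluble; `compare_labels` is a hypothetical helper, not part of Ersilia or the paper's code.

```python
# Minimal sketch for a reproducibility check: compare two sets of binary labels
# keyed by molecule (e.g. SMILES). Assumptions: labels are 0/1 with 1 = poorly
# soluble; `compare_labels` is a hypothetical helper.


def compare_labels(reference, predicted):
    """Return confusion counts and accuracy over molecules present in both dicts."""
    shared = sorted(set(reference) & set(predicted))
    tp = fp = tn = fn = 0
    for key in shared:
        ref, pred = reference[key], predicted[key]
        if ref not in (0, 1) or pred not in (0, 1):
            raise ValueError(f"labels must be 0 or 1, got {ref!r} and {pred!r} for {key!r}")
        if pred == 1 and ref == 1:
            tp += 1
        elif pred == 1 and ref == 0:
            fp += 1
        elif pred == 0 and ref == 0:
            tn += 1
        else:
            fn += 1
    n = len(shared)
    return {
        "n": n,
        "tp": tp,
        "fp": fp,
        "tn": tn,
        "fn": fn,
        "accuracy": (tp + tn) / n if n else float("nan"),
        # Molecules that appear in only one of the two sets
        "unmatched": len(set(reference) ^ set(predicted)),
    }


if __name__ == "__main__":
    # Hypothetical labels, for illustration only
    reference = {"CCO": 0, "c1ccccc1": 1, "CC(=O)O": 0, "CCN": 1}
    predicted = {"CCO": 0, "c1ccccc1": 1, "CC(=O)O": 1, "CCCl": 0}
    print(compare_labels(reference, predicted))
```

SMILES strings should be canonicalized the same way on both sides before matching; otherwise the same molecule can appear under two different keys and be counted as unmatched.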

About the Code:

In this notebook, I load a list of molecules obtained from PubChem to check drug solubility, and I replicate the results for the Ersilia Hub model (see link) and for the GitHub code from the external source (see link), processing them to make sure I have:

Outcome:

Task 3 - Week 3

PAMPA (parallel artificial membrane permeability assay) is a laboratory test used in drug development to assess how easily a drug passively diffuses across an artificial lipid membrane that mimics cell membranes. It helps researchers estimate how well a drug can be absorbed into the bloodstream, which is important for determining its effectiveness.

Download the external dataset:

The code to download the dataset is below:

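# Download the NCATS PAMPA permeability dataset from Therapeutics Data Commons (TDC)
from tdc.single_pred import ADME
data = ADME(name='PAMPA_NCATS')
# get_split() returns a dict of pandas DataFrames keyed 'train', 'valid' and 'test'
split = data.get_split()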
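For the external validation, one common summary is the ROC AUC of the model's scores against the binary `Y` labels of a TDC split (for example `split['test']['Y']`). The sketch computes it from the Mann-Whitney U statistic in plain Python; it assumes the scores are already available for the same molecules in the same order, and the function names are hypothetical. `sklearn.metrics.roc_auc_score` should give the same value if scikit-learn is installed.

```python
# Minimal sketch: ROC AUC via the Mann-Whitney U statistic, with tied scores
# given average ranks. Assumptions: labels are 0/1 (e.g. the TDC "Y" column) and
# scores are model outputs for the same molecules in the same order.


def average_ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranks = average_ranks(scores)
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

An AUC near 0.5 means the scores do not separate the two classes any better than chance, while values closer to 1.0 indicate good separation on the external data.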

The GitHub link for the entire code is: Notebooks for All Weeks