ENEZA-DSI / mini-projects-2024


Project 1: Impact of Data Preprocessing on Machine Learning from a Diabetes Dataset #1

Open kipkurui opened 3 months ago

barrygenre commented 3 months ago

Reason to Undertake Project 1 (1st choice):

The Pima Indians Diabetes Dataset offers a rich opportunity to gain practical experience in handling real-world data imperfections, such as missing values and feature scaling issues. By engaging with this project, I will not only develop skills in data preprocessing, which are essential for any data scientist, but also enhance my understanding of the impact these techniques have on model performance. Specifically, the dataset's missing values and varying feature ranges present a realistic challenge that will allow me to experiment with and apply various data cleaning and normalization methods. Successfully preprocessing the data and training a binary classifier to predict diabetes outcomes will demonstrate my ability to improve model accuracy through thoughtful data preparation, a critical competency in the field of machine learning.

Sharonsang44 commented 3 months ago

I am Sharon Chepkemoi Sang from the Aga Khan University. I have a bachelor's degree in Nursing. I'd like to take on this mini project as my first choice. I have a keen interest in data preprocessing techniques and their impact on machine learning models. By working on this project, I hope to deepen my understanding of data preprocessing, improve my skills in handling missing data, learn various normalization and standardization techniques, and gain experience with k-fold cross-validation. These skills are crucial for ensuring the quality of data before it is used for analysis, which is a fundamental aspect of data science.

mainarose12 commented 3 months ago

This project is my first choice because of my interest in data science, machine learning, and healthcare. It will provide a meaningful learning experience that could contribute to better management of diabetes.

EstherNjuguna commented 3 months ago

The dataset provides a basis for putting into practice the skills we have learned recently. Since real-world datasets also contain a lot of missing values, working on this dataset will be a great way to learn how to impute data. I also look forward to seeing the model's performance and working on improving it.

monchari2002 commented 3 months ago

I am Roselyter from AKU. This is my 3rd choice, mainly to understand how to do data clean-up.

Dorothy1800 commented 3 months ago

This is my First Choice Project.

I am Dorothy Chepkoech from Aga Khan University Institute for Human Development, working as a Masters fellow in Data Science. My focus is to learn, explore, and come up with innovative ways of using data science techniques to develop early health interventions in low- and middle-income countries. Data preprocessing is an important step for the success of ML projects. Done well, it can enhance model accuracy, reduce overfitting, speed up training, improve data quality, enhance interpretability, and facilitate feature engineering. Ignoring or inadequately performing data preprocessing can lead to poor model performance, inaccurate predictions, and unreliable insights. This project will enhance my skills in data preprocessing, which is an important skill in my career as a data scientist, an area I am currently pursuing in my masters.

Dan-Bern commented 3 months ago

Choice 1. Rationale: Daniel/Bernard, AKU-SONAM. Background: Statistics. With a keen interest in growing my skills as a statistician, one of the core knowledge areas I am interested in exploring is data cleaning/data preprocessing and predictive analytics. Working on this project will not only equip me with data cleaning skills but also enhance my predictive analytics skills.

ninah20 commented 3 months ago

This is my 2nd choice. I chose this project because it will offer me a practical opportunity to address real-world data challenges. Handling missing values and normalizing data in the Pima Indians Diabetes Dataset will sharpen my skills in data preprocessing, essential for accurate machine learning model training and evaluation.

Dzoro5 commented 3 months ago

I am Edwin Dzoro Mwazuma, a research assistant at Aga Khan University. I have a background in Economics and Statistics. This is my first choice since I would like to build data science skills, especially in machine learning, and data cleaning is an important step. The project aligns perfectly with my interest in leveraging predictive models and machine learning to drive evidence-based decisions in health research.

OdongoIsaya commented 3 months ago

ISAYA ODONGO - PWANI UNIVERSITY - 3RD CHOICE. Machine learning is touted as the ultimate solution to human error or negligence in any research field. As a bioinformatics student involved in a project on host-pathogen interactions, I have always wondered if machine learning approaches can be applied in the field of metagenomics to quickly and efficiently characterize microbiome composition, diversity, and functional roles in host plant and animal systems, without relying heavily on traditional time-consuming pipeline development approaches. This project will be a learning opportunity for me.

DOREENKDAVID commented 3 months ago

Second choice. I am Doreen Kinya, an MSc Bioinformatics student at Pwani University. This project aligns well with my goals of ensuring data accuracy and consistency in genetic analysis. Handling real-world data imperfections through imputation, normalization, and cross-validation will enhance my ability to manage and analyze complex genetic data effectively.

yiasei commented 3 months ago

Second Option. My name is Yiakon Sein, an MSc Bioinformatics student at Pwani University. This mini project will allow me to experience how to preprocess data, including cleaning, normalizing, and standardizing collected data. Furthermore, I will take advantage of machine learning techniques to predict the probability of an individual being diabetic based on the processed data.

AnitaKer commented 3 months ago

My Second Choice

I am Anita Kerubo from Aga Khan University, with a background in statistics and research methodology. This project is significant as it will push me to think critically about how to process raw data and prepare it for subsequent analysis.

kipkurui commented 3 months ago

@Dzoro5, which is your other choice? This one and #2 are full. Please comment on another project you'd be interested in.