drop2jyoti / Estimation-of-Obesity-Levels


Classification of Obesity Levels Based on Eating Habits and Physical Condition Using Data Analysis

Project Overview

Our aim in this project is to build a machine-learning model that predicts obesity levels from demographic features (such as age, gender, height, and weight) and lifestyle habits (e.g., eating patterns, exercise, smoking, and water intake). To do so, we analyze the dataset "Estimation of Obesity Levels Based On Eating Habits and Physical Condition" (https://archive.ics.uci.edu/dataset/544/estimation+of+obesity+levels+based+on+eating+habits+and+physical+condition), which contains 16 features and 2111 observations.
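Because the dataset records both height and weight, body mass index (BMI = weight / height²) is a natural derived feature to consider. The sketch below is illustrative only and is not necessarily part of this project's pipeline. It assumes the UCI column names 'Height' (in meters) and 'Weight' (in kilograms) and represents rows as plain dictionaries; 'add_bmi' is a hypothetical helper.

```python
from typing import Dict, List

Row = Dict[str, object]


def add_bmi(rows: List[Row]) -> List[Row]:
    """Return copies of the rows with an extra 'BMI' column.

    Hypothetical helper. Assumes 'Height' is in meters and 'Weight' is in
    kilograms, as in the UCI obesity dataset. The input rows are not modified.
    """
    enriched: List[Row] = []
    for row in rows:
        height = float(row["Height"])
        weight = float(row["Weight"])
        if height <= 0:
            raise ValueError(f"Height must be positive, got {height}")
        new_row = dict(row)  # shallow copy so the caller's data is untouched
        # BMI = weight (kg) divided by height (m) squared, rounded to 2 decimals
        new_row["BMI"] = round(weight / height ** 2, 2)
        enriched.append(new_row)
    return enriched
```

For example, a row with Height 1.62 and Weight 64.0 gets a BMI of 24.39. If, as the dataset's documentation suggests, the class labels were derived from BMI, a feature like this is likely to dominate any model, which is worth keeping in mind when interpreting feature importance.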

Potential Business Cases in Different Areas

The results of this project can help various organizations improve their decision-making. Below is a concise summary of key organizations and their potential applications.

1. Public Health Organizations:

By identifying which features in the dataset are the strongest predictors of obesity levels, public health professionals could design educational campaigns that focus on the most impactful factors. This would show individuals what to focus on to reduce their risk of obesity.

2. Health Care Providers and Practitioners:

Health care professionals could use the results to monitor and manage obesity. More specifically, the key variables identified for lifestyle habits, dietary patterns, and physical condition could support a health recommendation system. Such a system could help identify at-risk individuals, who could then be offered interventions and support.

3. Insurance Companies:

The analysis can help in designing custom insurance policies or health premiums based on the identified obesity risks.
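For the public health use case, one model-agnostic way to rank which features are the strongest predictors is permutation importance: shuffle one feature column at a time and measure how much accuracy drops. This is not necessarily the technique used in this project. The sketch below assumes the model is any callable that maps a feature row to a class label, and the function names are hypothetical; for fitted scikit-learn estimators, 'sklearn.inspection.permutation_importance' provides the same idea.

```python
import random
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], str]


def accuracy(model: Predictor, X: List[List[float]], y: List[str]) -> float:
    """Fraction of rows that the model classifies correctly."""
    correct = sum(1 for row, label in zip(X, y) if model(row) == label)
    return correct / len(y)


def permutation_importance(
    model: Predictor,
    X: List[List[float]],
    y: List[str],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled.

    A larger drop means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances: List[float] = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Importance should ideally be measured on held-out data, so that it reflects what the model has learned to generalize rather than what it memorized.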

Analysis Goals

Libraries and Frameworks

This project is implemented in Python. For the list of libraries, see the 'requirements.txt' file.

Methodology

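In this project, our goal is to predict obesity levels from various factors using a dataset from the UCI Machine Learning Repository. The target variable, NObeyesdad, represents the obesity level and has 7 classes, making this a multi-class classification problem. The classes are as follows:

1. Insufficient_Weight

2. Normal_Weight

3. Overweight_Level_I

4. Overweight_Level_II

5. Obesity_Type_I

6. Obesity_Type_II

7. Obesity_Type_III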
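Most classifiers and evaluation metrics work with integer class ids, so the string labels of NObeyesdad usually need to be encoded. A minimal sketch, assuming the seven class names used in the UCI dataset, listed here from lowest to highest weight category; the helper names are hypothetical. scikit-learn's LabelEncoder would also work, but it orders classes alphabetically, which loses the natural ordering of the categories.

```python
from typing import Dict, List

# The seven NObeyesdad labels from the UCI dataset,
# ordered from lowest to highest weight category.
OBESITY_CLASSES: List[str] = [
    "Insufficient_Weight",
    "Normal_Weight",
    "Overweight_Level_I",
    "Overweight_Level_II",
    "Obesity_Type_I",
    "Obesity_Type_II",
    "Obesity_Type_III",
]

LABEL_TO_ID: Dict[str, int] = {label: i for i, label in enumerate(OBESITY_CLASSES)}


def encode_targets(labels: List[str]) -> List[int]:
    """Map each NObeyesdad label to an integer class id from 0 to 6."""
    unknown = sorted(set(labels).difference(LABEL_TO_ID))
    if unknown:
        raise ValueError(f"Unexpected labels: {unknown}")
    return [LABEL_TO_ID[label] for label in labels]


def decode_targets(ids: List[int]) -> List[str]:
    """Inverse of encode_targets: map class ids back to label strings."""
    return [OBESITY_CLASSES[i] for i in ids]
```

Keeping the ids in category order leaves open the option of treating the target as ordinal later, while nominal multi-class models are unaffected by the ordering.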

Below is the methodology we followed.

Dataset Information

Most of this dataset is synthetic, and the classes are roughly balanced, so class imbalance is not a concern. 77% of the data was generated synthetically using the Weka tool and the SMOTE filter, while 23% was collected directly from users through a web platform.
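SMOTE (Synthetic Minority Over-sampling Technique) creates new samples by interpolating between an existing sample and one of its nearest neighbours from the same class. The sketch below illustrates only that interpolation step. It is not Weka's SMOTE filter, it assumes purely numeric features (the real dataset also has categorical columns, which this sketch does not handle), and the function name 'smote' is hypothetical. The imbalanced-learn package provides a full implementation as 'imblearn.over_sampling.SMOTE'.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic points in the SMOTE style.

    For each new point: pick a random minority sample, find its k nearest
    minority neighbours (Euclidean distance), choose one of them at random,
    and return a random point on the line segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank all other minority samples by distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point is a blend of two real ones, oversampled data can look smoother than real survey responses, which is one reason the synthetic share of this dataset is worth keeping in mind when reading the results.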

1. Exploratory Data Analysis (EDA)

2. Data Cleaning

3. One-Hot Encoding for Categorical Variables

4. Machine Learning Modeling

5. Feature Engineering

6. Feature Elimination and Model Comparison

7. Findings and Conclusion
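Step 3 turns each categorical column into 0/1 indicator columns so that models that expect numeric input can use it. A minimal pure-Python sketch, assuming rows are plain dictionaries and using the UCI column names 'Gender' and 'MTRANS' as examples; 'one_hot_encode' is a hypothetical helper, and in practice pandas' 'get_dummies' performs the same transformation on a DataFrame.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def one_hot_encode(rows: List[Row], columns: Sequence[str]) -> List[Row]:
    """Replace each listed categorical column with 0/1 indicator columns.

    Indicator names follow the '<column>_<category>' convention. Categories
    are collected from the given rows and sorted for a stable column order.
    Columns not listed are copied through unchanged.
    """
    categories = {col: sorted({str(row[col]) for row in rows}) for col in columns}
    encoded: List[Row] = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key not in columns}
        for col in columns:
            for category in categories[col]:
                # 1 if this row belongs to the category, else 0
                new_row[f"{col}_{category}"] = int(str(row[col]) == category)
        encoded.append(new_row)
    return encoded
```

In a real pipeline the category lists should be learned from the training split only and then reused for the test split, so that both splits end up with the same columns and no information leaks from the test data.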

Workload Distribution

Group Members

Task Allocation

1. Exploratory Data Analysis (EDA)

2. Data Cleaning

3. One-Hot Encoding for Categorical Variables

4. Machine Learning Modeling

5. Feature Engineering

6. Feature Elimination and Model Comparison

7. Findings and Conclusion

Key Observations

Results

Future Scope and Next Steps

Team members

Arezoo Khalili, Claire E, Jyoti Narang, Kathryn Vozoris, Zekiye Erdem