UBC-MDS / group29

Project Repo for Group 29 for DSCI 522
MIT License

Project Data Ideas #1

Closed jraza19 closed 3 years ago

jraza19 commented 3 years ago

Please add your ideas for dataset/project question here.

rachelywong commented 3 years ago

Here's an interesting dataset I found: https://data.world/informatics-edu/diabetes-prediction

Question: Given a person’s background, demographic, and current health status, can we determine their potential onset of diabetes?

Type: Predictive

Potential Data Analysis:
- Machine learning: predict diabetes vs. no diabetes; classification method still undecided (maybe an RBF SVM?)
- Could give more weight to more important factors (like blood sugar levels)
- Randomized search for hyperparameter optimization
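A rough sketch of what the RBF SVM plus randomized search idea could look like in scikit-learn. It runs on synthetic stand-in data from `make_classification`, not the diabetes dataset, and the search ranges, `n_iter`, and the random seed are assumptions picked only for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary-classification data (assumption: stands in for the real features).
X, y = make_classification(n_samples=300, n_features=8, random_state=522)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=522, stratify=y
)

# Scale features first, since an RBF kernel is sensitive to feature scale.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Log-uniform ranges for C and gamma (illustrative ranges, not tuned values).
param_dist = {
    "svc__C": loguniform(1e-2, 1e2),
    "svc__gamma": loguniform(1e-4, 1e0),
}

# Sample 10 hyperparameter settings and score each with 5-fold cross-validation.
search = RandomizedSearchCV(pipe, param_dist, n_iter=10, cv=5, random_state=522)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so the validation folds do not leak into the scaling step.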

Additional Information:
- Importance: diabetes prevalence is rising, especially right now with the economy and COVID-19, and it is more prevalent among low-to-middle income groups
- Can be prevented or delayed easily with early treatment and lifestyle changes

sukh2929 commented 3 years ago

I found the wine dataset interesting. It is available here: http://archive.ics.uci.edu/ml/index.php

wiwang commented 3 years ago

The dataset is almost clean, and its size should be neither too large nor too small.

jraza19 commented 3 years ago

StackOverflow 2020 Survey Dataset: https://insights.stackoverflow.com/survey

Question: Do DevOps specialists spend more years coding professionally than data scientists?

Type: Estimation
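One possible way to frame this as an estimation problem is a bootstrap confidence interval for the difference in mean years of professional coding. The sketch below uses only the standard library; the two lists are made-up toy values, not survey data, and the function name `bootstrap_mean_diff` is hypothetical.

```python
import random
from statistics import mean

def bootstrap_mean_diff(a, b, n_resamples=2000, seed=522):
    """Return (point estimate, lower, upper) for mean(a) - mean(b),
    using a 95% percentile bootstrap interval."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the group sizes.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(mean(resampled_a) - mean(resampled_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return mean(a) - mean(b), lower, upper

# Toy values for illustration only (NOT taken from the survey).
devops_years = [3, 5, 8, 10, 12, 7]
data_scientist_years = [2, 4, 6, 5, 9, 3]

point, lower, upper = bootstrap_mean_diff(devops_years, data_scientist_years)
print(point, lower, upper)
```

If the interval excludes zero, that would suggest a difference between the two groups, under the usual bootstrap assumptions.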

rachelywong commented 3 years ago

Summary from Zoom discussion: go with the diabetes dataset.