UBC-MDS / DSCI_522_Group304

DSCI 522 Group 304 Project - Are There Differences in FSA Scores Between Subgroups?
MIT License

Possible Datasets #8

annychih closed 4 years ago

annychih commented 4 years ago

We'll need to pick a dataset to work with, so this thread is to add possible datasets for discussion during our team meeting.

annychih commented 4 years ago

Dataset: BC Surgical Wait Times
Licence: Open Government Licence - BC ("free to copy, modify, publish, translate, adapt, distribute, or otherwise use the Information in any medium, mode or format for any lawful purpose."), but we must acknowledge the source and provide attribution
Source: Ministry of Health - Medical Services https://catalogue.data.gov.bc.ca/dataset/bc-surgical-wait-times

Potential Research Questions

  1. Inferential: Are wait times for certain surgeries longer at facilities outside of Vancouver vs within city limits?
  2. Inferential: Is there a significant difference in wait times for different hospitals or different procedures?
    • Note: We have about 9 years' worth of data x 4 quarters each = 36 possible data points for a given procedure at a hospital. However, the data is only reported by quarter. Can we answer this question using the data we have?
  3. Predictive: What is the expected wait time for a procedure at a certain hospital based on median wait times within the same quarter in previous years? The province currently offers wait times based on the past 3 months, but if Christmas holidays are coming up and doctors are taking time off, an estimate based on July-Sep may not be as good as an estimate based on Oct-Dec over the past 9 years.
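A minimal sketch of the idea in question 3, assuming the data can be reduced to one wait-time value per year and quarter for a given procedure and hospital. The record layout, the function names, and all the numbers below are hypothetical and are not taken from the dataset. It compares a same-quarter median against the "most recent quarter" style of estimate.

```python
from statistics import median

# Hypothetical records: (year, quarter, wait_in_weeks) for one procedure
# at one hospital. The values are made up for illustration only.
history = [
    (2016, "Q1", 6.0), (2016, "Q3", 9.0),
    (2017, "Q1", 7.0), (2017, "Q3", 10.0),
    (2018, "Q1", 8.0), (2018, "Q3", 12.5),
]


def same_quarter_estimate(records, quarter):
    """Median of the past wait times observed in the given quarter."""
    waits = [wait for _, q, wait in records if q == quarter]
    if not waits:
        raise ValueError(f"no history for quarter {quarter!r}")
    return median(waits)


def latest_quarter_estimate(records):
    """Baseline: reuse the wait time from the most recent quarter."""
    return max(records, key=lambda r: (r[0], r[1]))[2]


q1_estimate = same_quarter_estimate(history, "Q1")
baseline = latest_quarter_estimate(history)
```

Whether the same-quarter estimate actually beats the baseline would have to be checked on held-out quarters of the real data.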
annychih commented 4 years ago

Dataset: BC Schools - Foundation Skills Assessment (FSA)
Licence: Open Government Licence - BC
Source: Ministry of Education - Education Analytics https://catalogue.data.gov.bc.ca/dataset/bc-schools-foundation-skills-assessment-fsa-

Potential Research Questions

  1. Inferential: How do the scores for certain exams compare between different subpopulations (e.g. male / female, aboriginal / non-aboriginal, public / private school)? For example, do students in private schools perform better than students in publicly funded schools on numeracy exams?

    • Null Hypothesis: there's no difference in scores for the different groups
    • Alternative Hypothesis: there is a difference in scores for the different groups
  2. Predictive: What score on a specific exam type (e.g. numeracy) will a child in a specific subpopulation (e.g. aboriginal) get within a BC publicly funded school, based on:

    • their school district,
    • their gender,
    • how many special needs / non-English language learners are in the school (assuming here that this may influence scores because more funding may go towards these other groups than to programs that support the aboriginal population specifically),
    • their scores on other exams, and
    • the number of exam writers from the subpopulation (i.e. the number of students within the class who fall within this subpopulation)?
  3. Predictive: Does a student's Grade 4 score predict how they will perform in Grade 7?
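One way the null and alternative hypotheses in question 1 could be tested is a permutation test on the difference in group means. This is only a sketch under assumptions: the function name, the number of resamples, the seed, and the scores are all made up, and the real analysis might use a different test entirely.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=5000, seed=522):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (a minus b) and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled scores and split them again.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return observed, (extreme + 1) / (n_resamples + 1)


# Made-up scores for illustration only; these are not real FSA results.
independent = [520, 535, 510, 540, 528, 533]
public = [480, 495, 470, 500, 488, 492]
diff, p_value = permutation_test(independent, public)
```

A small p-value would be evidence against the null hypothesis of no difference. It would not say anything about why the groups differ.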

zouwenjiao commented 4 years ago

Dataset: Abalone
Source: UCI Machine Learning Repository, via Kaggle https://www.kaggle.com/rodolfomendes/abalone-dataset

Potential Research Questions

  1. Inferential: How do the average physical features (e.g. length, diameter, height, weight) differ between the sexes? For example, is the average length of male abalones significantly different from that of females?

  2. Predictive: How many rings does a specific abalone have, given a set of features:

    • Sex
    • Length
    • Diameter
    • Height
    • Whole weight
    • Shucked weight
    • Viscera weight
    • Shell weight

    (May need feature selection to improve model performance)
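As a starting point for the predictive question, a one-feature least-squares fit could serve as a baseline before any feature selection. The sketch below assumes shell weight as the single predictor. That choice, the function name, and the data points are illustrative assumptions, not values from the abalone dataset.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up (shell weight, rings) pairs for illustration; not real abalone rows.
shell_weight = [0.10, 0.20, 0.30, 0.40]
rings = [7, 9, 11, 13]

intercept, slope = fit_line(shell_weight, rings)
# Predicted ring count for a hypothetical abalone with shell weight 0.25.
predicted = intercept + slope * 0.25
```

A model using all eight features would need a multivariate fit and an encoding for Sex, which this sketch leaves out.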

zouwenjiao commented 4 years ago

Dataset: Where it pays to attend college
Licence: Other (specified in description)
Source: Kaggle https://www.kaggle.com/wsj/college-salaries

Potential Research Questions

  1. Inferential: How does the average salary differ among school types? (E.g. does a graduate from an engineering university earn a significantly different salary from a graduate from a state university?)

  2. Inferential: How does the average salary differ among regions? (E.g. does a graduate from a college in LA earn a significantly different salary from a graduate from a university in NY?)

  3. Predictive: What salary can a person expect, given a set of features:

    • Degree
    • Region
    • Major
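A naive baseline for the predictive question would be to predict the average salary of the category a person falls into. The sketch below groups by a single label. The row layout, the labels, and the salaries are hypothetical and are not read from the Kaggle files.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (school_type, starting_salary). Values are made up.
rows = [
    ("Engineering", 60000), ("Engineering", 64000),
    ("State", 44000), ("State", 46000), ("State", 48000),
    ("Liberal Arts", 45000),
]


def mean_by_group(records):
    """Average the numeric value within each category label."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return {label: mean(values) for label, values in groups.items()}


averages = mean_by_group(rows)
```

The same grouping also gives the per-group averages that the inferential questions would compare, although a formal test would still be needed to call a difference significant.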
robilizando commented 4 years ago

Dataset: Mining, Quarrying, & Oil & Gas Extraction
Source: DATA USA https://datausa.io/profile/naics/mining-quarrying-oil-gas-extraction#io

robilizando commented 4 years ago

Dataset: Data Scientist Job Market in the U.S.
Source: Kaggle https://www.kaggle.com/sl6149/data-scientist-job-market-in-the-us

Goal: find the average salary by years of experience, specific job, etc.

annychih commented 4 years ago

Closing this issue because we've decided on the FSA dataset (see the Project Proposal issue for more detail).