UBC-MDS / wine_quality_predictor_group1

MIT License

Task Assignments - Milestone 1 #1

SimplyTim closed this issue 2 days ago

SimplyTim commented 6 days ago

Hey all.

Based on our meeting today, I just wanted to finalize the distribution of work for this week.

For the analysis, the requirements are documented here.

To summarize our discussion today, the analysis.ipynb notebook will include both the explanation and the code for the following steps:

  1. Introduction of the topic, dataset, and question
  2. Loading of the dataset
  3. Data wrangling and/or data cleaning of the dataset (if necessary).
  4. Perform EDA that summarizes the dataset, and create visualizations to capture any potential correlations among the features in the data.
  5. Split data into training and testing sets.
  6. Create a column transformer that does any necessary preprocessing for the data.
  7. Create pipelines for each of Decision Tree, KNN, SVC and Logistic Regression, including the preprocessors. Perform cross-validation to find the model that performs the best, based on a certain metric (to be discussed).
  8. Once a model type is selected, perform hyperparameter optimization to find the best hyperparameters for this model.
  9. Run the model on the testing set, and visualize results as necessary.
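
As a rough illustration of steps 5 to 9, here is a minimal sketch using scikit-learn. It is not the project's actual code. The data is synthetic (generated with `make_classification`), the column names `feature_0` ... `feature_5` are hypothetical, the parameter grids are arbitrary examples, and accuracy is only a placeholder for the metric that is still to be discussed.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data (assumption): synthetic numeric features stand in for the real dataset.
X_arr, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=123
)
feature_names = [f"feature_{i}" for i in range(X_arr.shape[1])]  # hypothetical names
X = pd.DataFrame(X_arr, columns=feature_names)

# Step 5: hold out a test set before any preprocessing is fitted.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y
)

# Step 6: column transformer. All columns are numeric here, so all are scaled.
preprocessor = make_column_transformer((StandardScaler(), feature_names))

# Step 7: one pipeline per candidate model, compared with 5-fold cross-validation.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=123),
    "knn": KNeighborsClassifier(),
    "svc": SVC(random_state=123),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
cv_scores = {}
for name, model in models.items():
    pipe = make_pipeline(preprocessor, model)
    scores = cross_val_score(pipe, X_train, y_train, cv=5, scoring="accuracy")
    cv_scores[name] = scores.mean()

best_name = max(cv_scores, key=cv_scores.get)

# Step 8: grid search over example hyperparameters for the selected model.
# make_pipeline names each step after its lowercased class name.
param_grids = {
    "decision_tree": {"decisiontreeclassifier__max_depth": [2, 4, 8, None]},
    "knn": {"kneighborsclassifier__n_neighbors": [3, 5, 11]},
    "svc": {"svc__C": [0.1, 1, 10]},
    "logistic_regression": {"logisticregression__C": [0.1, 1, 10]},
}
search = GridSearchCV(
    make_pipeline(preprocessor, models[best_name]),
    param_grids[best_name],
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Step 9: evaluate the tuned pipeline once on the held-out test set.
test_score = search.score(X_test, y_test)
print(f"Selected model: {best_name}, test accuracy: {test_score:.3f}")
```

Keeping the preprocessor inside each pipeline means the scaler is refit on the training folds only during cross-validation, so no information from the validation folds or the test set leaks into preprocessing.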

This is a very high-level overview of the tasks. If you believe I forgot something or stated something incorrectly, just let me know! Also let me know which tasks will be done by whom. For instance, I wouldn't mind doing the coding for parts 6 to 8 from the list above.

Thanks! 😄

EDIT: I forgot to mention that we will need to use separate branches for each task or subset of tasks. For instance, the EDA might be one branch, the model selection another, the hyperparameter optimization another, etc.

yixuangaoclara commented 4 days ago

For the data analysis part:

@yixuangaoclara: Introduction, import data, and clean data (steps 1 to 3). Might also do some parts of the EDA (step 4) while cleaning the data.

@BryanLee06: Perform EDA, split data, and create the transformer (steps 4 to 6).

@SimplyTim: Create pipelines for the models and perform cross-validation (step 7).

@wzhu8410: Find the best hyperparameters, run the model on the testing set, and visualize the results (steps 8 to 9).

yixuangaoclara commented 2 days ago

I need to change the summary part of the notebook and the conda lock part of the README file once we finish the other parts.

SimplyTim commented 2 days ago

Updated the table titles as well as minor edits to discussion. @BryanLee06 @yixuangaoclara @wzhu8410 any final changes?

yixuangaoclara commented 2 days ago

Submitted our Milestone 1 PDF on Gradescope.

SimplyTim commented 2 days ago

Thanks, @yixuangaoclara. I'm closing this issue for Milestone 1. Great work, guys!