UBC-MDS / group29

Project Repo for Group 29 for DSCI 522
MIT License

Milestone #2 #28

Closed rachelywong closed 3 years ago

rachelywong commented 3 years ago

Milestone #2 Tasks:

rachelywong commented 3 years ago

From discussion in class today:

rachelywong commented 3 years ago

Machine Learning Plan:

  1. Split the data into training and testing sets
  2. Label our features (categorical, numerical, binary)
  3. Create transformers for our features
  4. Create a models dictionary to test our candidate models (baseline DummyClassifier, RBF SVM, and logistic regression) - 571 LAB 4 3.2
  5. Carry forward the best-scoring model, chosen by f1 score or mean CV score (TBD)
  6. Hyperparameter optimization via randomized search on the best model - 571 LAB 4 3.3
  7. Hyperparameter optimization results: confusion matrix, precision-recall curve, AUC? 573 LAB 1 2.7
  8. Apply the best model with its tuned hyperparameters to the test set
  9. Use the coefficients to get the top indicators of readmission 571 LAB 4 4.1
    • extra: find the test examples predicted most strongly as readmission vs. not 571 LAB 4 5.2

Also, any functions we write need documentation and sensible tests.
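The first five steps above could be sketched roughly like this with scikit-learn. The dataframe, column names, and target here are placeholders, not our real dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.dummy import DummyClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

# 1. split data into training and testing (toy stand-in dataframe)
df = pd.DataFrame({
    "age": [25, 40, 33, 51, 62, 29, 45, 38, 57, 31, 48, 66],
    "sex": ["F", "M", "F", "M", "F", "M", "M", "F", "M", "F", "F", "M"],
    "readmitted": [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0],
})
train_df, test_df = train_test_split(
    df, test_size=0.25, random_state=123, stratify=df["readmitted"]
)
X_train = train_df.drop(columns=["readmitted"])
y_train = train_df["readmitted"]

# 2./3. label features and build transformers for them
numeric_features = ["age"]
categorical_features = ["sex"]
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

# 4. models dictionary: baseline dummy, RBF SVM, logistic regression
models = {
    "dummy": DummyClassifier(),
    "rbf_svm": SVC(),
    "logreg": LogisticRegression(max_iter=1000),
}

# 5. compare mean CV f1 scores and carry the best model forward
results = {}
for name, model in models.items():
    pipe = make_pipeline(preprocessor, model)
    results[name] = cross_val_score(
        pipe, X_train, y_train, cv=3, scoring="f1"
    ).mean()
best_name = max(results, key=results.get)
```

Whether we rank by f1 or mean CV accuracy is just a matter of changing `scoring`, so we can decide that later.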

jraza19 commented 3 years ago


Thanks for this, Rachel! I checked in with Varada regarding this work, specifically the correlated features. If we decide to go ahead with logistic regression, it will make the weight of one of the correlated features larger than the other's, which will make the coefficients in step 9 hard to interpret; prediction will still be okay, though. I can't remember whether the same issue applies to the RBF SVM. I will double-check.
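A quick toy demonstration of this point (made-up data, not ours): with two identical copies of a feature, L2-regularized logistic regression shares the weight between the copies, so neither coefficient alone reflects the feature's full effect.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
# noisy binary target loosely driven by x
y = (x[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

solo = LogisticRegression().fit(x, y)                 # one copy of the feature
duo = LogisticRegression().fit(np.hstack([x, x]), y)  # two identical copies

# the two duplicated coefficients match each other, and each is
# smaller than the single-feature coefficient
print(solo.coef_, duo.coef_)
```

With real correlated (rather than duplicated) features the split is uneven rather than exactly equal, which is the interpretability problem Varada flagged.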

rachelywong commented 3 years ago

As we discussed in lab:

Scripts and other docs:

Analysis plan:

  1. Split the data into training and testing sets @rachelywong
  2. Label our features (categorical, numerical, binary) @rachelywong
  3. Create transformers for our features @rachelywong
  4. Create a models dictionary to test our candidate models (baseline DummyClassifier, RBF SVM, and logistic regression) - 571 LAB 4 3.2 @rachelywong
  5. Carry forward the best-scoring model, chosen by f1 score or mean CV score (TBD) @rachelywong
  6. Hyperparameter optimization via randomized search on the best model - 571 LAB 4 3.3 @sukh2929
  7. Hyperparameter optimization results: confusion matrix, precision-recall curve, AUC? 573 LAB 1 2.7 @sukh2929
  8. Apply the best model with its tuned hyperparameters to the test set @sukh2929
  9. Use the coefficients to get the top indicators of readmission 571 LAB 4 4.1 @wiwang
  10. extra --> find the test examples predicted most strongly as readmission vs. not 571 LAB 4 5.2 @wiwang
  11. store_results function --> write documentation and function tests @rachelywong (maybe make this its own script in the future)
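Steps 6-9 could look roughly like this; the synthetic data stands in for our real train/test split, and the `C` search space is just a placeholder until we pick the real distributions:

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score

# synthetic stand-in for our preprocessed data
X, y = make_classification(n_samples=300, n_features=5, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

# 6. randomized search over hyperparameters, scored on f1
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": loguniform(1e-3, 1e3)},  # placeholder search space
    n_iter=10,
    scoring="f1",
    random_state=123,
)
search.fit(X_train, y_train)

# 7./8. evaluate the tuned model on the test set
y_pred = search.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("test f1:", f1_score(y_test, y_pred))

# 9. rank coefficient magnitudes to find the strongest indicators
best = search.best_estimator_
top = np.argsort(-np.abs(best.coef_[0]))
print("features, strongest first:", top)
```

The precision-recall curve and AUC from step 7 would come from `sklearn.metrics` the same way, using the tuned model's scores instead of hard predictions.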

Submission: @wiwang

rachelywong commented 3 years ago

@wiwang Please close this issue once we have all confirmed via Slack that we are good to go! Then create version 0.1.0 and submit both links to Canvas (repo link and version link). Thank you!