Anson-Chan-HK / DSCI-100-2023S1-4-PROJECT

DSCI 100 group 4 project (Summer 2023 semester 1)
MIT License

Group Project #2

Open Anson-Chan-HK opened 1 year ago

Anson-Chan-HK commented 1 year ago

Hi all,

Rather than starting over, I think we could duplicate the file from our proposal, rename the copy 'actual_project', and edit that. I've also listed below some of the things we should add to our project. No one has to do anything until after our midterms; I'm just writing this down so I don't forget what we need to do:

1) Add a visualisation that includes the standardized variables of the training set
2) Modify our workflow, since it is not working currently
3) Use arrange and slice to choose the new optimal K value
4) Graph accuracy vs K and the confusion matrix
5) Create a brand new model specification with the new K value
6) Perform classification on the full training set by creating a new workflow
7) Predict on the test set (use the predict and bind_cols functions)
8) Graph our results and the confusion matrix
9) Check the accuracy of our results
10) Conclude what the model tells us (e.g., correlation, which variables predict better) and what it implies
11) Explain what we could have done better
12) Add written descriptions of what we are doing
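To make the confusion matrix and accuracy checks in the list above concrete, here is a minimal sketch in plain Python. Our actual project is in R, so this is only an illustration of what those steps compute, not the code we would run. The function names and the labels are hypothetical examples, not our data.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of observations whose predicted label matches the actual one."""
    matrix = confusion_matrix(actual, predicted)
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / len(actual)


# Hypothetical test-set labels, made up for illustration only.
actual = ["yes", "yes", "no", "no", "no", "yes"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]

matrix = confusion_matrix(actual, predicted)
test_accuracy = accuracy(actual, predicted)

for (a, p), n in sorted(matrix.items()):
    print(f"actual={a:<3} predicted={p:<3} count={n}")
print(f"accuracy = {test_accuracy:.3f}")
```

The diagonal entries (actual equals predicted) are the correct predictions, so the accuracy is just their share of all observations.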

*Remark: We should also review our methods section on how to choose K. Besides picking the highest estimated accuracy, we should look at other conditions, such as whether the estimated accuracy drops dramatically for nearby values of K, so that our choice stays consistent.
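As a rough sketch of the idea in this remark, the plain-Python snippet below ranks K values by estimated accuracy (the same idea as arrange and slice) and skips any K whose neighbouring values are much worse. It is not our R code. The function name choose_k, the max_drop threshold of 0.05, and the accuracy numbers are all assumptions made up for illustration.

```python
def choose_k(accuracy_by_k, max_drop=0.05):
    """Pick the highest-accuracy K whose neighbouring K values do not drop sharply.

    accuracy_by_k maps each candidate K to its estimated accuracy.
    max_drop is a hypothetical tolerance, not a value from the course.
    """
    if not accuracy_by_k:
        raise ValueError("need at least one candidate K")

    ks = sorted(accuracy_by_k)
    # Like arrange(desc(accuracy)): best accuracy first, ties go to the smaller K.
    ranked = sorted(ks, key=lambda k: (-accuracy_by_k[k], k))

    for k in ranked:
        i = ks.index(k)
        neighbours = [ks[j] for j in (i - 1, i + 1) if 0 <= j < len(ks)]
        # Accept K only if no adjacent K is worse by more than max_drop.
        if all(accuracy_by_k[k] - accuracy_by_k[n] <= max_drop for n in neighbours):
            return k

    # No stable K found: fall back to the top-ranked one, like slice(1).
    return ranked[0]


# Hypothetical cross-validation results, made up for illustration only.
results = {1: 0.70, 3: 0.78, 5: 0.90, 7: 0.84, 9: 0.86, 11: 0.85, 13: 0.84}
best_k = choose_k(results)
print(best_k)
```

With these made-up numbers, K = 5 has the highest accuracy but sits on a sharp peak, so the sketch prefers K = 9, where the nearby values perform similarly.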

Anson-Chan-HK commented 1 year ago

Also, I'll ask the TAs whether a balancing step is needed in the data preprocessing, and we should also consult the TAs in their office hours soon.