UBC-MDS / DSCI_522_Group19_Wine_Quality_Score_Predictor

Wine Quality Score Predictor is our data analysis project for the 2021-22 UBC MDS DSCI 522 course

milestone2 to-do list - model fitting #19

Closed Kingslin0810 closed 2 years ago

Kingslin0810 commented 2 years ago

model fitting (Zack/Pavel)

zackt113 commented 2 years ago

Hi, I just saw Florencia's post about model fitting. She said using one model is fine. Should we stick with one model for this assignment, since the difficulty of our project isn't going to affect our grades anyway? What do you guys think?

Florencia's note: For this project, one kind of model is fine (you should tune that one, though, if it has a hyperparameter!). In Capstone, we will recommend you try a few: first focusing on the simplest, and then trying more complex ones to see if you get improvements on your cross-validated metric. Remember not to use the test set, though, until you have chosen one final model.
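
A minimal sketch of the workflow in Florencia's note, assuming scikit-learn. The synthetic `make_regression` data, the `Ridge` model, and the `alpha` grid are illustrative assumptions, not the project's actual data or choices:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the wine features and quality scores (assumption).
X, y = make_regression(n_samples=300, n_features=11, noise=10.0, random_state=522)

# Split off the test set first and leave it untouched during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=522
)

# Tune the one hyperparameter with cross-validation on the training split only.
alpha_grid = [0.01, 0.1, 1.0, 10.0, 100.0]
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": alpha_grid},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

best_alpha = search.best_params_["alpha"]
cv_neg_mse = search.best_score_  # mean cross-validated score of the best alpha

# Score on the test set once, after the final model has been chosen.
test_r2 = search.best_estimator_.score(X_test, y_test)
print(f"best alpha={best_alpha}, CV neg-MSE={cv_neg_mse:.2f}, test R^2={test_r2:.3f}")
```

The point of the split-then-search order is that the test score stays an honest estimate, because no tuning decision ever saw those rows.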

zackt113 commented 2 years ago

I think SVC should be the best one, right? Or maybe KNN?

zackt113 commented 2 years ago

Just to follow up: in the lab we decided to treat our problem as regression instead of classification. If we used both (classification and regression), it would be hard to compare which model is better because the scoring metrics differ. So I tried four models (Ridge, OneVsRestClassifier(LogisticRegression), SVC(kernel="linear"), and Random Forest). So far, Random Forest seems to be the best one. I will run hyperparameter optimization tomorrow and push my code once I finalize everything.
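
A rough sketch of comparing several models under one shared metric, assuming scikit-learn and synthetic data in place of the wine dataset. `SVR` stands in here for the `SVC` mentioned above, because a regression metric needs a regressor; that swap and the RMSE metric are assumptions of this sketch, not what was actually run:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate
from sklearn.svm import SVR

# Synthetic stand-in for the wine features and quality scores (assumption).
X, y = make_regression(n_samples=200, n_features=11, noise=10.0, random_state=522)

# Candidate regressors, all scored with the same metric so they are comparable.
models = {
    "Ridge": Ridge(),
    "SVR (linear)": SVR(kernel="linear"),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=522),
}

results = {}
for name, model in models.items():
    cv = cross_validate(
        model,
        X,
        y,
        cv=5,
        scoring="neg_root_mean_squared_error",
        return_train_score=True,
    )
    # scikit-learn negates error metrics, so flip the sign back to plain RMSE.
    results[name] = {
        "train_rmse": -cv["train_score"].mean(),
        "valid_rmse": -cv["test_score"].mean(),
    }

# Lowest cross-validated RMSE wins.
best_model = min(results, key=lambda name: results[name]["valid_rmse"])
for name, scores in results.items():
    print(name, scores)
print("best by validation RMSE:", best_model)
```

Which model comes out best on this synthetic data says nothing about the wine data; the sketch only shows the comparison mechanics.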

Kingslin0810 commented 2 years ago

I believe that if we treat it as a regression problem and predict the exact quality score, our model might not score very high in the end, unless we change our predictive question: instead of predicting the exact quality score as we do now, we would predict whether a wine is excellent (7-10), good (4-6), or bad (0-3).
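
For illustration, the alternative three-class target could be derived from the score like this. The function name `quality_label` is hypothetical; the bins are the ones listed above:

```python
def quality_label(score):
    """Map a 0-10 quality score to the classes excellent / good / bad."""
    if not 0 <= score <= 10:
        raise ValueError(f"quality score out of range: {score}")
    if score >= 7:
        return "excellent"  # 7-10
    if score >= 4:
        return "good"  # 4-6
    return "bad"  # 0-3


# Made-up example scores, not taken from the wine data.
scores = [3, 5, 6, 7, 9]
labels = [quality_label(s) for s in scores]
print(labels)
```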

If we do treat it as a regression problem, I think Random Forest regression will have a relatively better score, though overfitting could be an issue. We could then note that a next step for the project is to refine the model to reduce overfitting and get a better test score.
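
One way to check for overfitting is to compare the training score with the cross-validation score. This is a sketch assuming scikit-learn and synthetic data; the `max_depth` values are arbitrary examples, not tuned settings:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the wine features and quality scores (assumption).
X, y = make_regression(n_samples=200, n_features=11, noise=20.0, random_state=522)


def train_valid_r2(max_depth):
    """Return mean train and validation R^2 for a forest with the given depth."""
    model = RandomForestRegressor(
        n_estimators=50, max_depth=max_depth, random_state=522
    )
    cv = cross_validate(model, X, y, cv=5, return_train_score=True)
    return cv["train_score"].mean(), cv["test_score"].mean()


# None means fully grown trees; 3 is an arbitrary shallow setting.
gaps = {}
for depth in (None, 3):
    train_r2, valid_r2 = train_valid_r2(depth)
    gaps[depth] = train_r2 - valid_r2
    print(f"max_depth={depth}: train={train_r2:.3f}, valid={valid_r2:.3f}")
```

A large train-minus-validation gap suggests overfitting. A smaller gap at a restricted depth would suggest that limiting tree depth helps, although the validation score itself still has to stay acceptable.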

I like the idea of trying multiple models and then picking the one with the best cross-validation score (e.g. Random Forest) for hyperparameter tuning.

Kingslin0810 commented 2 years ago

I wonder if feature selection would improve our Random Forest score, given that ours is a regression problem (the target can be any score between 0 and 10). https://towardsdatascience.com/explaining-feature-importance-by-example-of-a-random-forest-d9166011959e
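
A small sketch of ranking features by Random Forest importance, assuming scikit-learn. The data is synthetic and the feature names are hypothetical placeholders, not the wine columns:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data in which only 3 of the 11 features carry signal (assumption).
X, y = make_regression(
    n_samples=300, n_features=11, n_informative=3, noise=5.0, random_state=522
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

forest = RandomForestRegressor(n_estimators=100, random_state=522).fit(X, y)

# Impurity-based importances, sorted from most to least important.
order = np.argsort(forest.feature_importances_)[::-1]
ranked = [(feature_names[i], float(forest.feature_importances_[i])) for i in order]
top_features = [name for name, _ in ranked[:3]]

for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Whether dropping the low-ranked features actually improves the score would still need to be checked with cross-validation, and impurity-based importances can be misleading when features are correlated.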

zackt113 commented 2 years ago

I don't know if feature selection is going to help. I tuned the hyperparameters a bit, but the overfitting is still quite obvious. I will go through today's lecture, then try dropping some features and run it again tonight.

zackt113 commented 2 years ago

Hello, I pushed my work for model fitting. Please have a look and we can discuss further tomorrow!

zackt113 commented 2 years ago

Closing the issue.