Group project completed as part of the UBC Master of Data Science program. The project involved creating and analysing a machine learning model that predicts the quality rating a wine will receive from a critic, based on a variety of physicochemical factors.
Hello there, great job on your project! Here is my feedback:
Documentation: The project is very well documented. I can easily find every file in its corresponding folder.
Code: The code in every script is intuitive and well documented with comments. However, there are some comments in eda_figures.py for generating the "second figure". You can delete those if they are not useful.
Analysis and reasoning:
According to what we learned in DSCI 571 and 573, we should split the original dataset into training and test data before conducting EDA, and do EDA only on the training set, to avoid breaking the golden rule (the test data must not influence any part of model development).
I think it would be nice to explain why you chose to tune those specific hyperparameters in your hyperparameter tuning part. What were your thoughts before the randomized search, and do the tuning results look appropriate rather than overfit or underfit? Adding a bit more explanation would be a good improvement.
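To make the two points above concrete, here is a minimal sketch of what I have in mind. It is not your pipeline: the data is a synthetic stand-in generated with make_regression, the column names and the search ranges for n_estimators and max_depth are my own assumptions, and the search is kept tiny so it runs quickly. The idea is to split first, do EDA on the training portion only, and then report train and cross-validation scores side by side so readers can judge over- or underfitting.

```python
import pandas as pd
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the wine data (assumption: the real project
# loads its own file and uses the real physicochemical columns).
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
wine = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])
wine["quality"] = y

# 1. Split first, so the test set is never seen during EDA or tuning.
train_df, test_df = train_test_split(wine, test_size=0.2, random_state=123)

# 2. EDA on the training portion only, e.g. correlation with the target.
train_corr = train_df.corr()["quality"].drop("quality")
print(train_corr)

# 3. Randomized search on the training data, keeping the train scores.
X_train = train_df.drop(columns="quality")
y_train = train_df["quality"]
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=123),
    param_distributions={
        "n_estimators": randint(20, 100),  # hypothetical range
        "max_depth": randint(2, 12),       # hypothetical range
    },
    n_iter=5,
    cv=3,
    return_train_score=True,
    random_state=123,
)
search.fit(X_train, y_train)

# 4. A large train/validation gap suggests overfitting; low scores on
#    both suggest underfitting.
results = pd.DataFrame(search.cv_results_)[
    ["param_n_estimators", "param_max_depth", "mean_train_score", "mean_test_score"]
]
results["gap"] = results["mean_train_score"] - results["mean_test_score"]
print(results.sort_values("mean_test_score", ascending=False))
```

A small table like `results` in the report, with one or two sentences on why those hyperparameters and ranges were chosen, would answer most of my questions. Note that "mean_test_score" here is the cross-validation score; test_df stays untouched until the final evaluation.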
Communication: for your final report:
The first sentence of the "Analysis" section is "A classification model was built with python scripts using the sk-learn RandomForestRegressor algorithm... " In my opinion, it should be "A regression model" rather than a classification model.
In my view, the first two figures are a bit small, given that they carry useful information in their axes and in the text layer of the correlation plot.
The third figure can be improved by increasing the axis title size or decreasing the axis label size.
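As a rough illustration of the figure suggestions above, here is a small sketch. I am assuming matplotlib, which may not be the plotting library your scripts use, and the axis names and data points are placeholders rather than your actual figure. It shows a larger figure size, larger axis titles, and smaller tick labels.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# A larger canvas addresses the "figure is a bit small" point.
fig, ax = plt.subplots(figsize=(8, 6))

# Placeholder points only; not results from the project.
ax.scatter([3, 4, 5, 6, 7], [3.4, 4.2, 5.1, 5.8, 6.5])

# Larger axis titles (hypothetical names) ...
ax.set_xlabel("True quality", fontsize=16)
ax.set_ylabel("Predicted quality", fontsize=16)

# ... and smaller tick labels, so the titles stand out.
ax.tick_params(axis="both", labelsize=10)
```

Calling fig.savefig with a higher dpi would also keep the text readable when the image is embedded in the report.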
Suggestions: all my suggestions are included in the feedback points above. Since you mentioned adding feature selection in the future to improve the model, one friendly piece of advice is to explain more of your reasoning during that process, since feature selection is a tricky part and not every reader will agree with your choices.
Overall, excellent project! I've learned a lot from it. Thank you for your work and good luck with your future projects!