UBC-MDS / DSCI_522_group09_Wine_Quality_Predictor

A machine learning pipeline for classification and prediction of wine quality based on relevant features
https://ubc-mds.github.io/DSCI_522_group09_Wine_Quality_Predictor/index.html
MIT License
4 stars 6 forks source link

Milestone 2 Review #148

Closed mohamad-amin closed 2 years ago

mohamad-amin commented 2 years ago

Good job! I have a few concerns:

You're using try, except everywhere instead of checking if a file exists when outputting (and maybe reading) the files. You could instead check if the file exists and create it if it doesn't. Using try except in production except for when dealing with 3rd party stuff is not a good practice.

Also, you could use some for loops for your eda.py.

In your preprocessing, are you sure that no values need normalization, outlier removal or similar tricks? I think you are doing some preprocessing in the machine learning file, which is not a good practice.

For your report, I'm getting a 404 error. Can you please fix it and let me know to fix the grade accordingly? Moreover, for the different parts of reports, I'm seeing an unrendered "```{figure} ../../results/figure_1_class_imbalance.png". That could be fixed too.

Thanks.

gfairbro commented 2 years ago

Hi @mohamad-amin , here is a link to that milestone's release. There is no way to edit the readme of the release that i can see. It appears that once the release was created, it broke the link (due to URL issues perhaps?). I am not sure what else we can do to remedy that: https://htmlpreview.github.io/?https://raw.githubusercontent.com/UBC-MDS/DSCI_522_group09_Wine_Quality_Predictor/0.1.0/reports/wine_quality_predictor_report/_build/html/report_summary.html

We will add comments about your other suggestions soon.

GloriaWYY commented 2 years ago

Thank you for your suggestions. We decided to use if statement to check if a file exists before exporting it, and we adopts for loops to make our codes DRY. Changes can be seen in 2dee29a5c4941d463f0664748b65ac28a579ea51

Luming-ubc commented 2 years ago

Hi @mohamad-amin, thanks for the feedback. Regarding data preprocessing, we agree that it would be better if we could separate those preprocessing steps (standard scaler, OHE) from machine learning script. However, in our case, the dataset was clean, we do not have missing values or obvious outliers. The transformations were relatively simple. Due to the simplicity, we incorporated the standard scaler (normalization) into our preprocessing steps in the machine learning script. Also, our preprocessing script generates outputs for both EDA and ML script. We do not want to apply normalization before EDA steps. So, we decided the preprocessing script is merely for separating to feed both EDA script and machine learning script.

Luming-ubc commented 2 years ago

Regarding the 404 error, at the time of our Milestone2 release, the link was valid. The link was a relative path to the html file in our repo. However, we decided to remove it and changed it to the "githubpages" link later. I guess right now we could not make changes on the release. But here is the latest link for our final report. We fixed the rendering issue in the final report. The link is also valid in the current Readme.md file. Sorry for the confusion, hope it helps for your grading.

paradise1260 commented 2 years ago

Hi @mohamad-amin, thank you for your feedback. Regarding the preprocessing, in addition to what Luming said, I should say that we wanted to carry out cross validation in ML, and without pipelines we would violate the golden rule. Therefore, we had to have the pipelines of preprocessing and models in the same script, and we put all of them in ML script. I should mention that we confirmed it with one of the TAs before finalizing the ML script. Please let us know if you have any other concerns. Thanks,

gfairbro commented 2 years ago

Thanks for the feedback @mohamad-amin, we have addressed them where applicable in the repo.