Group project completed as part of the UBC Master of Data Science program. The project involved creating and analyzing a machine learning model that predicts the quality rating a wine will receive from a critic, based on a variety of physicochemical factors.
This is a really great idea for a project, and the implementation is excellent. I made some notes while reviewing your project, and I hope they are helpful.
Good documentation in general. I noticed this comment in eda_figures.py: # Save figures in html (png was giving me errors for saving?) However, the script seems to save the figures as png, so maybe the comment needs to be updated?
I tried running the Makefile and got this error:

python src/download_data.py data raw_data.csv
python src/preprocess.py data raw_data.csv
Traceback (most recent call last):
  File "src/preprocess.py", line 88, in <module>
    train_df.to_feather(
  File "/opt/miniconda3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 214, in wrapper
    return func(*args, **kwargs)
TypeError: to_feather() got an unexpected keyword argument 'compression'
make: *** [data/test_df.feather] Error 1
Also, maybe you should consider adding a YAML file that specifies your environment, to make it easier to install the right dependencies.
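My guess is that the error comes from my pandas version being older than yours: as far as I know, to_feather() only started forwarding extra keyword arguments such as compression in pandas 1.1.0, so pinning the version in an environment file would probably be the cleanest fix. If you also want the script to survive older installs, here is a minimal sketch of a fallback. The helper name to_feather_compat is hypothetical and not something from your repo, and it assumes the only reason for the TypeError is an unsupported keyword argument.

```python
def to_feather_compat(df, path, **kwargs):
    """Write df to a Feather file, retrying without extra keyword
    arguments if the installed pandas rejects them.

    Returns True if the keyword arguments were applied, False if the
    fallback (plain to_feather(path)) was used instead.
    """
    try:
        # Newer pandas forwards kwargs such as `compression` to pyarrow.
        df.to_feather(path, **kwargs)
        return True
    except TypeError:
        if not kwargs:
            # Nothing to strip, so the error has some other cause.
            raise
        # Assumption: the TypeError was caused by an unsupported keyword.
        df.to_feather(path)
        return False
```

In preprocess.py this would replace the direct call, for example to_feather_compat(train_df, path, compression=...), keeping whatever path and compression value you already pass. The fallback writes an uncompressed file, so the output size can differ between machines.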
The EDA notebook looks good and is easy to follow. I would recommend generating an HTML file from the notebook that excludes the code chunks. You can automate this with the following command from the terminal:
jupyter nbconvert eda/wine_quality_eda.ipynb --no-input --to html --TemplateExporter.exclude_input=True --no-prompt
Also, I noticed that you perform some of the EDA on the whole data set, not just the training portion, which could be a concern for the golden rule (the test data should not influence any decisions made before final evaluation). You also split your data in the EDA and then again in the preprocess.py script, so the two could end up with different splits. It might be a good idea to split just once, save the result, and read it in wherever else you need it.
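To illustrate the split-once idea, here is a rough sketch using only the standard library. The function name, file paths, test fraction, and seed are all placeholders I made up, and in your project you would more likely use your existing splitting code and Feather files. The point is only that the split is created the first time and reused afterwards.

```python
import csv
import random
from pathlib import Path


def load_or_create_split(raw_csv, train_csv, test_csv,
                         test_fraction=0.2, seed=123):
    """Split raw_csv into train/test files once; later calls reuse them."""
    train_path, test_path = Path(train_csv), Path(test_csv)

    # Only split if the saved files are missing.
    if not (train_path.exists() and test_path.exists()):
        with open(raw_csv, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            rows = list(reader)

        # A seeded shuffle keeps the split reproducible.
        random.Random(seed).shuffle(rows)
        n_test = int(len(rows) * test_fraction)

        for path, part in ((test_path, rows[:n_test]),
                           (train_path, rows[n_test:])):
            with open(path, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(header)
                writer.writerows(part)

    def read(path):
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    return read(train_path), read(test_path)
```

Both the EDA notebook and preprocess.py could then call the same function (or simply read the saved training file), so the EDA only ever sees the training rows and both places agree on what the test set is.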
Best of luck on your project and the rest of the block!