Hello Pan, Chun, and Sakshi!
Here is some peer feedback on 3f0c4a9.
On the whole, the project looks cohesive and professional. Nicely done! I did not run into any technical issues with your project using a minimal environment with the listed dependencies.
Here are some thoughts:
1. Reproducibility
Your `Make` code runs on my computer from start to finish (hooray!), but your results are not 100% reproducible because you do not use `random_state = 123` or `np.random.seed(123)`. I noticed this impacts the accuracy scores for your `RandomForest()` and `DummyClassifier()`. I would parametrize the docopt script to take a default seed that is then set before the cross-validation begins. I would also suggest removing the score time and fit time from your report, because I don't think they are that relevant, and they will always differ from one run to another (and therefore will register as a git diff). If you do keep them in your report, I would suggest putting the fit/score times in a separate table, so it is easier to compare (with git) whether you get the same CV scores from one call of `make all` to the next.
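For concreteness, here is a minimal sketch of what I have in mind (the script name, option names, and paths are hypothetical, not your actual code):

```python
"""Fit and cross-validate the models.

Usage:
  fit_models.py --train=<train_path> --out_dir=<out_dir> [--seed=<seed>]

Options:
  --seed=<seed>  Random seed for reproducibility [default: 123]
"""
import numpy as np
from docopt import docopt
from sklearn.ensemble import RandomForestClassifier

opt = docopt(__doc__)
seed = int(opt["--seed"])

np.random.seed(seed)  # covers any NumPy-level randomness (e.g. shuffling)
model = RandomForestClassifier(random_state=seed)  # seeds the estimator itself
```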
2. DRY out your Makefile
Instead of writing out almost the same code and dependencies for each report target (md/HTML), you could merge them into one rule and use the argument `output_format = 'all'` in `render()`. The two rules could be collapsed down to something like the sketch below.
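A rough sketch (with hypothetical file paths; adapt them to your actual report and dependencies):

```make
# Rendering with output_format = 'all' produces both the .md and the .html,
# so one rule (keyed on the .md) can replace the two near-identical ones.
doc/report.md : doc/report.Rmd results/model_comparison.csv
	Rscript -e "rmarkdown::render('doc/report.Rmd', output_format = 'all')"
```

This would require you to also add `github_document` to your report YAML, e.g.:

```yaml
output:
  github_document: default
  html_document: default
```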
3. Overfitting or not overfitting?
In your report you say "From our cross validation results, we can see that we are overfitting with Random_Forest". But your test-set score is in line with your CV score, so I would not call this overfitting. Of course, the near-perfect training score shows the model is memorizing the data, but that is to be expected, since no one would use training performance as an estimate of generalization error for a complex model anyway.
This is also mentioned in your discussion section.
4. Minor issues and suggestions
I would rename `test_score` to `cross_validation_score` in your results table, because readers who are not familiar with sklearn's naming may not realize that this is not your test-set accuracy.
You report a best accuracy of 0.85. It would be helpful to explicitly mention the no-information rate (from `DummyClassifier`) when you say that 0.85 is a good score, e.g. "Our best model achieved an accuracy of 0.85, which is a noticeable improvement over the no-information rate of 0.65."
I love your EDA plot with small multiples of densities! It would help me even more if the density for each of the three `quality_level` classes were normalized, so the less frequent classes are easier to compare (the green densities are very small!).
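If you are using Altair for that plot, one way to get a per-class (normalized) density is `transform_density` with a `groupby`; a rough sketch, with made-up column names and file path:

```python
# A sketch only -- assumes Altair and a long-format dataframe with
# (hypothetical) columns "alcohol" and "quality_level".
import altair as alt
import pandas as pd

wine_df = pd.read_csv("data/processed/wine_train.csv")  # hypothetical path

density_chart = (
    alt.Chart(wine_df)
    .transform_density(
        "alcohol",
        groupby=["quality_level"],   # estimate one density per class...
        as_=["alcohol", "density"],  # ...so each class integrates to 1
    )
    .mark_area(opacity=0.5)
    .encode(x="alcohol:Q", y="density:Q", color="quality_level:N")
)
```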
I suggest making the file name `Bestmodel.csv` lowercase.
I would lint your Python code with flake8 and/or auto-format it with black.
What about sorting your results table from best to worst performance? And maybe plotting the results?
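For example, something like this sketch (the file path and column names are hypothetical):

```python
# Sort the collated CV results and show them as a bar chart.
import altair as alt
import pandas as pd

results = pd.read_csv("results/model_comparison.csv")  # hypothetical path
results = results.sort_values("cross_validation_score", ascending=False)

score_plot = (
    alt.Chart(results)
    .mark_bar()
    .encode(
        x="cross_validation_score:Q",
        y=alt.Y("model:N", sort="-x"),  # best model at the top
    )
)
```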
You've binned your wine quality into a three-class ordinal response. What informed the choice of binning (why three levels and not two? why these particular cutoffs?)? Maybe there are other metrics or loss functions you could explore: for example, misclassifying a "Bad" wine as "Excellent" is worse than misclassifying it as "Good". This is more a point for contemplation than implementation, since it may not be an easy issue to address and you are already far along in your analysis.
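If you ever do want to experiment with that, here is a rough sketch of one way to express such an asymmetric cost in sklearn, assuming a numeric encoding of the quality levels (0 = Bad, 1 = Good, 2 = Excellent is hypothetical):

```python
# Purely for contemplation -- an ordinal-aware metric via make_scorer,
# where the penalty grows with the distance between true and predicted class.
import numpy as np
from sklearn.metrics import make_scorer

def ordinal_error(y_true, y_pred):
    # Misclassifying Bad as Excellent (distance 2) costs twice as much as
    # misclassifying Bad as Good (distance 1).
    return np.abs(np.asarray(y_true) - np.asarray(y_pred)).mean()

ordinal_scorer = make_scorer(ordinal_error, greater_is_better=False)
# could then be passed as scoring=ordinal_scorer to cross_validate()
```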
Great work! Keep it up!