Hello Team 12:
My name is Nicholas Wu and I have been assigned to peer review your group's project. I think your analysis is well performed and your final report is well written, especially given the limited time provided. There is some room for improvement that I think could help your audience fully grasp your results. Please see below for some suggestions:
Documentation and coding: Both are well documented and easy to read. One suggestion for release 0.1.0: consider putting your EDA in a separate folder so it is easier for others to locate.
Analysis and reasoning: Your analysis is well done. I can see that your EDA identified two important features (credit limit and education level), so it would have been good to carry that finding through to the modelling stage. For example, did the logistic regression return a larger coefficient for either credit limit or education level? Even when a model does not perform well, we can still learn something from it. Your data appears to be from 2005, so the model may not transfer well to current data, which makes what it says about the features more valuable than its raw predictive performance. For that reason, I think adding a coefficient table to your results would help you build a stronger conclusion. The coefficient table could also be a useful reference when thinking about other features that might improve the accuracy of your model.
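To illustrate the kind of coefficient table I have in mind, here is a minimal sketch, assuming scikit-learn and pandas. The data is synthetic and the column names (`credit_limit`, `education_level`, `age`) are placeholders, not your actual variables; the features are standardized so that coefficient magnitudes are roughly comparable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; column names are hypothetical, not the project's.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "credit_limit": rng.normal(size=n),
    "education_level": rng.normal(size=n),
    "age": rng.normal(size=n),
})
# Made-up relationship for illustration: only credit_limit drives the target.
logit = -2.0 * X["credit_limit"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize first so coefficient sizes can be compared across features.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# One row per feature, sorted by absolute coefficient size.
coef_table = (
    pd.DataFrame({"feature": X.columns, "coefficient": model[-1].coef_[0]})
    .sort_values("coefficient", key=abs, ascending=False)
    .reset_index(drop=True)
)
print(coef_table)
```

With your fitted pipeline in place of `model`, a table like this could go straight into the results section.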
I also noticed that your random forest classifier is clearly overfitting, so it might be better to leave it out of hyperparameter tuning. Would you agree?
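One way to put a number on the overfitting before deciding is to compare the F1 score on the training split with the score on a held-out split. This is only a sketch on synthetic data from `make_classification`, assuming scikit-learn; it is not your dataset or your pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, noisy data (20% of labels flipped) as a stand-in for the real set.
X, y = make_classification(
    n_samples=1000, n_features=20, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default forest: fully grown trees tend to memorize the training split.
forest = RandomForestClassifier(random_state=0)
forest.fit(X_train, y_train)

train_f1 = f1_score(y_train, forest.predict(X_train))
test_f1 = f1_score(y_test, forest.predict(X_test))

# A large gap between the two scores is the usual sign of overfitting.
print(f"train F1 = {train_f1:.2f}, test F1 = {test_f1:.2f}")
```

Reporting both scores side by side would make the overfitting visible to readers of the report.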
Communication: There is some room for improvement in the grammar and spelling of your README and final report. In addition, as the TA has mentioned in other issues, some of your plots are missing titles, for example the correlation heat map between education level and default payments. I would suggest pasting your report into Word to catch any grammar mistakes.
P.S. "Table 2. F1 score with optimized hyperpamaters for each model" appears twice in your final report.
Nicholas Wu, your classmate