Hello, great effort on the project and nice dataset selection ;)
Please see my feedback below after reviewing your project:
Documentation: Functions and code are well-documented!
Code: Readable and reproducible. I would recommend adding unit tests, where appropriate, for some of the functions under `/src`.
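As a rough idea of what I mean, here is a minimal pytest-style sketch. `clean_pay_column` is a hypothetical stand-in for one of your `/src` functions (I assumed it replaces the `-2` code with `0`, mirroring the EDA step), so the names and behaviour will need adapting to your actual code:

```python
def clean_pay_column(values, replace_with=0):
    """Hypothetical stand-in for a /src helper: replace the -2 code in a
    pay_x column with `replace_with`, leaving every other value as-is."""
    return [replace_with if v == -2 else v for v in values]


# pytest collects plain functions whose names start with "test_".
def test_replaces_minus_two():
    assert clean_pay_column([-2, -1, 0, 2]) == [0, -1, 0, 2]


def test_leaves_other_values_untouched():
    assert clean_pay_column([1, 3, -1]) == [1, 3, -1]


def test_empty_input():
    assert clean_pay_column([]) == []
```

Small tests like these also make it cheap to change the `-2` handling later and confirm nothing else broke.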
Analysis and reasoning: Overall, clear analysis and flow of reasoning. I was researching this dataset since it also applies to 573 lab4, and I found that `-2` in the `pay_x` columns refers to what is known as revolving credit. I am not sure what that implies for the data, but it may be worth re-running the analysis without replacing `-2` with `0`, the replacement you mentioned in your EDA report.
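One way to try this, sketched under assumptions: keep the raw `pay_x` values and add a separate 0/1 indicator for the `-2` (revolving credit) code, so the model can treat it differently from `0`. The column names `pay_1`/`pay_2` and the helper `add_revolving_flags` are placeholders I made up, not your actual schema:

```python
def add_revolving_flags(rows, pay_columns):
    """Return copies of `rows` (dicts) where each pay column keeps its raw
    value and gains a `<col>_revolving` flag that is 1 when the value is -2."""
    flagged = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for col in pay_columns:
            new_row[f"{col}_revolving"] = int(row[col] == -2)
        flagged.append(new_row)
    return flagged


# Placeholder column names following the pay_x pattern.
rows = [{"pay_1": -2, "pay_2": 1}, {"pay_1": 0, "pay_2": -2}]
flagged = add_revolving_flags(rows, ["pay_1", "pay_2"])
```

If you are working in pandas, the same idea is one comparison per column, e.g. `(df[col] == -2).astype(int)`. Comparing model scores with and without the `-2` to `0` replacement would show whether the distinction matters.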
Communication: One small issue is that the EDA link in the README points to an individual team member's repo outside of your main repo. It may be worthwhile to include the EDA file in the main repo as well.
Suggestions:
As you already mentioned in the report, I think it is important to include a confusion matrix plot/table in the final report. For a prediction problem like this one (default vs. no default), false negatives are more costly than false positives, so while the F1 score is a better metric than accuracy, it is still important to see which category the incorrect classifications fall into (false negative vs. false positive). For your model, I think you would want to aim for higher recall rather than higher precision.
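To illustrate why the breakdown matters, here is a small stdlib sketch that counts the confusion matrix cells with "default" (`1`) as the positive class and derives precision, recall and F1. The labels are toy values I made up, not your results. In practice, scikit-learn's `confusion_matrix` gives the same counts:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` (default = 1) as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["TP" if t == positive else "FP"] += 1
        else:
            counts["FN" if t == positive else "TN"] += 1
    return counts


def precision_recall_f1(c):
    """Precision, recall and F1 from confusion counts (0.0 when undefined)."""
    precision = c["TP"] / (c["TP"] + c["FP"]) if c["TP"] + c["FP"] else 0.0
    recall = c["TP"] / (c["TP"] + c["FN"]) if c["TP"] + c["FN"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels: 3 actual defaults, of which the model catches only 1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)    # TP=1, FP=1, FN=2, TN=4
precision, recall, f1 = precision_recall_f1(counts)  # 0.5, 0.333..., 0.4
```

Here accuracy is 5/8 = 0.625 and F1 is 0.4, but only the confusion matrix (or recall of about 0.33) makes it obvious that two of the three defaulters were missed, which is exactly the costly error in this problem.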