Hi Group 12!

Great work on this project - I like the dataset you chose and your discussion of it. I'm impressed that you have preliminary modelling in addition to the final modelling. You're going to have a leg up for this week's 573 lab!

Just a few comments:

Coding:

- … `RandomForest` model into a Pandas DataFrame. Not sure if this was causing issues, as you end up pulling the `LogisticRegression` model as your best fit each time: https://github.com/UBC-MDS/DSCI522_group_12/blob/445e3e14f36875078ca4ef0426d3a19ebb40e8f5/src/fit_predict_default_model.py#L114
- … `ColumnTransformer` here to handle the PAY status columns as ordinal?
- … `clean_split_cred.R` file you replace values < -1 in the `PAY_` columns, treating them as erroneous. I found this discussion of what the values -2, -1 and 0 mean, and you may want to keep these as separate categories: https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset/discussion/34608

Formatting:

- From reading up on the dataset for this week's lab, I think the `PAY_` columns follow the same naming convention as the other columns, i.e. `PAY_0` = status of payment in September 2005 and `PAY_6` = status of payment in April 2005.
- In your EDA document, maybe use `knitr::kable` to summarize the tabular results instead of showing the raw output?
- I couldn't find a link from your README to your final report document.
- The link in your README to your EDA document points to HazelJJJJ's fork of the document. Not sure if this was intentional.
- I don't see the Education feature in Figure 2 of your final report?
- Your table captions are duplicated in your `doc/report.md` file.
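
The comment about pulling the `LogisticRegression` model as the best fit each time raises the question of how the winning model is picked. Below is a minimal sketch of one way to collect cross-validation results into a Pandas DataFrame and select the winner by name. It is not the code in `fit_predict_default_model.py`. The synthetic data from `make_classification`, the two model settings and `scoring="f1"` are all assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the training split (assumption: the real script uses the cleaned credit data).
X_train, y_train = make_classification(n_samples=300, n_features=10, random_state=522)

# Candidate models keyed by name so the winner is looked up by label, not by position.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=522),
}

rows = {}
for name, model in models.items():
    cv = cross_validate(model, X_train, y_train, cv=5, scoring="f1")
    # Average fit_time, score_time and test_score across the 5 folds for this model.
    rows[name] = pd.DataFrame(cv).mean()

# One row per model; columns are the averaged cross-validation metrics.
results = pd.DataFrame(rows).T

# Pick the model with the highest mean validation score and refit it on all training data.
best_name = results["test_score"].idxmax()
best_model = models[best_name].fit(X_train, y_train)
print(results[["test_score"]])
print("Best model:", best_name)
```

`idxmax()` on the score column returns the row label. The choice therefore follows the scores and cannot silently default to whichever model happens to sit first in the table.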
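
The `ColumnTransformer` comment can be illustrated with a sketch that gives the `PAY_` columns an explicit `OrdinalEncoder` step, next to scaling for numeric features and one-hot encoding for categorical ones. The column groups are illustrative, and so is the assumption that the status codes run from -2 to 9. Check both against your data before reusing this.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical column groups based on the credit card default dataset's column names.
PAY_COLS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
NUMERIC_COLS = ["LIMIT_BAL", "AGE"]
CATEGORICAL_COLS = ["SEX", "EDUCATION", "MARRIAGE"]

# Assumed ordered set of repayment status codes, lowest to highest.
PAY_LEVELS = list(range(-2, 10))

preprocessor = ColumnTransformer(
    transformers=[
        # Ordinal encoding keeps the status order instead of splitting each code into its own dummy column.
        ("pay", OrdinalEncoder(categories=[PAY_LEVELS] * len(PAY_COLS)), PAY_COLS),
        ("num", StandardScaler(), NUMERIC_COLS),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Tiny made-up example rows, just to show the output layout.
example = pd.DataFrame({
    "PAY_0": [-2, 0, 2], "PAY_2": [-1, 0, 1], "PAY_3": [0, 0, 0],
    "PAY_4": [-2, -1, 3], "PAY_5": [0, 1, 2], "PAY_6": [-1, -1, 0],
    "LIMIT_BAL": [20000, 120000, 90000], "AGE": [24, 26, 34],
    "SEX": [2, 2, 1], "EDUCATION": [2, 2, 1], "MARRIAGE": [1, 2, 2],
})
X = preprocessor.fit_transform(example)
print(X.shape)
```

The codes are already ordered integers, so this mainly makes the ordinal treatment explicit. Because `OrdinalEncoder` raises on unknown categories by default, it also rejects any code outside `PAY_LEVELS` at fit time. Whether -2 and -1 really sit "below" 0 is a modelling assumption worth checking against the Kaggle discussion.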
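
The `clean_split_cred.R` comment concerns an R script, but the idea can be sketched in pandas. Rather than replacing everything below -1, this version only flags values outside an assumed valid range of -2 to 9. It also tabulates how often each status appears, so -2, -1 and 0 stay as separate values. The function names and the valid range are hypothetical.

```python
import pandas as pd

PAY_COLS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
# Assumed valid status codes; -2, -1 and 0 are kept as distinct, meaningful values.
VALID_PAY = list(range(-2, 10))


def invalid_pay_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows with at least one PAY_ value outside VALID_PAY (df is not modified)."""
    outside = ~df[PAY_COLS].isin(VALID_PAY)
    return df[outside.any(axis=1)]


def pay_status_counts(df: pd.DataFrame) -> pd.Series:
    """Count how often each status code appears across all PAY_ columns."""
    return df[PAY_COLS].melt()["value"].value_counts().sort_index()


example = pd.DataFrame({col: [-2, -1, 0, 2] for col in PAY_COLS})
example.loc[3, "PAY_6"] = 11  # a made-up out-of-range value
print(invalid_pay_rows(example))
print(pay_status_counts(example))
```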
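
For the `PAY_` naming comment, a small, hypothetical renaming helper can spell out the month each status column refers to. It assumes the columns are named `PAY_0` and `PAY_2` through `PAY_6`, with the months in between following the September-to-April order described above. The new column names are made up.

```python
import pandas as pd

# Hypothetical new names: PAY_0 is September 2005 and PAY_6 is April 2005, one month apart in between.
PAY_MONTHS = {
    "PAY_0": "PAY_SEP_2005",
    "PAY_2": "PAY_AUG_2005",
    "PAY_3": "PAY_JUL_2005",
    "PAY_4": "PAY_JUN_2005",
    "PAY_5": "PAY_MAY_2005",
    "PAY_6": "PAY_APR_2005",
}


def rename_pay_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the PAY_ columns renamed to the month they describe."""
    return df.rename(columns=PAY_MONTHS)


example = pd.DataFrame({"LIMIT_BAL": [20000], "PAY_0": [2], "PAY_6": [-2]})
print(list(rename_pay_columns(example).columns))
```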

Nice work all around - let me know if you have any questions about anything I've mentioned.

Dustin