UBC-MDS / Credit-Card-Default-Prediction

Credit Card Default Prediction
https://github.com/UBC-MDS/Credit-Card-Default-Prediction/blob/main/reports/_build/pdf/book.pdf
MIT License

Milestone 1 feedback #36

Open andytai7 opened 2 years ago

andytai7 commented 2 years ago

3. Project proposal: reasoning

Comments: "### Given characteristics (gender, education, age, marriage) and payment history of a customer, are they likely to default on the credit card payment next month?" This should be labelled as the aim.

Why are you engineering features, searching for missing values, scaling some features, and encoding categorical variables into a usable form? What packages are you going to use? What types of plots, and why? Any hypotheses?

What about class balance? If the classes are imbalanced, you would have to undersample or oversample. Which one will you use?
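To make the question concrete, here is a minimal sketch of checking the class ratio and randomly oversampling the minority class. It uses only the Python standard library; the toy labels and the helper name `random_oversample` are made up for illustration and are not taken from the project.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are the same size.

    Hypothetical helper for illustration; apply it to the training split only.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows belonging to this class, used as the sampling pool.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (assumed): 20 defaulters (1) and 80 non-defaulters (0).
labels = [1] * 20 + [0] * 80
rows = [[i] for i in range(len(labels))]

print(Counter(labels))
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Undersampling would instead drop rows from the majority class. Either way, resampling should touch only the training data, so that duplicated rows never leak into the validation or test sets.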

What about missing data? How will you handle the missing data?
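One common answer is imputation. The sketch below fills missing entries in a numeric column with the median of the observed values. The column and the helper name `impute_median` are hypothetical, and whether the project's data contains missing values at all is not stated here.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical numeric column with two missing entries.
age = [25, None, 40, 31, None, 52]
imputed_age = impute_median(age)
print(imputed_age)
```

The fill value should be computed from the training data and then reused on validation and test data, so that no information flows backwards from held-out rows.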

Why only use linear regression? Have you thought of using wrapper algorithms (e.g., the Boruta algorithm) for feature selection?

Will you do cross-validation?
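With imbalanced classes, stratified folds are a sensible default because each fold keeps roughly the same class proportions as the full data. The following is a standard-library sketch of the idea, not the project's code; the function name `stratified_k_fold` and the toy labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose class proportions mirror the full data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


# Toy labels (assumed): 20 positives and 80 negatives.
labels = [1] * 20 + [0] * 80
for train_idx, val_idx in stratified_k_fold(labels, k=5):
    positives = sum(labels[i] for i in val_idx)
    print(len(train_idx), len(val_idx), positives)
```

In a scikit-learn workflow the same role is played by `StratifiedKFold`, typically passed as the `cv` argument of `cross_val_score`.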

What about metrics? They were not touched upon in the project proposal. One suggested metric for model performance is the Area Under the Curve (AUC), which summarizes the ROC curve and measures a classifier's ability to distinguish between classes. Also, look into SHAP (SHapley Additive exPlanations), which explains the direction of each variable's effect on the outcome variable.
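As a rough illustration of what AUC measures, it can be computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below implements that pairwise definition with the standard library; the function name `roc_auc` and the toy scores are assumptions, and in practice scikit-learn's `roc_auc_score` would normally be used instead.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and predicted scores (assumed).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 0.75
```

Because it depends only on how examples are ranked, AUC does not change with the classification threshold, which makes it a useful complement to threshold-dependent metrics such as recall.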

Spelling errors

5. Exploratory data analysis in a literate code document: VIZ

Comments: Why are you using certain plot techniques?

5. Exploratory data analysis in a literate code document: REASONING

Comments: "We notice a minor class imbalance, but we will continue experimenting with model training and evaluate if we value a certain metric (i.e. recall) more and observe whether our models are performing poorly on the minority class prediction. We will then consider employing techniques to deal with class imbalance." This needs to be addressed before going further! A 1:4 ratio is not minor.

The EDA is missing rationale and conclusions: explain why you are using each EDA technique and what you conclude from it.