OJO44 / 24bMachineLearning1


Week 3 : Credit Risk Modelling #3

Open Ds2023 opened 1 month ago

Ds2023 commented 1 month ago

Data Cleaning and Exploration:

- Dropping Rows (23, 24, 26, 27): It would be helpful to understand the rationale behind dropping these specific rows before Exploratory Data Analysis (EDA). Were there missing values, or did they contain outliers? Stating the reason in your report would improve transparency.
- Imputation Techniques: Consider conducting more extensive EDA to identify the most appropriate imputation strategy for each categorical feature. Imputing all missing values with "None" might not be optimal for every column. Explore alternatives like mode imputation or category encoding based on your findings.
- Inconsistent Imputation: There seems to be a discrepancy in how missing values are imputed. Consistently use either "None" or another chosen method across all categorical columns, and clean up your notebook to reflect the final approach.
- Correlation Matrix: While a correlation matrix before imputation can be informative, creating one after imputation as well would let you assess how the filled values affect relationships between features.
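The two imputation choices above can be sketched as follows. This is a minimal illustration, not the notebook's actual code; the column names (`employment_type`, `co_applicant`) are assumptions standing in for the real credit features.

```python
import pandas as pd

# Hypothetical columns for illustration; the real feature names
# come from the credit dataset.
df = pd.DataFrame({
    "employment_type": ["salaried", None, "self_employed", "salaried", None],
    "co_applicant":    [None, None, None, "spouse", None],
})

# Impute the mode where missingness looks like a data-entry gap...
df["employment_type"] = df["employment_type"].fillna(
    df["employment_type"].mode()[0]
)
# ...but keep "None" where missingness plausibly encodes absence
# (e.g. no co-applicant on the loan).
df["co_applicant"] = df["co_applicant"].fillna("None")
print(df)
```

The point is to decide per column, based on what the missingness means, rather than filling every categorical column the same way.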

Feature Engineering:

- Numerical Imputation: Consider analyzing the distribution of numerical features (e.g., with histograms) before imputing missing values. Mean imputation might not be suitable for skewed distributions; choose between mean and median based on the shape of each distribution, or use a K-Nearest Neighbors (KNN) imputer for more sophisticated handling.
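A sketch of that workflow, check the distribution first, then pick the fill, using synthetic stand-in data (a skewed `income` column is an assumption for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-ins: income is right-skewed, age roughly symmetric.
df = pd.DataFrame({
    "income": rng.lognormal(mean=10, sigma=1, size=n),
    "age": rng.normal(45, 10, size=n),
})
df.loc[rng.choice(n, 20, replace=False), "income"] = np.nan

# A histogram (df["income"].hist()) or the skew statistic shows the shape.
print(f"income skew: {df['income'].skew():.2f}")

# For a skewed feature, the median is a safer simple fill than the mean.
median_filled = df["income"].fillna(df["income"].median())

# KNNImputer instead borrows values from rows that are similar on the
# other features (here: age).
knn_filled = pd.DataFrame(
    KNNImputer(n_neighbors=5).fit_transform(df), columns=df.columns
)
```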

Exploratory Data Analysis (EDA):

- Class Distribution Visualization: The current visualization for class distribution might not be the most effective. Consider using bar charts, histograms, or pie charts to clearly show the imbalance between classes.
- Findings from EDA: Include key insights from your EDA in your report. This helps demonstrate your understanding of the data and justify your modeling choices.
- Bivariate and Multivariate Analysis: Explore relationships between features using techniques like scatter plots. Consider coloring the data points by class to reveal potential relationships between features and class labels.
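A minimal sketch of the class-distribution point, assuming a binary default target (the 900/100 split is made-up illustration data): print the class shares so the imbalance is quantified, and draw a bar chart so it is visible.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical target: 1 = default, 0 = no default.
y = pd.Series([0] * 900 + [1] * 100, name="default")

# Quantify the imbalance rather than only eyeballing a plot.
share = y.value_counts(normalize=True)
print(share)

fig, ax = plt.subplots()
y.value_counts().plot.bar(ax=ax)
ax.set_xlabel("class")
ax.set_ylabel("count")
ax.set_title("Class distribution")
fig.savefig("class_distribution.png")
```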

Modeling:

- PCA Transformation: While you applied Principal Component Analysis (PCA), it seems the model wasn't trained on the transformed data. Make sure you fit your model on the transformed features after applying PCA.
- Class Imbalance: The data might exhibit class imbalance. Explore oversampling (e.g., SMOTE), undersampling, or class weighting in models that support it (e.g., a class-weighted Random Forest or logistic regression).
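One way to guarantee the model is trained on the PCA-transformed features is to chain the steps in a scikit-learn `Pipeline`. This sketch uses synthetic imbalanced data and `class_weight="balanced"` as a simple imbalance remedy; swapping in SMOTE would require the separate imbalanced-learn package, so it is only mentioned in a comment.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 90/10 imbalanced data standing in for the credit dataset.
X, y = make_classification(
    n_samples=500, n_features=20, weights=[0.9, 0.1], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# The pipeline fits PCA on the training data and feeds the transformed
# features to the model, so the model cannot accidentally be trained on
# the raw features. class_weight="balanced" addresses the imbalance;
# SMOTE (from imbalanced-learn) is an alternative.
clf = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.2f}")
```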

Code and Output:

- Fitting the Model: It seems the model fitting process encountered an error. Address the error and confirm the model trains successfully.
- Code Clarity: Consider fitting a single model per cell instead of several models in one cell. This improves readability and keeps a clear connection between each piece of code and its output.

Overall:

This is a good initial attempt at tackling a Machine Learning project. By addressing the points mentioned above, you can solidify your understanding of these concepts and create even more robust models in the future.

OJO44 commented 1 month ago

Thank you, your feedback is invaluable to me and I want to start applying it even in my normal daily tasks.

OJO44 commented 1 month ago

...in my daily machine learning practices.