Hyster329 / Bank


Data cleaning by Hue Yu #1

Closed Mark-Kitur closed 2 months ago

Mark-Kitur commented 2 months ago

Methods used

Removal of duplicated data: Dropping duplicate rows gives a more accurate picture of each feature, because repeated records no longer skew its distribution. It should also help model performance, since the model is less likely to overfit to the repeated records.

Deletion of "unknown": Missing values (NA) were labeled "unknown" in the data. Since unknown values can distort the statistical properties of a feature, it is generally advised to either delete them or impute them, for example with the median for numeric columns.
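The two steps above could be sketched with pandas roughly as follows. This is only an illustration, not the code actually used in this repo: the function name `clean_bank_data`, the `drop_unknown` flag, and the sample column names (`age`, `job`, `balance`) are all assumptions made for the example. Since the median only applies to numeric columns, the sketch falls back to the most frequent value for text columns.

```python
import pandas as pd


def clean_bank_data(df: pd.DataFrame, drop_unknown: bool = True) -> pd.DataFrame:
    """Hypothetical helper: drop duplicate rows, then handle "unknown" values."""
    # Step 1: remove exact duplicate rows, keeping the first occurrence.
    cleaned = df.drop_duplicates().reset_index(drop=True)

    # Step 2: treat the literal string "unknown" as a missing value.
    cleaned = cleaned.mask(cleaned == "unknown")

    if drop_unknown:
        # Option A: delete every row that has at least one missing value.
        return cleaned.dropna().reset_index(drop=True)

    # Option B: impute instead of deleting (median for numeric columns,
    # most frequent value for the rest). Assumes each column with missing
    # values still has at least one known value.
    for col in cleaned.columns:
        if cleaned[col].isna().any():
            if pd.api.types.is_numeric_dtype(cleaned[col]):
                fill = cleaned[col].median()
            else:
                fill = cleaned[col].mode().iloc[0]
            cleaned[col] = cleaned[col].fillna(fill)
    return cleaned


# Made-up sample data: one duplicated row and one "unknown" job.
sample = pd.DataFrame(
    {
        "age": [30, 30, 45, 52],
        "job": ["admin", "admin", "unknown", "technician"],
        "balance": [100, 100, 250, 400],
    }
)

deleted = clean_bank_data(sample)                       # drop rows with "unknown"
imputed = clean_bank_data(sample, drop_unknown=False)   # fill them in instead
print(len(sample), len(deleted), len(imputed))
```

Deleting is simpler but throws away the rest of the row; imputing keeps the row at the cost of introducing a guessed value, so which option fits depends on how many "unknown" entries there are.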

Mark-Kitur commented 2 months ago

Thank you, @Hyster329. I have learned how important the removal of duplicated data is.