Methods used
Removal of duplicated data: Removing duplicate records gave a more accurate feature representation, because repeated rows skew the feature distributions. It should also help model performance, since the model is less likely to overfit to records that appear more than once.
Deletion of "unknown": Missing values (NA) were labeled "unknown". Because unknown values can distort the statistical properties of the data, it is generally advisable either to delete them or to impute them, for example with the median for numeric features.
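The two steps above can be sketched as follows. This is a minimal illustration that assumes the data sits in a pandas DataFrame; the function name `clean`, the column names `age` and `job`, and the sample rows are hypothetical and not taken from the original dataset. Only the "unknown" label comes from the description above.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame, numeric_cols: list[str], impute: bool = False) -> pd.DataFrame:
    """Drop duplicate rows, then delete or median-impute "unknown" entries.

    Hypothetical helper: an illustration of the two steps, not the original pipeline.
    """
    # Step 1: remove exact duplicate rows, keeping the first occurrence.
    out = df.drop_duplicates().reset_index(drop=True)

    # Step 2: treat the "unknown" label as a missing value.
    out = out.replace("unknown", np.nan)

    if impute:
        # Median imputation only makes sense for numeric columns.
        for col in numeric_cols:
            out[col] = pd.to_numeric(out[col])
            out[col] = out[col].fillna(out[col].median())

    # Any row that still contains a missing value is deleted.
    # With impute=True this affects only the non-numeric columns.
    return out.dropna().reset_index(drop=True)


# Hypothetical sample: one duplicated row and two "unknown" entries.
raw = pd.DataFrame(
    {
        "age": [25, 25, "unknown", 40, 31],
        "job": ["admin", "admin", "tech", "unknown", "tech"],
    }
)

deleted = clean(raw, numeric_cols=["age"])                # delete strategy
imputed = clean(raw, numeric_cols=["age"], impute=True)   # median strategy
```

Deletion is the simpler choice but discards every row with an unknown entry. Median imputation keeps more rows, at the cost of pulling the imputed feature toward its center.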