Open Sandy4321 opened 3 years ago
Most variables are correlated with each other and thus they are highly redundant, let's say if you have two variables that are highly correlated, keeping the only one will help in dimensionality reduction and it doesn't cause that much loss of information.
One Question may arise you, Which Variable to keep? Keep the one that has a higher correlation with the target variable.
I see but Collinear Features how you calculated collinearity for categorical values ?
Hi Sandy4321, I found a brilliant article that will help with your question : https://towardsdatascience.com/the-search-for-categorical-correlation-a1cf7f1888c9
can you clarify how you remove correlated features
as written in https://towardsdatascience.com/a-feature-selection-tool-for-machine-learning-in-python-b64dd23710f0
For each pair of correlated features, it identifies one of the features for removal (since we only need to remove one
so you just remove one from pair ?