Closed: gverbock closed this issue 1 year ago.
I see this behaviour on a dataset I use at work, so I cannot share the full example here. I will try to reproduce it on synthetic data to better understand what exactly is going on.
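One possible starting point for such a synthetic reproduction is to build three features in a "chain": f_6 correlates strongly with both f_2 and f_5, but f_2 and f_5 correlate only weakly with each other. The sketch below assumes a Pearson correlation and a 0.8 threshold. The helper names `pearson` and `make_chained_features` are hypothetical. If f_2 and f_5 each sit at an angle θ from f_6 with cos θ = rho, then corr(f_2, f_6) = corr(f_5, f_6) = rho and corr(f_2, f_5) = cos 2θ = 2·rho² − 1. With rho = 0.9, that last value is about 0.62. The sketch does not check whether feature_engine then keeps both f_5 and f_6. That depends on the feature order and on which member of the {f_2, f_6} group is retained.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def make_chained_features(n=10_000, rho=0.9, seed=0):
    """Hypothetical generator: f_6 correlates ~rho with f_2 and with f_5,
    while f_2 and f_5 only correlate ~(2 * rho**2 - 1)."""
    rng = random.Random(seed)
    theta = math.acos(rho)  # angle between f_6 and each of f_2, f_5
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    f_6 = z1
    # Rotate z1 by +theta and -theta; both results have unit variance.
    f_2 = [math.cos(theta) * a + math.sin(theta) * b for a, b in zip(z1, z2)]
    f_5 = [math.cos(theta) * a - math.sin(theta) * b for a, b in zip(z1, z2)]
    return {"f_2": f_2, "f_5": f_5, "f_6": f_6}


data = make_chained_features()
for a, b in [("f_2", "f_6"), ("f_5", "f_6"), ("f_2", "f_5")]:
    print(a, b, round(pearson(data[a], data[b]), 2))
```

With 10,000 rows the printed values should land close to 0.9, 0.9 and 0.62. Only the first two pairs exceed a 0.8 threshold.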
Hey @gverbock
I think you already described what's going on here. f_6 is grouped with f_2, then f_2 is removed from the data while f_6 remains. However, f_6 is also correlated with f_5, and f_5 was not correlated with f_2 above the threshold, so f_5 and f_6 both stay in the data.
We've been discussing this for a while; see #327.
To be honest, I am not sure how to resolve this problem. We have a PR, #633, that allows us to order the features. This will ensure reproducibility, but it does not address this particular issue.
That's correct. I'll close this issue then and will think about it further.
While running a project with SmartCorrelatedSelection, I found unexpected behavior. I have the following correlation matrix:
If I use:
then the resulting correlation matrix still contains correlations above the threshold.
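A quick way to confirm this on the selector's output is to list every pair of retained features whose absolute correlation still exceeds the threshold. The sketch below is plain Python over a nested dict of correlations. That is the same shape pandas returns from `DataFrame.corr().to_dict()`. The function name `pairs_above_threshold` and the correlation values are hypothetical, because the actual matrix from this issue is not shown. The values only follow the pattern discussed in this thread.

```python
from itertools import combinations


def pairs_above_threshold(corr, features, threshold):
    """Return (feature_a, feature_b, |r|) for every pair of retained
    features whose absolute correlation still exceeds the threshold."""
    offending = []
    for a, b in combinations(features, 2):
        r = abs(corr[a][b])
        if r > threshold:
            offending.append((a, b, r))
    return offending


# Hypothetical correlations: f_6 is strongly tied to both f_2 and f_5.
corr = {
    "f_2": {"f_2": 1.0, "f_5": 0.62, "f_6": 0.9},
    "f_5": {"f_2": 0.62, "f_5": 1.0, "f_6": 0.9},
    "f_6": {"f_2": 0.9, "f_5": 0.9, "f_6": 1.0},
}

# Suppose the selector dropped f_2 and kept f_5 and f_6.
print(pairs_above_threshold(corr, ["f_5", "f_6"], threshold=0.8))
# [('f_5', 'f_6', 0.9)]
```

An empty list would mean the selection respected the threshold. Any entry points to the situation reported here.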
f_5 and f_6 are still correlated above the 0.8 threshold. I had a quick look at the code and I believe the issue is related to how _examined_features is defined. If a feature is selected and all the other features it correlates with are already in _examined_features, it is treated as a non-correlated feature, but that is not necessarily the case.
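The mechanism can be illustrated with a deliberately simplified grouping loop. This is a hypothetical sketch and not feature_engine's source code. Features are visited in order. Each unexamined feature starts a group with every still-unexamined feature correlated above the threshold, and all group members are then marked as examined. The names `greedy_groups` and `score` are hypothetical. So is the rule of keeping the highest-scoring member of each group. That rule stands in for whatever selection method is configured, chosen so that f_6 survives its group as described above.

```python
def greedy_groups(corr, threshold):
    """Hypothetical simplified grouping (NOT feature_engine's code)."""
    examined = set()
    groups = []
    for feature in corr:
        if feature in examined:
            continue
        examined.add(feature)
        # Only features not examined yet can join this feature's group.
        partners = [
            other for other in corr
            if other not in examined and abs(corr[feature][other]) > threshold
        ]
        examined.update(partners)
        groups.append([feature] + partners)
    return groups


# Hypothetical correlations following the pattern in this issue.
corr = {
    "f_2": {"f_2": 1.0, "f_5": 0.62, "f_6": 0.9},
    "f_5": {"f_2": 0.62, "f_5": 1.0, "f_6": 0.9},
    "f_6": {"f_2": 0.9, "f_5": 0.9, "f_6": 1.0},
}
groups = greedy_groups(corr, threshold=0.8)
print(groups)  # [['f_2', 'f_6'], ['f_5']]

# Hypothetical rule: keep the member with the highest score (e.g. variance).
score = {"f_2": 1.0, "f_5": 1.0, "f_6": 2.0}
kept = [max(group, key=score.get) for group in groups]
print(kept)  # ['f_6', 'f_5']
print(abs(corr["f_5"]["f_6"]) > 0.8)  # True
```

f_5 ends up alone in its group because its only strong partner, f_6, was already marked as examined while grouping f_2. It is therefore kept as if it were uncorrelated, even though it still correlates with the retained f_6 above the threshold.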