feature-engine / feature_engine

Feature engineering package with sklearn-like functionality
https://feature-engine.trainindata.com/
BSD 3-Clause "New" or "Revised" License

speed-up correlation selection transformers, make them deterministic #721

Closed. solegalli closed this 7 months ago

solegalli commented 8 months ago

closes #619 closes #648 closes #633 closes #612 closes #446 closes #684 closes #570 closes #703

FYI @dlaprins @glevv

Sorts the variables alphabetically so that the selection is deterministic. Adds an adaptation of @dlaprins's numpy implementation to speed up the search.
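To illustrate the idea, here is a minimal sketch. It is not the code in this PR, and the function name `drop_correlated`, the `threshold` parameter and the dict-of-columns input are assumptions made for the example. It sorts the feature names first, computes the full correlation matrix once with numpy, and then drops every later feature that is correlated with a feature already kept.

```python
import numpy as np


def drop_correlated(data, threshold=0.8):
    """Return the names of features to drop (hypothetical helper).

    data: dict mapping a feature name to a 1-D sequence of numbers.
    """
    # Sort names so the outcome does not depend on the input column order.
    names = sorted(data)

    # One vectorised call computes all pairwise Pearson correlations.
    X = np.column_stack([np.asarray(data[n], dtype=float) for n in names])
    corr = np.abs(np.corrcoef(X, rowvar=False))

    dropped = set()
    for i, name in enumerate(names):
        if name in dropped:
            continue  # an already dropped feature cannot "claim" others
        for j in range(i + 1, len(names)):
            if names[j] not in dropped and corr[i, j] > threshold:
                dropped.add(names[j])
    return sorted(dropped)


a = np.arange(10.0)
features = {
    "b": 2 * a + 1,                # perfectly correlated with "a"
    "c": np.tile([1.0, -1.0], 5),  # weakly correlated with "a"
    "a": a,
}
# The same features in a different column order give the same answer.
reordered = {k: features[k] for k in ["c", "a", "b"]}
print(drop_correlated(features), drop_correlated(reordered))
```

Because the names are sorted before the search, `"a"` is always examined first and `"b"` is always the feature that gets dropped, whichever order the columns arrive in.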

I'd like to keep the brute-force correlation selector plain and simple. We can then add ordering by cardinality, etc. to the smart correlation selector, which more or less has it already.

The unrelated error is in the recursive feature selectors. Something changed in the estimators, the random state, or elsewhere, and the values are now different. We need to make that deterministic somehow.

solegalli commented 7 months ago

Observations:

I could take the opportunity to speed up the selection tests for recursive feature elimination; those are the ones failing.

The documentation for the smart correlation selector could also be improved by showcasing the other methodologies.

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (3571426) 98.16% compared to head (f6163cf) 98.31%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #721      +/-   ##
==========================================
+ Coverage   98.16%   98.31%   +0.15%
==========================================
  Files         103      103
  Lines        3930     3928       -2
  Branches      771      764       -7
==========================================
+ Hits         3858     3862       +4
+ Misses         26       23       -3
+ Partials       46       43       -3
```
