feature-engine / feature_engine

Feature engineering package with sklearn like functionality
https://feature-engine.trainindata.com/
BSD 3-Clause "New" or "Revised" License
1.8k stars 303 forks

Extend PSI feature selection to categorical variables #657

Closed. dlaprins closed this 1 year ago

dlaprins commented 1 year ago

closes #655 closes #658

ClaudioSalvatoreArcidiacono commented 1 year ago

Hey @dlaprins ping me once this PR is ready for review!

dlaprins commented 1 year ago

Hey @ClaudioSalvatoreArcidiacono, this is ready for review. Thanks for the heads-up on category-type variables: min_pct_empty_bins wasn't handled correctly for them. I've fixed it.

I included all your tests, just changed the ordering of the variables (self.variables_ takes all numericals first, then all categoricals).
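
For readers following the thread, here is a minimal sketch of why empty bins matter for categorical variables. It is not the code from this PR: the function name `categorical_psi` is hypothetical, and replacing a zero fraction with `min_pct_empty_bins` is an assumption about how the parameter is meant to behave. A category that appears in only one of the two samples has a fraction of 0 in the other, which would otherwise break the logarithm in the PSI sum.

```python
import math
from collections import Counter


def categorical_psi(basis, test, min_pct_empty_bins=1e-4):
    """Illustrative PSI between two samples of one categorical variable.

    Hypothetical helper, not feature-engine's implementation.
    """
    basis_counts, test_counts = Counter(basis), Counter(test)
    psi = 0.0
    # Each category seen in either sample acts as one "bin".
    for category in set(basis) | set(test):
        b = basis_counts[category] / len(basis)
        t = test_counts[category] / len(test)
        # Assumption: an empty bin (fraction 0) is replaced by a small
        # constant so that log(t / b) stays finite.
        b = b if b > 0 else min_pct_empty_bins
        t = t if t > 0 else min_pct_empty_bins
        psi += (t - b) * math.log(t / b)
    return psi
```

Calling `categorical_psi(["a", "a", "b"], ["a", "c", "c"])` exercises the empty-bin branch twice: "b" is missing from the test sample and "c" is missing from the basis sample.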

dlaprins commented 1 year ago

Thank you both, @ClaudioSalvatoreArcidiacono and @solegalli, for your helpful suggestions and directions. In my latest commit I have tried to fix all the shortcomings you pointed out in the comments. Please let me know which issues remain unresolved and I'll address those as well.

solegalli commented 1 year ago

FYI @dlaprins

Please rebase main, we just merged #660 :)

dlaprins commented 1 year ago

Fixed. I also split the loop over variables into two separate loops, one over the categorical variables and one over the numerical variables, as requested.
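
The two-loop structure described here could look roughly like the sketch below. This is an illustration under stated assumptions, not the code merged in this PR: `psi_per_variable`, `_psi`, and `_bin_fractions` are hypothetical names, the data is a plain dict of column name to list of values, and equal-width binning with a 1e-4 floor for empty bins is assumed. Numerical variables are processed first and categorical ones second, matching the ordering mentioned earlier in the thread.

```python
import math
from collections import Counter


def _psi(basis_fracs, test_fracs, floor=1e-4):
    # Sum of (t - b) * ln(t / b) over bins; zero fractions are floored.
    total = 0.0
    for b, t in zip(basis_fracs, test_fracs):
        b = b if b > 0 else floor
        t = t if t > 0 else floor
        total += (t - b) * math.log(t / b)
    return total


def _bin_fractions(values, edges):
    # Values below the first edge land in the first bin, values at or
    # above the last edge land in the last bin.
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = min(sum(v >= e for e in edges[1:]), len(counts) - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi_per_variable(basis, test, num_vars, cat_vars, bins=5):
    psi = {}
    # Loop 1: numerical variables, binned with edges taken from the basis.
    for var in num_vars:
        lo, hi = min(basis[var]), max(basis[var])
        edges = [lo + i * (hi - lo) / bins for i in range(bins + 1)]
        psi[var] = _psi(_bin_fractions(basis[var], edges),
                        _bin_fractions(test[var], edges))
    # Loop 2: categorical variables, one bin per observed category.
    for var in cat_vars:
        cats = sorted(set(basis[var]) | set(test[var]))
        b_cnt, t_cnt = Counter(basis[var]), Counter(test[var])
        psi[var] = _psi([b_cnt[c] / len(basis[var]) for c in cats],
                        [t_cnt[c] / len(test[var]) for c in cats])
    return psi
```

Keeping the loops separate means the numerical branch can own its discretisation step while the categorical branch only counts categories, and neither needs a type check inside the loop body.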