feature-engine / feature_engine

Feature engineering package with sklearn like functionality
https://feature-engine.trainindata.com/
BSD 3-Clause "New" or "Revised" License

Provide optional "transform" function to run for each feature selection fit #776

Closed jolespin closed 1 month ago

jolespin commented 3 months ago

Is your feature request related to a problem? Please describe.
Sometimes I use transformations that depend on the feature set. For example, one typical transformation is scaling each sample by its total (e.g., x/x.sum()).

Describe the solution you'd like
The original feature matrix is retained and, upon each fit, the transformation is recomputed on the feature subset being evaluated. Here's a wacky version just to show the concept:

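import numpy as np

def transform(X):
    # Scale each row by its total, so each sample's selected features sum to 1
    return X / X.sum(axis=1, keepdims=True)

X = np.random.RandomState(0).randint(low=0, high=1000, size=(10, 5))
y = np.random.RandomState(0).choice([0, 1], size=10)

# Grow the feature subset one column at a time, always slicing the original X
for i in range(1, X.shape[1] + 1):
    X_query = X[:, :i]
    if X_query.shape[1] > 1:
        # Recompute the transformation on the current subset only
        X_query = transform(X_query)
        # fit(X_query, y)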

Describe alternatives you've considered
I'm currently writing a custom class and reimplementing the fit method to add this feature.
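A rough sketch of what such a custom class might look like is given below. It is not the class mentioned above and not part of feature_engine: `SubsetTransformingSelector`, `scale_by_total`, `max_abs_corr`, and the greedy forward search are all assumptions made for illustration. The point is only that every candidate subset is sliced from the original matrix and transformed before it is scored.

```python
import numpy as np


def scale_by_total(X):
    """Scale each row by its total (the x / x.sum() idea from the request)."""
    return X / X.sum(axis=1, keepdims=True)


def max_abs_corr(X, y):
    """Toy score: largest absolute correlation between any column and y."""
    corrs = [np.corrcoef(X[:, k], y)[0, 1] for k in range(X.shape[1])]
    return float(np.max(np.abs(corrs)))


class SubsetTransformingSelector:
    """Hypothetical greedy forward selector; not a feature_engine class."""

    def __init__(self, score_fn, transform=None):
        self.score_fn = score_fn    # callable (X_subset, y) -> float, higher is better
        self.transform = transform  # optional callable applied to each candidate subset

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        selected, best_score = [], -np.inf
        remaining = list(range(X.shape[1]))
        improved = True
        while remaining and improved:
            improved = False
            for j in remaining:
                candidate = selected + [j]
                # Always slice the ORIGINAL matrix, then transform the subset
                X_query = X[:, candidate]
                if self.transform is not None and len(candidate) > 1:
                    X_query = self.transform(X_query)
                score = self.score_fn(X_query, y)
                if score > best_score:
                    best_score, best_j, improved = score, j, True
            if improved:
                selected.append(best_j)
                remaining.remove(best_j)
        self.features_to_keep_ = selected
        self.best_score_ = best_score
        return self


X = np.random.RandomState(0).randint(low=1, high=1000, size=(10, 5))
y = np.random.RandomState(0).choice([0, 1], size=10)
selector = SubsetTransformingSelector(max_abs_corr, transform=scale_by_total).fit(X, y)
print(selector.features_to_keep_)
```

Passing `transform=None` reduces this to an ordinary forward search on the raw columns, which is why the hook is optional here.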

Additional context
NA

solegalli commented 3 months ago

Hey @jolespin

Thanks for the engagement. I am not sure I understand what is being requested here. The functionality of the fit method for each feature selector is very different. We won't be able to make a quick fix to add a function that will work with all. We most likely need to modify each class individually.

But besides that, what exactly should this transform function be doing, and why at fit? It seems to me that this is a bit specific to an individual problem. Could we not instead have a transformer that makes that transformation?
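For illustration, a standalone transformer of that kind could look like the minimal sketch below. `RowTotalScaler` is a hypothetical name, not an existing feature_engine or scikit-learn class; it only mimics the fit/transform convention and assumes non-negative numeric input.

```python
import numpy as np


class RowTotalScaler:
    """Hypothetical stateless transformer: divides each row by its total."""

    def fit(self, X, y=None):
        # Nothing to learn; only record how many columns were seen
        self.n_features_in_ = np.asarray(X).shape[1]
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        totals = X.sum(axis=1, keepdims=True)
        # Leave all-zero rows as zeros instead of dividing by zero
        return np.divide(X, totals, out=np.zeros_like(X), where=totals != 0)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X = np.array([[2, 6, 2],
              [0, 0, 0],
              [1, 1, 2]])
X_scaled = RowTotalScaler().fit_transform(X)
print(X_scaled)
```

Because the totals are taken over whatever columns the transformer receives, its output depends on where it sits relative to the selection step.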

jolespin commented 3 months ago

It's useful when the transformation of a feature depends on the entire feature set. If you transform the original feature matrix and then remove a feature, the transformation will be off because that feature is missing. A good example would be a feature matrix of proportions.
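A small numeric sketch of the proportions case, using made-up values and a hypothetical `proportions` helper, shows the mismatch between transforming before and after a feature is dropped:

```python
import numpy as np

# Two samples, three features (made-up counts)
X = np.array([[10.0, 30.0, 60.0],
              [20.0, 20.0, 60.0]])


def proportions(X):
    """Divide each row by its total."""
    return X / X.sum(axis=1, keepdims=True)


# Transform first, then drop the last feature: rows no longer sum to 1
stale = proportions(X)[:, :2]

# Drop the last feature first, then transform: rows sum to 1 again
fresh = proportions(X[:, :2])

print(stale)  # [[0.1 0.3], [0.2 0.2]]   -> each row sums to 0.4
print(fresh)  # [[0.25 0.75], [0.5 0.5]] -> each row sums to 1.0
```

The `stale` values still carry the total of the removed feature, which is the discrepancy described above.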

Feel free to close if this is out of scope. It's just an idea I had while testing out the package.

solegalli commented 2 months ago

I see. Thanks for the explanation. We'll keep it open for a while to see if there is interest from the community in this sort of transformation, and we can decide later.