feature-engine / feature_engine

Feature engineering package with sklearn like functionality
https://feature-engine.trainindata.com/
BSD 3-Clause "New" or "Revised" License

Transformations by group #460

Open ftvalentini opened 2 years ago

ftvalentini commented 2 years ago

Is there support for creating new variables that are summaries of existing variables by group? I can't seem to find it.

For example:

import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B', 'A'],
    'x1': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    'x2': [0.7, 0.8, 0.9, 0.1, 0.2, 0.3]
})

# new variables: mean of x1 and x2 by group
df[["x1_mean", "x2_mean"]] = df.groupby('group')[['x1', 'x2']].transform("mean")
print(df)
  group   x1   x2  x1_mean  x2_mean
0     A  0.1  0.7      0.3      0.6
1     A  0.2  0.8      0.3      0.6
2     B  0.3  0.9      0.4      0.4
3     B  0.4  0.1      0.4      0.4
4     B  0.5  0.2      0.4      0.4
5     A  0.6  0.3      0.3      0.6

Group-level summaries like these are often useful as features.

solegalli commented 2 years ago

Hi @ftvalentini

We already have an issue that seems to capture your proposal: #290

My suggestion would be to place the transformer in the "creation" folder and call it "AggregateFeatures".

The transformer would have the parameters:

The skeleton for the transformer can take inspiration from the MathFeatures class.

Additional considerations are in the original issue, #290.
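
To make the idea concrete, here is a minimal sketch of what a fit/transform version of the groupby example might look like. It is only an illustration: the parameter names (variables, by, func) and the attribute group_stats_ are hypothetical and are not the API discussed in #290. It is also written as a plain class, whereas a real implementation would presumably follow the structure of the existing creation transformers. The main difference from calling groupby(...).transform(...) directly is that the group summaries are learned once in fit and then mapped onto any new data in transform.

```python
import pandas as pd


class AggregateFeatures:
    """Hypothetical sketch: add per-group summaries of numeric variables.

    The parameter names (variables, by, func) are assumptions made for
    illustration only; they are not an agreed feature-engine API.
    """

    def __init__(self, variables, by, func="mean"):
        self.variables = variables  # list of columns to summarise
        self.by = by                # name of the grouping column
        self.func = func            # aggregation name as a string, e.g. "mean"

    def fit(self, X, y=None):
        # Learn one summary value per group from the training data.
        stats = X.groupby(self.by)[self.variables].agg(self.func)
        stats.columns = [f"{v}_{self.func}" for v in self.variables]
        self.group_stats_ = stats
        return self

    def transform(self, X):
        # Look up the learned summary for each row's group.
        # Groups not seen during fit get NaN.
        stats = self.group_stats_.reindex(X[self.by].to_numpy())
        stats.index = X.index
        return pd.concat([X.copy(), stats], axis=1)


train = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B", "A"],
    "x1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "x2": [0.7, 0.8, 0.9, 0.1, 0.2, 0.3],
})

agg = AggregateFeatures(variables=["x1", "x2"], by="group", func="mean")
train_out = agg.fit(train).transform(train)

# New data reuses the summaries learned from `train`; group "C" is unseen.
new = pd.DataFrame({"group": ["B", "C"], "x1": [1.0, 2.0], "x2": [3.0, 4.0]})
new_out = agg.transform(new)
```

On the training frame this gives the same x1_mean and x2_mean columns as the groupby example in the original post. How unseen groups should be handled (NaN, an error, or a fallback value) is one of the design choices left open here.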

Would you like to give it a go at creating this transformer?