Implement new transform_num() method: 'polynomial'

Description:

Method Functionality Idea:

The polynomial features transformation method generates power features (the square, the cube, and so on, up to a specified degree) for numerical variables. Each variable is raised to its powers individually; no interaction terms between variables are created. These features can capture non-linear relationships between a variable and the target, which may improve model performance.

How it operates:

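The method takes either a global degree (degree) or a variable-specific degree map (degree_map). If a non-empty degree_map is provided, it takes precedence and only the variables listed in it are transformed; otherwise the global degree is applied to every numerical variable. For each variable, powers from 2 up to the specified degree are generated and appended to a copy of the original DataFrame, so the input DataFrame is left unchanged.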
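
As a small illustration of the description above, the following sketch builds the columns by hand for a single variable with degree 3. The column name `x` and its values are made up for the example; the `_degree_` suffix follows the naming used in the code breakdown of this issue.

```python
import pandas as pd

# Hypothetical toy data: one numerical variable named 'x'.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})

degree = 3
poly = pd.DataFrame(index=df.index)
for power in range(2, degree + 1):  # degree 1 is the original column
    poly[f"x_degree_{power}"] = df["x"] ** power

# Original column first, then one new column per power.
result = pd.concat([df, poly], axis=1)
print(result)
```

With degree 3, two columns are added (`x_degree_2` and `x_degree_3`); a degree of 1 or lower would add none.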

Usage:

To generate polynomial features for numerical variables:

import numpy as np
import pandas as pd

# transform_num must already be imported from datasafari

# data to test with
data = {
    'Feature1': np.random.normal(0, 1, 100),  # Normally distributed data
    'Feature2': np.random.exponential(1, 100),  # Exponentially distributed data (positively skewed)
    'Feature3': np.random.randint(1, 100, 100)  # Uniformly distributed integers from 1 to 99 (upper bound is exclusive)
}
df = pd.DataFrame(data)

# test polynomial with a degree_map
degree_map = {'Feature1': 2, 'Feature2': 3}
poly_transformed_df, poly_features = transform_num(df, ['Feature1', 'Feature2'], method='polynomial', degree_map=degree_map)

This method returns two objects: the transformed DataFrame, which contains the original columns plus the polynomial features, and a DataFrame containing only the polynomial feature columns.
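
Assuming the `<variable>_degree_<power>` naming scheme shown in the code breakdown of this issue, the columns to expect in `poly_features` for the call above can be listed without running the transformation. The helper name `expected_poly_columns` is hypothetical and not part of datasafari.

```python
def expected_poly_columns(degree_map):
    """List the feature names a degree_map should produce (hypothetical helper)."""
    return [
        f"{variable}_degree_{power}"
        for variable, max_degree in degree_map.items()
        for power in range(2, max_degree + 1)  # powers start at 2
    ]

columns = expected_poly_columns({"Feature1": 2, "Feature2": 3})
print(columns)
```

For this degree_map the list has three entries: one for `Feature1` and two for `Feature2`.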

Notes:

Polynomial features can introduce higher dimensionality and may lead to overfitting, especially with high degrees.
It's essential to monitor model performance and consider regularization techniques when using polynomial features.
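
To make the overfitting risk concrete, here is a minimal sketch using only NumPy. It fits polynomials of increasing degree to made-up noisy linear data and compares the error on the training half with the error on a held-out half. The data, the split, and the degrees are assumptions chosen for illustration; `np.polyfit` stands in for a linear model trained on power features of one variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a linear signal plus noise, split into train and test halves.
x = rng.uniform(-1, 1, 40)
y = 2.0 * x + rng.normal(0, 0.3, 40)
x_train, y_train, x_test, y_test = x[:20], y[:20], x[20:], y[20:]

def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

errors = {}
for d in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, d)  # least-squares fit of degree d
    errors[d] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree={d}: train MSE={errors[d][0]:.4f}, test MSE={errors[d][1]:.4f}")
```

The training error cannot increase as the degree grows, because each higher-degree model contains the lower-degree ones. The held-out error gives the more honest picture, which is why it should be the number to watch when choosing a degree.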

Implementation Summary:

The 'polynomial' method within transform_num() copies the input DataFrame, raises each selected numerical variable to the powers 2 through its specified degree, and returns the augmented copy together with a DataFrame holding only the new columns.

Purpose:

The purpose of this method is to generate new polynomial features for specified numerical variables, which can enhance the predictive power of machine learning models.

Code Breakdown:

Method Header:

Purpose: To announce the start of the 'polynomial' method and print a short description of what it does. The branch runs only when degree or degree_map is provided.

if method.lower() == 'polynomial' and (degree is not None or degree_map is not None):
   print("< POLYNOMIAL FEATURES TRANSFORMATION >")
   print(" This method generates polynomial features up to a specified degree for numerical variables.")
   print("  ✔ Captures non-linear relationships between variables and the target.")
   print("  ✔ Enhances model performance by adding complexity through feature engineering.")
   print("✎ Note: Specify the 'degree' for a global application or 'degree_map' for variable-specific degrees.\n")

Initialize DataFrame and Columns:

Purpose: To prepare the necessary data structures for transformation.

```
transformed_df = df.copy()
poly_features = pd.DataFrame(index=df.index)
```

Define Function for Applying Degrees:

Purpose: To apply the polynomial transformation for a specified degree, creating one column per power from 2 to d, named <variable>_degree_<power>.

def apply_degree(variable, d):
   for power in range(2, d + 1):  # Start from 2 as degree 1 is the original variable
       new_column_name = f"{variable}_degree_{power}"
       poly_features[new_column_name] = transformed_df[variable] ** power
       print(f"✔ Created polynomial feature '{new_column_name}' from variable '{variable}' to the power of {power}.\n")
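
One detail of `transformed_df[variable] ** power` is worth checking: on integer columns (such as `Feature3` in the usage example), NumPy-backed integer arithmetic wraps around silently when a result exceeds the int64 range. The sketch below is a standalone variant of `apply_degree` that casts to float first. The function name `power_features` and the float cast are assumptions for illustration, not the datasafari implementation.

```python
import math
import pandas as pd

def power_features(series, d):
    """Return powers 2..d of `series` as a DataFrame (hypothetical helper)."""
    values = series.astype("float64")  # assumption: avoid int64 wrap-around
    out = pd.DataFrame(index=series.index)
    for power in range(2, d + 1):
        out[f"{series.name}_degree_{power}"] = values ** power
    return out

ints = pd.Series([99], name="Feature3", dtype="int64")
wrapped = int((ints ** 10).iloc[0])  # int64 cannot hold 99**10
safe = power_features(ints, 10)["Feature3_degree_10"].iloc[0]

print(wrapped == 99 ** 10)  # False: the int64 result wrapped around
print(math.isclose(safe, 99 ** 10, rel_tol=1e-12))
```

Floats trade exactness for range, so this is a design choice rather than a strict improvement; for the moderate degrees typical of feature engineering, either dtype usually works.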

Apply Transformation:

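Purpose: To generate polynomial features using either the global degree or the per-variable degrees from degree_map. A non-empty degree_map takes precedence, and any variable in it that is not in numerical_variables is skipped with a warning.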

if degree_map:
   for variable, var_degree in degree_map.items():
       if variable in numerical_variables:
           apply_degree(variable, var_degree)
       else:
           print(f"⚠️ Variable '{variable}' specified in `degree_map` was not found in `numerical_variables` and has been skipped.\n")
else:
   for variable in numerical_variables:
       apply_degree(variable, degree)
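
The branching above can be restated as a small planning step that decides which degree each variable receives before any column is created. The helper `resolve_degrees` below is a hypothetical restatement of that logic, not a function in datasafari. It also makes visible that variables listed in `numerical_variables` but absent from `degree_map` receive no features when a map is given.

```python
def resolve_degrees(numerical_variables, degree=None, degree_map=None):
    """Return (plan, skipped): variable -> degree, plus ignored map keys."""
    if degree_map:  # a non-empty map takes precedence over the global degree
        plan = {v: d for v, d in degree_map.items() if v in numerical_variables}
        skipped = [v for v in degree_map if v not in numerical_variables]
    else:
        plan = {v: degree for v in numerical_variables}
        skipped = []
    return plan, skipped

plan, skipped = resolve_degrees(
    ["Feature1", "Feature2"],
    degree=2,
    degree_map={"Feature1": 3, "Typo": 2},
)
print(plan)     # only Feature1 is planned; Feature2 is not in the map
print(skipped)  # map keys that were not in numerical_variables
```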

Output Results:

Purpose: To display the results and perform a sanity check.

transformed_df = pd.concat([transformed_df, poly_features], axis=1)

print(f"✔ New transformed dataframe with polynomial features:\n{transformed_df.head()}\n")
print(f"✔ Dataframe with only the polynomial features:\n{poly_features.head()}\n")
print("☻ HOW TO: Apply this transformation using `transformed_df, poly_features = transform_num(your_df, your_numerical_variables, method='polynomial', degree=3)` or by specifying a `degree_map`.\n")

# Sanity check
print("< SANITY CHECK >")
print(f"  ➡ Shape of original dataframe: {df.shape}")
print(f"  ➡ Shape of transformed dataframe: {transformed_df.shape}\n")
print("* After applying polynomial features, evaluate the model's performance and watch out for overfitting, especially when using high degrees.\n")

return transformed_df, poly_features
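
The printed shapes can also be turned into assertions, for example in a unit test. The sketch below uses stand-in frames built by hand rather than calling `transform_num()`, and the helper name `check_polynomial_output` is hypothetical.

```python
import pandas as pd

def check_polynomial_output(df, transformed_df, poly_features):
    """Assert the structural relationships between the three frames."""
    assert len(transformed_df) == len(df)  # rows are unchanged
    assert transformed_df.shape[1] == df.shape[1] + poly_features.shape[1]
    assert transformed_df.index.equals(df.index)
    assert transformed_df.columns.is_unique  # fails if a feature name already existed

# Stand-in data mimicking a degree=2 run on one column.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
poly_features = pd.DataFrame({"x_degree_2": df["x"] ** 2})
transformed_df = pd.concat([df, poly_features], axis=1)

check_polynomial_output(df, transformed_df, poly_features)
print("all checks passed")
```

The uniqueness check covers the case where the transformation is applied twice to the same DataFrame, which would otherwise produce duplicate column names after `pd.concat`.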

See the Full Function:

You can refer to the complete implementation of the transform_num() function, including the 'polynomial' method, on GitHub: transform_num().

ETA444 / datasafari