ETA444 closed this issue 6 months ago
Implementation Summary:
The 'polynomial' method within transform_num() generates polynomial features up to a specified degree for numerical variables. This transformation helps capture non-linear relationships and enhances model performance through feature engineering.
Purpose:
The purpose of this method is to generate new polynomial features for specified numerical variables, which can enhance the predictive power of machine learning models.
Code Breakdown:
Method Header:
Introduces the 'polynomial' method implementation and provides context.
if method.lower() == 'polynomial' and (degree is not None or degree_map is not None):
    print(f"< POLYNOMIAL FEATURES TRANSFORMATION >")
    print(f" This method generates polynomial features up to a specified degree for numerical variables.")
    print(f" ✔ Captures non-linear relationships between variables and the target.")
    print(f" ✔ Enhances model performance by adding complexity through feature engineering.")
    print(f"✎ Note: Specify the 'degree' for a global application or 'degree_map' for variable-specific degrees.\n")
Initialize DataFrame and Columns:
transformed_df = df.copy()
poly_features = pd.DataFrame(index=df.index)
Define Function for Applying Degrees:
def apply_degree(variable, d):
    for power in range(2, d + 1):  # Start from 2 as degree 1 is the original variable
        new_column_name = f"{variable}_degree_{power}"
        poly_features[new_column_name] = transformed_df[variable] ** power
        print(f"✔ Created polynomial feature '{new_column_name}' from variable '{variable}' to the power of {power}.\n")
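The helper above writes into `poly_features` from the enclosing scope. A self-contained sketch of the same idea, returning the new columns instead of mutating outer state, might look like the following. The name `power_features` and the sample data are illustrative assumptions, not part of transform_num().

```python
import pandas as pd


def power_features(df: pd.DataFrame, variable: str, d: int) -> pd.DataFrame:
    """Return columns variable**2 ... variable**d (hypothetical helper)."""
    out = pd.DataFrame(index=df.index)
    # Start from 2: power 1 is the original column and is not duplicated.
    for power in range(2, d + 1):
        out[f"{variable}_degree_{power}"] = df[variable] ** power
    return out


# Illustrative data, not taken from the issue.
sample = pd.DataFrame({"x": [1, 2, 3]})
features = power_features(sample, "x", 3)
print(features)
```

With a degree below 2 the loop body never runs, so the returned frame has the original index and no columns.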
Apply Transformation:
if degree_map:
    for variable, var_degree in degree_map.items():
        if variable in numerical_variables:
            apply_degree(variable, var_degree)
        else:
            print(f"⚠️ Variable '{variable}' specified in `degree_map` was not found in `numerical_variables` and has been skipped.\n")
else:
    for variable in numerical_variables:
        apply_degree(variable, degree)
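The branch above gives `degree_map` precedence over the global `degree` and skips any mapped variable that is missing from `numerical_variables`. One way to isolate that decision is sketched below; `resolve_degrees` is a hypothetical helper, not part of transform_num(). In the excerpt as shown, an empty `degree_map` combined with `degree=None` would reach the global branch and call `apply_degree(variable, None)`. The sketch raises a `ValueError` in that case, which is a choice made here rather than documented behavior.

```python
from typing import Dict, List, Optional, Tuple


def resolve_degrees(
    numerical_variables: List[str],
    degree: Optional[int] = None,
    degree_map: Optional[Dict[str, int]] = None,
) -> Tuple[Dict[str, int], List[str]]:
    """Return ({variable: degree}, skipped_variables). Hypothetical helper."""
    if degree_map:
        # Variable-specific degrees take precedence over the global degree.
        resolved = {v: d for v, d in degree_map.items() if v in numerical_variables}
        skipped = [v for v in degree_map if v not in numerical_variables]
        return resolved, skipped
    if degree is None:
        # Assumption of this sketch: fail loudly instead of passing None on.
        raise ValueError("Provide `degree` or a non-empty `degree_map`.")
    return {v: degree for v in numerical_variables}, []


plan, skipped = resolve_degrees(["age", "income"], degree_map={"age": 3, "height": 2})
print(plan, skipped)
```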
Output Results:
transformed_df = pd.concat([transformed_df, poly_features], axis=1)
print(f"✔ New transformed dataframe with polynomial features:\n{transformed_df.head()}\n")
print(f"✔ Dataframe with only the polynomial features:\n{poly_features.head()}\n")
print("☻ HOW TO: Apply this transformation using `transformed_df, poly_features = transform_num(your_df, your_numerical_variables, method='polynomial', degree=3)` or by specifying a `degree_map`.\n")
# Sanity check
print("< SANITY CHECK >")
print(f" ➡ Shape of original dataframe: {df.shape}")
print(f" ➡ Shape of transformed dataframe: {transformed_df.shape}\n")
print("* After applying polynomial features, evaluate the model's performance and watch out for overfitting, especially when using high degrees.\n")
return transformed_df, poly_features
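The sanity check compares shapes before and after the transformation. Because each variable contributes one column per power from 2 up to its degree, the expected number of added columns can be computed up front. A small sketch, with `expected_new_columns` as a hypothetical helper and made-up degrees:

```python
from typing import Dict


def expected_new_columns(degrees: Dict[str, int]) -> int:
    """Count columns added when each variable gets powers 2..degree."""
    # range(2, d + 1) is empty for d < 2, so such variables add nothing.
    return sum(max(d - 1, 0) for d in degrees.values())


added = expected_new_columns({"age": 3, "income": 2, "height": 1})
print(added)  # → 3
```

The transformed DataFrame should then have the original column count plus this number, with the row count unchanged.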
See the Full Function:
You can refer to the complete implementation of the transform_num() function, including the 'polynomial' method, on GitHub: transform_num().
Description:
Method Functionality Idea:
The polynomial features transformation method generates polynomial features up to a specified degree for numerical variables. This technique captures non-linear relationships between variables and the target, enhancing model performance through feature engineering.
How it operates:
The method takes either a global degree (degree) or a variable-specific degree map (degree_map) and creates polynomial features for each numerical variable accordingly. For each variable, powers from 2 up to the specified degree are generated as new columns named {variable}_degree_{power} and appended to a copy of the original DataFrame.
Usage:
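To generate polynomial features for numerical variables, call transform_num(your_df, your_numerical_variables, method='polynomial', degree=3), or pass a degree_map instead of degree.
The method returns two DataFrames: the transformed DataFrame, containing the original columns plus the polynomial features, and a DataFrame containing only the polynomial feature columns.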
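The two return values can be tried end to end with a small stand-in. The sketch below is not the transform_num() implementation: `polynomial_sketch` is a hypothetical function that reproduces only the polynomial branch described in this issue, without the printed messages, and the data is made up.

```python
import pandas as pd


def polynomial_sketch(df, numerical_variables, degree=None, degree_map=None):
    """Stand-in for transform_num(..., method='polynomial'); hypothetical."""
    # Fall back to one global degree for every variable when no map is given.
    degrees = degree_map or {v: degree for v in numerical_variables}
    poly_features = pd.DataFrame(index=df.index)
    for variable, d in degrees.items():
        if variable not in numerical_variables:
            continue  # mirrors the skip for variables not in the list
        for power in range(2, d + 1):
            poly_features[f"{variable}_degree_{power}"] = df[variable] ** power
    transformed_df = pd.concat([df.copy(), poly_features], axis=1)
    return transformed_df, poly_features


# Illustrative data, not taken from the issue.
df = pd.DataFrame({"age": [20, 30], "income": [1.5, 2.0], "city": ["A", "B"]})

# Global degree: every listed variable gets powers 2 and 3.
transformed_df, poly_features = polynomial_sketch(df, ["age", "income"], degree=3)

# Variable-specific degrees: only 'age' is expanded, up to power 2.
_, age_only = polynomial_sketch(df, ["age", "income"], degree_map={"age": 2})

print(transformed_df.shape, poly_features.shape, age_only.shape)
```

The first frame keeps every original column, including the non-numerical `city`, while the second holds only the generated columns, which can be convenient for inspecting or dropping them later.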
Notes: