Closed ETA444 closed 6 months ago
Implementation Summary:
The 'standardize' method rescales each specified numerical variable to mean 0 and standard deviation 1. This can improve the performance and numerical stability of models that are sensitive to feature scale, which makes it a common preprocessing step for many machine learning algorithms.
Code Breakdown:
Method Header:
Introduces the 'standardize' method implementation and provides context.
if method.lower() == 'standardize':
    print("< STANDARDIZING DATA >")
    print(" This method centers the data around mean 0 with a standard deviation of 1, enhancing model performance and stability.")
    print(" ✔ Standardizes each numerical variable to have mean=0 and variance=1.")
    print(" ✔ Essential preprocessing step for many machine learning algorithms.\n")
    print("✎ Note: Standardization is applied only to the specified numerical variables.\n")
Initialize and Apply Standardization:
Copy the DataFrame, then fit and apply StandardScaler to the specified columns.
# initialize essential objects
transformed_df = df.copy()
scaler = StandardScaler()
# scale the data
transformed_df[numerical_variables] = scaler.fit_transform(df[numerical_variables])
# isolate transformed columns to give as part of output
standardized_columns = transformed_df[numerical_variables]
Output Results:
print(f"✔ New transformed dataframe:\n{transformed_df.head()}\n")
print(f"✔ Dataframe with only the transformed columns:\n{standardized_columns.head()}\n")
print("☻ HOW TO - Apply this transformation using `transformed_df, standardized_columns = transform_num(your_df, your_numerical_variables, method='standardize')`.\n")
# sanity check
print("< SANITY CHECK >")
print(f" ➡ Shape of original dataframe: {df.shape}")
print(f" ➡ Shape of transformed dataframe: {transformed_df.shape}\n")
return transformed_df, standardized_columns
Link to Full Code: transform_num.py
Description:
Method Functionality Idea:
The 'standardize' method in transform_num standardizes the numerical variables by centering them around mean 0 and scaling them to have a standard deviation of 1. This transformation can improve model performance and stability, particularly for algorithms that are sensitive to feature scale.
How it operates:
The method first creates a copy of the DataFrame to preserve the original data. It then initializes a StandardScaler object to perform the standardization. The method scales each specified numerical variable and returns both the transformed DataFrame and a DataFrame containing only the scaled columns.
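The per-column arithmetic behind this scaling is z = (x - mean) / std, where std is the population standard deviation (the convention StandardScaler uses). A minimal sketch using only the standard library follows; the helper name `standardize_values` and the fallback for constant columns are assumptions made for illustration and are not taken from transform_num.py.

```python
from statistics import fmean, pstdev


def standardize_values(values):
    """Return z-scores for a list of numbers (hypothetical helper)."""
    mean = fmean(values)
    # Population standard deviation (divides by n, not n - 1).
    std = pstdev(values)
    # Assumption for this sketch: treat a constant column as scale 1
    # so that the division below cannot fail.
    if std == 0:
        std = 1.0
    return [(v - mean) / std for v in values]


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population std 2
z_scores = standardize_values(sample)
print(z_scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After the transformation the values have mean 0 and population standard deviation 1, which is the property the method advertises for each scaled column.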
Usage:
To standardize numerical variables in a DataFrame:
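A minimal sketch of such a call, following the signature printed in the HOW TO message. The function `transform_num_sketch` is a hypothetical stand-in that mirrors only the 'standardize' branch shown in this issue, so that the example runs without the full module; the sample data and column names are made up.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def transform_num_sketch(df, numerical_variables, method="standardize"):
    """Hypothetical stand-in for transform_num; only 'standardize' is covered."""
    if method.lower() != "standardize":
        raise ValueError("this sketch only implements method='standardize'")
    # Work on a copy so the caller's DataFrame is left untouched.
    transformed_df = df.copy()
    scaler = StandardScaler()
    transformed_df[numerical_variables] = scaler.fit_transform(df[numerical_variables])
    # Return the full DataFrame plus only the scaled columns.
    return transformed_df, transformed_df[numerical_variables]


# Made-up sample data: two numerical columns and one categorical column.
your_df = pd.DataFrame({
    "age": [22.0, 35.0, 47.0, 58.0],
    "income": [28000.0, 52000.0, 61000.0, 90000.0],
    "city": ["A", "B", "A", "C"],
})
your_numerical_variables = ["age", "income"]

transformed_df, standardized_columns = transform_num_sketch(
    your_df, your_numerical_variables, method="standardize"
)
print(standardized_columns.head())
```

Columns not listed in `your_numerical_variables` (here `city`) pass through unchanged, and `your_df` itself is not modified.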
Notes: