ETA444 closed this issue 6 months ago.
Here’s how you can implement the `'quantile'` method within the `transform_num()` function:
Implementation Summary:
The `'quantile'` method applies a quantile transformation to numerical variables, mapping each of them onto a specified output distribution (uniform or normal).
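To make the idea concrete, here is a simplified from-scratch sketch of what a quantile transformation does to a single column. It assumes NumPy and Python's standard `statistics.NormalDist`, and the function name `quantile_map` is hypothetical. This is not scikit-learn's implementation (which also handles ties, subsampling and other details differently). It only illustrates the technique: estimate `n_quantiles` empirical quantiles, map each value to its interpolated cumulative probability, and, for a normal output, pass that probability through the inverse normal CDF.

```python
import numpy as np
from statistics import NormalDist


def quantile_map(x, n_quantiles=1000, output_distribution="uniform"):
    """Hypothetical, simplified quantile transform of a 1-D array (not scikit-learn's code)."""
    x = np.asarray(x, dtype=float)
    # cannot estimate more quantiles than there are samples
    n_q = min(n_quantiles, x.size)
    # evenly spaced probability levels in [0, 1] and the empirical quantiles at those levels
    references = np.linspace(0.0, 1.0, n_q)
    quantiles = np.quantile(x, references)
    # each value -> its interpolated cumulative probability (a uniform output in [0, 1])
    u = np.interp(x, quantiles, references)
    if output_distribution == "uniform":
        return u
    if output_distribution == "normal":
        # keep probabilities away from exactly 0 and 1 so the inverse CDF stays finite
        eps = 1e-7
        u = np.clip(u, eps, 1.0 - eps)
        inv_cdf = NormalDist().inv_cdf
        return np.array([inv_cdf(p) for p in u])
    raise ValueError("output_distribution must be 'uniform' or 'normal'")


rng = np.random.default_rng(444)
skewed = rng.lognormal(size=500)  # right-skewed sample data
uniform_out = quantile_map(skewed, n_quantiles=100)
normal_out = quantile_map(skewed, n_quantiles=100, output_distribution="normal")
print(uniform_out.min(), uniform_out.max())
print(round(float(np.median(normal_out)), 3))
```

Because the mapping only depends on where a value sits among the empirical quantiles, the order of the values is preserved while their spacing is reshaped to the target distribution.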
Code Breakdown:
Method Header:
Start the `'quantile'` method implementation and print context for the user:
```python
if method.lower() == 'quantile':
    print("< QUANTILE TRANSFORMATION >")
    print(f" This method maps the data to a '{output_distribution}' distribution with n_quantiles = {n_quantiles}. Random state set to {random_state}.")
    print(f" ✔ Transforms skewed or outlier-affected data to follow a standard {'normal' if output_distribution == 'normal' else 'uniform'} distribution, which can benefit statistical analyses and ML models that are sensitive to skew or outliers.")
    print(f" ✔ Uses {n_quantiles} quantiles to approximate the empirical distribution; more quantiles capture finer detail at a higher computational cost.\n")
    print("☻ Tip: The default of 1000 quantiles is a good compromise between detailed distribution mapping and computational cost. Adjust it based on dataset size (scikit-learn caps it at the number of samples).\n")
```
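Adjusting `n_quantiles` to the dataset size matters most for small datasets: scikit-learn's `QuantileTransformer` cannot estimate more quantiles than there are rows, so it lowers the effective value to the number of samples and emits a warning. A small sketch, assuming scikit-learn and NumPy are installed and using made-up data; the fitted `n_quantiles_` attribute holds the value actually used.

```python
import warnings

import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(444)
X = rng.exponential(size=(50, 1))  # only 50 rows, far fewer than the default 1000 quantiles

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    qt = QuantileTransformer(output_distribution="normal", n_quantiles=1000, random_state=444)
    Z = qt.fit_transform(X)

print("requested n_quantiles:", qt.n_quantiles)
print("effective n_quantiles_:", qt.n_quantiles_)
for w in caught:
    print("warning:", w.message)
```

Note that `random_state` only influences the result when scikit-learn subsamples rows to estimate the quantiles, which happens on large datasets; on a small sample like this one it has no visible effect.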
Initialize Quantile Transformer:
```python
    # assumes a module-level import: from sklearn.preprocessing import QuantileTransformer
    # work on a copy so the caller's DataFrame is left untouched
    transformed_df = df.copy()
    # define the Quantile Transformer and apply it to the numerical columns only
    quantile_transformer = QuantileTransformer(output_distribution=output_distribution,
                                               n_quantiles=n_quantiles, random_state=random_state)
    transformed_df[numerical_variables] = quantile_transformer.fit_transform(df[numerical_variables])
```
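The copy-then-assign pattern can be tried outside `transform_num()` with a small DataFrame. This sketch assumes pandas, NumPy and scikit-learn; the column names and data are invented for illustration. It shows that only the listed numerical columns change, the non-numerical column passes through untouched, and the original `df` is not modified. `output_distribution='uniform'` is used here so the transformed values are easy to check, since they fall in [0, 1].

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(444)
df = pd.DataFrame({
    "income": rng.lognormal(mean=10.0, sigma=1.0, size=200),  # right-skewed numerical column
    "age": rng.normal(loc=45.0, scale=12.0, size=200),         # roughly symmetric numerical column
    "city": rng.choice(["A", "B", "C"], size=200),              # non-numerical column, not transformed
})
numerical_variables = ["income", "age"]

# work on a copy, then overwrite only the numerical columns with the transformed values
transformed_df = df.copy()
quantile_transformer = QuantileTransformer(output_distribution="uniform", n_quantiles=100, random_state=444)
transformed_df[numerical_variables] = quantile_transformer.fit_transform(df[numerical_variables])

print(transformed_df.head())
```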
Output Results:
```python
    # isolate transformed columns to give as part of output
    quantile_transformed_columns = transformed_df[numerical_variables]
    print(f"✔ New transformed dataframe:\n{transformed_df.head()}\n")
    print(f"✔ Dataframe with only the transformed columns:\n{quantile_transformed_columns.head()}\n")
    print("☻ HOW TO: Apply this transformation using `transformed_df, quantile_transformed_columns = transform_num(your_df, your_numerical_variables, method='quantile', output_distribution='normal', n_quantiles=1000, random_state=444)`.\n")
    # sanity check
    print("< SANITY CHECK >")
    print(f" ➡ Shape of original dataframe: {df.shape}")
    print(f" ➡ Shape of transformed dataframe: {transformed_df.shape}\n")
    print("* After transformation, evaluate your data's distribution and consider its impact on your analysis or modeling approach.\n")
    return transformed_df, quantile_transformed_columns
```
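Evaluating the transformed distribution can be as simple as a before/after check. This sketch assumes pandas, NumPy and scikit-learn and uses made-up log-normal data; pandas' `Series.skew()` is only one possible measure. With `output_distribution='normal'`, the strong right skew of the input should drop to roughly zero while the shape of the DataFrame stays the same.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(444)
df = pd.DataFrame({"x": rng.lognormal(mean=0.0, sigma=1.0, size=2000)})  # strongly right-skewed

transformed_df = df.copy()
qt = QuantileTransformer(output_distribution="normal", n_quantiles=1000, random_state=444)
transformed_df[["x"]] = qt.fit_transform(df[["x"]])

# sanity check: same shape, and the skewness should shrink towards zero
skew_before = df["x"].skew()
skew_after = transformed_df["x"].skew()
print("shape:", df.shape, "->", transformed_df.shape)
print(f"skew before: {skew_before:.2f}, skew after: {skew_after:.2f}")
```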
See the Full Function:
You can refer to the complete implementation of the `transform_num()` function, including the `'quantile'` method, on GitHub: transform_num().
Description:
Method Functionality Idea:
The quantile transformation method maps the data to a specified distribution using quantiles. It transforms skewed or outlier-affected data to follow either a standard normal or a uniform distribution, which can benefit statistical analyses and ML models that are sensitive to skew or outliers.
How it operates:
The method uses scikit-learn's `QuantileTransformer` to map the data to the specified output distribution using a defined number of quantiles (`n_quantiles`). By default, it uses 1000 quantiles to approximate the empirical distribution, capturing fine detail while keeping computation manageable.
Usage:
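To perform quantile transformation on numerical variables, call `transformed_df, quantile_transformed_columns = transform_num(your_df, your_numerical_variables, method='quantile', output_distribution='normal', n_quantiles=1000, random_state=444)`.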
This method returns both the DataFrame with transformed numerical variables and a DataFrame containing only the transformed columns. Additionally, you can customize `output_distribution`, `n_quantiles`, and `random_state`. They have reasonable default values, so users who don't need this level of customization can use the method out of the box.
Notes: