ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Implement new transform_cat() method: 'encode_ordinal' #8

Closed: ETA444 closed this issue 7 months ago

ETA444 commented 9 months ago

Description:


Method Functionality Idea:

The encode_ordinal method assigns an integer to each category value based on the provided ordinal order.

How it operates:

The method encodes each variable listed in the ordinal map using the category order given for that variable. Each category is replaced by its position in the list, so the first category becomes 0, the second becomes 1, and so on.
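The idea can be sketched in plain pandas, assuming the order is given as a list: build a rank lookup from the list and map the column through it. The helper rank_lookup and the toy size data are hypothetical and not part of datasafari. The implementation in step 3 of the breakdown uses pandas Categorical codes instead. The two approaches differ in one respect: values missing from the order become NaN with Series.map but -1 with Categorical codes.

```python
import pandas as pd


def rank_lookup(order):
    """Hypothetical helper: map each category to its position in `order`."""
    # First category gets 0, second gets 1, and so on.
    return {category: rank for rank, category in enumerate(order)}


sizes = pd.Series(['medium', 'small', 'large', 'medium'])  # toy data
order = ['small', 'medium', 'large']
encoded = sizes.map(rank_lookup(order))
print(encoded.tolist())  # [1, 0, 2, 1]
```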

Usage:

To use the encode_ordinal method:

transformed_df, encoded_columns = transform_cat(your_df, your_columns, method='encode_ordinal', ordinal_map=your_ordinal_map)

Ensure that your_ordinal_map is a dictionary whose keys are variable names and whose values are lists of that variable's categories, ordered from lowest to highest rank.
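Before calling transform_cat(), it can help to confirm that the map has this shape and that it covers every value in the data. The function below is a hypothetical pre-flight check, not part of datasafari. It rejects duplicate categories because pandas refuses to build a Categorical with non-unique categories, and it reports values that are present in the dataframe but missing from the ordering.

```python
import pandas as pd


def check_ordinal_map(df, ordinal_map):
    """Hypothetical check: return {variable: [values missing from its order]}."""
    if not isinstance(ordinal_map, dict):
        raise TypeError("ordinal_map must be a dict of {column: [ordered categories]}")
    problems = {}
    for variable, order in ordinal_map.items():
        if variable not in df.columns:
            raise KeyError(f"'{variable}' is not a column of the dataframe")
        # Categorical categories must be unique, so duplicates would fail later anyway.
        if not isinstance(order, list) or len(order) != len(set(order)):
            raise ValueError(f"order for '{variable}' must be a list without duplicates")
        # Values that are present in the data but absent from the order.
        unlisted = set(df[variable].dropna()) - set(order)
        if unlisted:
            problems[variable] = sorted(unlisted, key=str)
    return problems


survey = pd.DataFrame({'Category': ['student', 'college', 'phd']})  # toy data
ordinal_map = {'Category': ['student', 'high school', 'college', 'university']}
print(check_ordinal_map(survey, ordinal_map))  # {'Category': ['phd']}
```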

Sanity Check:

As a sanity check, the method prints the shape of the original dataframe next to that of the transformed dataframe. Because each column is encoded in place, the two shapes should be identical.
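The shape comparison can be extended so that it also counts values that did not match any listed category. This is a hypothetical helper, not part of datasafari. It relies on the fact that pandas Categorical codes are -1 for values that are missing from the category list or that are NaN.

```python
import pandas as pd


def ordinal_sanity_report(original, transformed, encoded_vars):
    """Hypothetical check that goes one step beyond printing the two shapes."""
    return {
        # Ordinal encoding replaces columns in place, so the shape should not change.
        'same_shape': original.shape == transformed.shape,
        # Categorical codes are -1 for values missing from the order (or NaN).
        'unmapped': {v: int((transformed[v] == -1).sum()) for v in encoded_vars},
    }


original = pd.DataFrame({'Category': ['student', 'phd', 'college']})  # toy data
transformed = original.copy()
transformed['Category'] = pd.Categorical(
    original['Category'],
    categories=['student', 'high school', 'college', 'university'],
    ordered=True,
).codes
print(ordinal_sanity_report(original, transformed, ['Category']))
# {'same_shape': True, 'unmapped': {'Category': 1}}
```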

Example Usage:

# encode_ordinal
ordinal_map = {
    'Category': ['student', 'high school', 'college', 'university']
}
ordinal_encoded_df, ordinal_encoded_cols = transform_cat(final_transformed_df, ['Category'], method='encode_ordinal', ordinal_map=ordinal_map)
ETA444 commented 7 months ago

Implementation Summary:

The 'encode_ordinal' method within transform_cat() maps categorical values to integers based on an order defined in ordinal_map. This method is suitable for ordinal data where the order of categories is meaningful.

Purpose:

The purpose of this method is to prepare categorical data for machine learning models by replacing each category with an integer that preserves its rank. This matters for ordinal data, where the order of the categories carries meaning.
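The next snippet shows why an explicit order is needed. Without one, pandas infers the categories from the data and sorts them alphabetically, which scrambles the intended ranking. The categories come from the example ordinal_map above, and the four-row series is toy data.

```python
import pandas as pd

levels = pd.Series(['college', 'student', 'university', 'high school'])  # toy data

# Default: categories inferred from the data and sorted alphabetically.
alphabetical = pd.Categorical(levels).codes

# Explicit order, as supplied through ordinal_map in the example above.
order = ['student', 'high school', 'college', 'university']
explicit = pd.Categorical(levels, categories=order, ordered=True).codes

print(alphabetical.tolist())  # [0, 2, 3, 1]
print(explicit.tolist())      # [2, 0, 3, 1]
```

With the alphabetical default, 'student' ranks above 'college'. The explicit order restores the intended ranking.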

Code Breakdown:

  1. Method Header:

    • Purpose: To enter the 'encode_ordinal' branch (the method name is matched case-insensitively, and an ordinal_map must be provided) and print a header explaining the transformation.
    if method.lower() == 'encode_ordinal' and ordinal_map:
        print(f"< ORDINAL ENCODING TRANSFORMATION >")
        print(f" This method assigns an integer to each category value based on the provided ordinal order.")
        print(f"✎ Note: Ensure the provided ordinal map correctly reflects the desired order of categories for each variable.")
        print("☻ Tip: An ordinal map dictionary looks like this: {'your_variable': ['level1', 'level2', 'level3'], ...}\n")
  2. Initialize DataFrames:

    • Purpose: To copy the input dataframe so that the original is left unchanged, and to create an empty dataframe that will collect the encoded columns.
    transformed_df = df.copy()
    encoded_columns = pd.DataFrame()
  3. Encode Each Variable:

    • Purpose: To iterate over the variables in ordinal_map and encode each one that also appears in categorical_variables (the list of columns passed to transform_cat()). Each category becomes its position in the supplied order. Values missing from the order, and missing values, receive the code -1.
    for variable, order in ordinal_map.items():
        if variable in categorical_variables:
            # Build an ordered Categorical from the user-supplied order;
            # .codes holds each value's position in `order` (-1 if unlisted or NaN)
            transformed_df[variable] = pd.Categorical(transformed_df[variable], categories=order, ordered=True).codes
            # Keep track of the newly encoded columns
            encoded_columns = pd.concat([encoded_columns, transformed_df[[variable]]], axis=1)
            print(f"✔ '{variable}' encoded based on the specified order: {order}\n")
        else:
            print(f"⚠️ '{variable}' specified in `ordinal_map` was not found in `categorical_variables` and has been skipped.\n")
  4. Output Results:

    • Purpose: To display the results and perform a sanity check.
    print(f"✔ New transformed dataframe:\n{transformed_df.head()}\n")
    print(f"✔ Dataframe with only the ordinal encoded columns:\n{encoded_columns.head()}\n")
    print("☻ HOW TO - To catch the df's use: `transformed_df, encoded_columns = transform_cat(your_df, your_columns, method='encode_ordinal', ordinal_map=your_ordinal_map)`.\n")
    print("< SANITY CHECK >")
    print(f"  ➡ Original dataframe shape: {df.shape}")
    print(f"  ➡ Transformed dataframe shape: {transformed_df.shape}\n")
    return transformed_df, encoded_columns
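Steps 2 to 4 can be condensed into a standalone sketch that omits the printed report. The name encode_ordinal_sketch is hypothetical and not part of datasafari, and the sketch assumes the caller has already performed the method and ordinal_map check from step 1. One design choice differs from the breakdown: the sketch collects encoded columns in a list and concatenates them once, rather than calling pd.concat inside the loop. The resulting columns are the same either way.

```python
import pandas as pd


def encode_ordinal_sketch(df, categorical_variables, ordinal_map):
    """Hypothetical condensed version of the 'encode_ordinal' branch.

    Returns (transformed_df, encoded_columns) without printing anything.
    """
    transformed_df = df.copy()  # leave the caller's dataframe untouched
    encoded = []  # encoded single-column frames, concatenated once at the end
    for variable, order in ordinal_map.items():
        if variable not in categorical_variables:
            continue  # the breakdown prints a warning here and skips the variable
        # Position in `order` becomes the code; unlisted values and NaN become -1.
        transformed_df[variable] = pd.Categorical(
            transformed_df[variable], categories=order, ordered=True
        ).codes
        encoded.append(transformed_df[[variable]])
    # Mirror the breakdown: an empty DataFrame if nothing was encoded.
    encoded_columns = pd.concat(encoded, axis=1) if encoded else pd.DataFrame()
    return transformed_df, encoded_columns


people = pd.DataFrame({'Category': ['college', 'student', 'university', 'high school'],
                       'Age': [21, 12, 24, 16]})  # toy data
out, enc = encode_ordinal_sketch(
    people, ['Category'],
    {'Category': ['student', 'high school', 'college', 'university']},
)
print(out['Category'].tolist())  # [2, 0, 3, 1]
```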

See the Full Function:

You can refer to the complete implementation of the transform_cat() function, including the 'encode_ordinal' method, on GitHub: transform_cat().