Construct tests for transform_cat()

Summary of Unit Tests for `transform_cat()`

The transform_cat() function applies transformations to categorical columns in a DataFrame. These include uniform cleanup of category labels, ordinal or one-hot encoding, and target encoding based on another variable. The tests below check two things: that each tested method produces the expected output, and that invalid input raises informative errors.
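To make the method families concrete, here is a sketch of what each one produces on a small column. It uses plain pandas, not transform_cat() itself.

Several parts are assumptions:

- The toy frame is hypothetical. The tests' `sample_data` fixture is not shown. A frame with a 'Category' column holding A/B/C and a numeric 'Target' is one shape consistent with the one-hot and target-encoding tests.
- Reading "uniform" as strip-and-lowercase is an assumption.
- datasafari's actual implementation may differ in details such as category ordering.

```python
import pandas as pd

# Hypothetical toy data standing in for the (unshown) sample_data fixture.
df = pd.DataFrame({
    "Category": ["A", "B", "C", "A", "B"],
    "Target": [1.0, 0.0, 1.0, 0.0, 1.0],
})

# Uniform cleanup (assumed here to mean: strip whitespace, then lowercase).
uniform = df["Category"].str.strip().str.lower()

# Ordinal encoding: each category mapped to an integer code (alphabetical order here).
ordinal = df["Category"].astype("category").cat.codes

# One-hot encoding: one indicator column per category, named "<prefix>_<value>".
onehot = pd.get_dummies(df["Category"], prefix="Category")

# Target encoding: each category replaced by the mean Target within that category.
target = df["Category"].map(df.groupby("Category")["Target"].mean())

print(uniform.tolist())      # ['a', 'b', 'c', 'a', 'b']
print(ordinal.tolist())      # [0, 1, 2, 0, 1]
print(list(onehot.columns))  # ['Category_A', 'Category_B', 'Category_C']
print(target.tolist())       # [0.5, 0.5, 1.0, 0.5, 0.5]
```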

Detailed Breakdown of Tests

Error-Handling Tests

These tests confirm that transform_cat() properly manages various types of input errors:

Non-DataFrame Input:

Ensures a TypeError is raised if the input is not a DataFrame.

```python
import pandas as pd
import pytest

def test_invalid_df_type():
    with pytest.raises(TypeError, match="The 'df' parameter must be a pandas DataFrame."):
        transform_cat("not_a_dataframe", ['Category'], method='uniform_simple')
```
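The match argument to pytest.raises is a regular expression. pytest applies it with re.search to the string form of the raised exception. Two consequences follow:

- The pattern only has to occur somewhere in the message.
- Characters such as "." keep their regex meaning.

The snippet below shows these semantics with the standard re module alone. The message string is copied from the test above.

```python
import re

message = "The 'df' parameter must be a pandas DataFrame."

# re.search succeeds if the pattern occurs anywhere in the message.
assert re.search("must be a pandas DataFrame", message)

# An unescaped "." is a wildcard, so it also matches "!" (or any other character).
assert re.search("DataFrame.", "DataFrame!")

# re.escape turns the full message into a literal pattern.
literal = re.escape(message)
assert re.search(literal, message)
assert not re.search(literal, message.replace(".", "!"))
```

For the messages in this section the wildcard behaviour is harmless. Wrapping the expected text in re.escape() is the safer habit if a message ever contains parentheses, brackets, or "?".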

Non-List Categorical Variables:

Ensures a TypeError is raised if categorical_variables is passed as a single string instead of a list of column names.

```python
def test_invalid_categorical_variables_type(sample_data):
    with pytest.raises(TypeError, match="The 'categorical_variables' parameter must be a list of column names."):
        transform_cat(sample_data, 'Category', method='uniform_simple')
```

Empty DataFrame:

Checks for a ValueError when the DataFrame is empty.

```python
def test_empty_dataframe():
    with pytest.raises(ValueError, match="The input DataFrame is empty."):
        transform_cat(pd.DataFrame(), ['Category'], method='uniform_simple')
```

Nonexistent Categorical Variable:

Verifies that a ValueError is raised if the specified categorical variable does not exist in the DataFrame.

```python
def test_nonexistent_categorical_variable(sample_data):
    with pytest.raises(ValueError, match="The following variables were not found in the DataFrame:"):
        transform_cat(sample_data, ['Nonexistent'], method='uniform_simple')
```

Invalid Method Input:

Ensures that an invalid method name raises a ValueError.

```python
def test_invalid_method(sample_data):
    with pytest.raises(ValueError, match="Invalid method 'not_a_valid_method'"):
        transform_cat(sample_data, ['Category'], method='not_a_valid_method')
```
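The five tests above define an input-validation contract. Here is a minimal sketch of a validator that satisfies it. validate_common_inputs and VALID_METHODS are hypothetical names, not datasafari's code. The method list holds only the names that appear in these tests.

One detail follows from the tests themselves. test_empty_dataframe passes pd.DataFrame() together with ['Category']. The emptiness check therefore has to run before the column lookup, or the error would report a missing variable instead.

```python
import pandas as pd

# Hypothetical: only the method names used in these tests. The real function also
# supports other methods (e.g. ordinal encoding) whose names aren't given here.
VALID_METHODS = ["uniform_simple", "uniform_mapping", "encode_onehot", "encode_target"]


def validate_common_inputs(df, categorical_variables, method):
    """Hypothetical validator raising the errors the tests above expect."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("The 'df' parameter must be a pandas DataFrame.")
    if not isinstance(categorical_variables, list):
        raise TypeError("The 'categorical_variables' parameter must be a list of column names.")
    # Emptiness first: pd.DataFrame() has no columns, so a column check placed
    # before this one would raise the "not found" error instead.
    if df.empty:
        raise ValueError("The input DataFrame is empty.")
    missing = [var for var in categorical_variables if var not in df.columns]
    if missing:
        raise ValueError(f"The following variables were not found in the DataFrame: {', '.join(missing)}")
    if method not in VALID_METHODS:
        raise ValueError(f"Invalid method '{method}'. Choose from: {', '.join(VALID_METHODS)}")


# Usage: record which exception type (if any) each test scenario triggers.
sample = pd.DataFrame({"Category": ["A", "B", "C"]})
scenarios = {
    "not a DataFrame": ("not_a_dataframe", ["Category"], "uniform_simple"),
    "not a list": (sample, "Category", "uniform_simple"),
    "empty": (pd.DataFrame(), ["Category"], "uniform_simple"),
    "missing column": (sample, ["Nonexistent"], "uniform_simple"),
    "bad method": (sample, ["Category"], "not_a_valid_method"),
    "valid": (sample, ["Category"], "uniform_simple"),
}

results = {}
for name, args in scenarios.items():
    try:
        validate_common_inputs(*args)
        results[name] = None
    except (TypeError, ValueError) as exc:
        results[name] = type(exc).__name__

print(results)
```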

Missing Maps for Specific Methods:

Checks that a ValueError is raised when a method that requires a mapping, here 'uniform_mapping', is called without abbreviation_map.

```python
def test_missing_abbreviation_map(sample_data):
    with pytest.raises(ValueError, match="The 'abbreviation_map' parameter must be provided when using the 'uniform_mapping' method."):
        transform_cat(sample_data, ['Category'], method='uniform_mapping')
```

Non-Dictionary Map Input:

Ensures a TypeError is raised if abbreviation_map is provided but is not a dictionary.

```python
def test_invalid_abbreviation_map_type(sample_data):
    with pytest.raises(TypeError, match="The 'abbreviation_map' parameter must be a dictionary if provided."):
        transform_cat(sample_data, ['Category'], method='uniform_mapping', abbreviation_map='not_a_dict')
```

Missing Target Variable for Target Encoding:

Verifies that a ValueError is raised if the target_variable is missing when required.

```python
def test_missing_target_variable_for_target_encoding(sample_data):
    with pytest.raises(ValueError, match="The 'target_variable' parameter must be provided when using the 'encode_target' method."):
        transform_cat(sample_data, ['Category'], method='encode_target')
```
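The last three tests cover parameters that only some methods need. A minimal sketch of those checks follows, under the assumption that they run after the common input validation. validate_method_params is a hypothetical name, and the real function may perform further checks (for example, that target_variable exists in the DataFrame).

```python
def validate_method_params(method, abbreviation_map=None, target_variable=None):
    """Hypothetical checks for parameters that only some methods require."""
    # "if provided": the type is only checked when a map was actually passed.
    if abbreviation_map is not None and not isinstance(abbreviation_map, dict):
        raise TypeError("The 'abbreviation_map' parameter must be a dictionary if provided.")
    if method == "uniform_mapping" and abbreviation_map is None:
        raise ValueError(
            "The 'abbreviation_map' parameter must be provided when using the 'uniform_mapping' method."
        )
    if method == "encode_target" and target_variable is None:
        raise ValueError(
            "The 'target_variable' parameter must be provided when using the 'encode_target' method."
        )


# Usage: the scenarios from the three tests above, plus one valid call.
scenarios = {
    "missing map": dict(method="uniform_mapping"),
    "map not a dict": dict(method="uniform_mapping", abbreviation_map="not_a_dict"),
    "missing target": dict(method="encode_target"),
    "valid": dict(method="encode_target", target_variable="Target"),
}

results = {}
for name, kwargs in scenarios.items():
    try:
        validate_method_params(**kwargs)
        results[name] = None
    except (TypeError, ValueError) as exc:
        results[name] = type(exc).__name__

print(results)
# {'missing map': 'ValueError', 'map not a dict': 'TypeError', 'missing target': 'ValueError', 'valid': None}
```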

Functionality Tests

These tests check the output of three methods: uniform_simple, encode_onehot, and encode_target.

Uniform Simple Transformation:

Checks that the Category column is kept, contains no NaN values, and has all entries lowercased.

```python
def test_transform_cat_uniform_simple(sample_data):
    transformed_df, transformed_columns = transform_cat(sample_data, ['Category'], method='uniform_simple')
    assert 'Category' in transformed_df.columns
    assert transformed_df['Category'].isna().sum() == 0
    assert transformed_df['Category'].str.islower().all()
```
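The last assertion is stricter than it looks. str.islower() returns False for strings that contain no cased characters. A category label such as "123" would therefore fail the test even though it is already lowercase.

A more tolerant alternative compares the column with its own lowercased version. The Series below is hypothetical.

```python
import pandas as pd

labels = pd.Series(["a", "b", "123", "c"])

# str.islower() is False for "123" because it has no cased characters.
print(labels.str.islower().tolist())  # [True, True, False, True]

# Comparing against the lowercased values only flags genuine uppercase letters.
print((labels == labels.str.lower()).all())  # True
```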

One-Hot Encoding:

Checks that one indicator column per category (Category_A, Category_B, Category_C) appears in transformed_columns.

```python
def test_transform_cat_encode_onehot(sample_data):
    transformed_df, transformed_columns = transform_cat(sample_data, ['Category'], method='encode_onehot')
    expected_columns = ['Category_A', 'Category_B', 'Category_C']
    for col in expected_columns:
        assert col in transformed_columns.columns
```
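The one-hot test only checks that the expected columns are present. Two stronger assertions are sketched below on a hypothetical one-hot frame built with pandas.get_dummies, whose default naming (prefix, "_", value) matches the expected column names:

- the columns match the expected list exactly;
- every row has exactly one active indicator.

The exact-match check assumes transformed_columns holds only the indicator columns. That should be confirmed against datasafari before relying on it.

```python
import pandas as pd

# Hypothetical stand-in for the one-hot output; cast bool indicators to 0/1.
categories = pd.Series(["A", "B", "C", "A"], name="Category")
onehot = pd.get_dummies(categories, prefix="Category").astype(int)

expected_columns = ["Category_A", "Category_B", "Category_C"]

# 1) Exactly the expected indicator columns, no extras.
assert sorted(onehot.columns) == expected_columns

# 2) Each row belongs to exactly one category.
assert (onehot.sum(axis=1) == 1).all()
```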

Target Encoding:

Checks that the encoded Category column has a float dtype, as expected when categories are replaced by the mean of the target variable. The encoded values themselves are not checked.

```python
def test_transform_cat_encode_target(sample_data):
    transformed_df, transformed_columns = transform_cat(sample_data, ['Category'], method='encode_target', target_variable='Target')
    assert transformed_df['Category'].dtype == float
```
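The dtype check has two limits. First, `dtype == float` only accepts float64, since Python's float maps to numpy's float64. Second, it says nothing about the values.

Below is a sketch of a value-level check. It assumes transform_cat uses plain, unsmoothed group means, as the description above implies. naive_target_encode is a hypothetical reference helper and the data is made up. In a real test, transformed_df['Category'] would take the place of `encoded`.

```python
import pandas as pd

# Hypothetical data; "Target" mirrors the target_variable used in the test above.
df = pd.DataFrame({"Category": ["A", "B", "A", "C"], "Target": [1.0, 0.0, 0.0, 1.0]})


def naive_target_encode(frame, column, target):
    """Hypothetical reference: replace each category with its group's mean target."""
    return frame[column].map(frame.groupby(column)[target].mean())


encoded = naive_target_encode(df, "Category", "Target")

# Independent expectation computed with groupby().transform().
expected = df.groupby("Category")["Target"].transform("mean")

# Compares values and dtype; check_names=False ignores the different Series names.
pd.testing.assert_series_equal(encoded, expected, check_names=False)

# Accepts any float width, unlike `dtype == float`.
assert pd.api.types.is_float_dtype(encoded)
print(expected.tolist())  # [0.5, 0.0, 0.5, 1.0]
```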

ETA444 / datasafari