Summary of Unit Tests for transform_cat()
The transform_cat() function applies transformations to categorical columns of a pandas DataFrame: uniform transformations, ordinal or one-hot encoding, and target encoding based on another variable. The tests below check that it behaves as expected across these scenarios and raises the correct errors for invalid input.
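Most of the tests below take a sample_data pytest fixture whose definition is not shown. A minimal hypothetical version might look like this. It assumes a 'Category' column holding the values A, B and C, which is consistent with the Category_A, Category_B and Category_C columns the one-hot test expects. It also assumes a numeric 'Target' column for target encoding. The helper make_sample_data is illustrative only and is not part of the original suite.

```python
import pandas as pd
import pytest


def make_sample_data() -> pd.DataFrame:
    # Hypothetical data: 'Category' holds A, B and C so that one-hot encoding
    # would yield Category_A, Category_B and Category_C; 'Target' is an
    # assumed numeric column for target-encoding tests.
    return pd.DataFrame(
        {
            "Category": ["A", "B", "C", "A", "B"],
            "Target": [1.0, 0.0, 1.0, 0.0, 1.0],
        }
    )


@pytest.fixture
def sample_data() -> pd.DataFrame:
    # pytest passes a fresh DataFrame to every test that declares a
    # `sample_data` argument.
    return make_sample_data()
```

Keeping the data in a plain function lets it be reused outside pytest, because fixtures are not meant to be called directly.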
Detailed Breakdown of Tests
Error-Handling Tests
These tests confirm that transform_cat() properly manages various types of input errors:
Non-DataFrame Input:
Ensures a TypeError is raised if the input is not a DataFrame.
```python
import pandas as pd
import pytest

def test_invalid_df_type():
    with pytest.raises(TypeError, match="The 'df' parameter must be a pandas DataFrame."):
        transform_cat("not_a_dataframe", ['Category'], method='uniform_simple')
```
Non-List Categorical Variables:
Checks that a TypeError is raised if categorical_variables is not a list, for example when a single column name is passed as a string.
```python
def test_invalid_categorical_variables_type(sample_data):
    with pytest.raises(TypeError, match="The 'categorical_variables' parameter must be a list of column names."):
        transform_cat(sample_data, 'Category', method='uniform_simple')
```
Empty DataFrame:
Checks for a ValueError when the DataFrame is empty.
```python
def test_empty_dataframe():
    with pytest.raises(ValueError, match="The input DataFrame is empty."):
        transform_cat(pd.DataFrame(), ['Category'], method='uniform_simple')
```
Nonexistent Categorical Variable:
Verifies that a ValueError is raised if the specified categorical variable does not exist in the DataFrame.
```python
def test_nonexistent_categorical_variable(sample_data):
    with pytest.raises(ValueError, match="The following variables were not found in the DataFrame:"):
        transform_cat(sample_data, ['Nonexistent'], method='uniform_simple')
```
Invalid Method Input:
Ensures that an invalid method name raises a ValueError.
```python
def test_invalid_method(sample_data):
    with pytest.raises(ValueError, match="Invalid method 'not_a_valid_method'"):
        transform_cat(sample_data, ['Category'], method='not_a_valid_method')
```
Missing Maps for Specific Methods:
Checks that a ValueError is raised when a method that needs a mapping is called without one. Here, 'uniform_mapping' is called without abbreviation_map.
```python
def test_missing_abbreviation_map(sample_data):
    with pytest.raises(ValueError, match="The 'abbreviation_map' parameter must be provided when using the 'uniform_mapping' method."):
        transform_cat(sample_data, ['Category'], method='uniform_mapping')
```
Non-Dictionary Map Input:
Checks that a TypeError is raised if a mapping such as abbreviation_map is supplied but is not a dictionary.
```python
def test_invalid_abbreviation_map_type(sample_data):
    with pytest.raises(TypeError, match="The 'abbreviation_map' parameter must be a dictionary if provided."):
        transform_cat(sample_data, ['Category'], method='uniform_mapping', abbreviation_map='not_a_dict')
```
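These tests only specify how abbreviation_map is validated. They do not show what 'uniform_mapping' does with it. As a hedged illustration, assume the map translates abbreviations into canonical labels. Under that assumption, a hypothetical helper apply_abbreviation_map (not part of the original code) could apply the map with pandas' Series.replace, which leaves values that have no key unchanged.

```python
import pandas as pd


def apply_abbreviation_map(series: pd.Series, abbreviation_map: dict) -> pd.Series:
    # Hypothetical helper; mirrors the type check exercised by
    # test_invalid_abbreviation_map_type.
    if not isinstance(abbreviation_map, dict):
        raise TypeError("The 'abbreviation_map' parameter must be a dictionary if provided.")
    # Replace each value that appears as a key; other values are kept as-is.
    return series.replace(abbreviation_map)


cities = pd.Series(["NY", "New York", "LA", "Boston"])
print(apply_abbreviation_map(cities, {"NY": "New York", "LA": "Los Angeles"}).tolist())
# ['New York', 'New York', 'Los Angeles', 'Boston']
```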
Missing Target Variable for Target Encoding:
Verifies that a ValueError is raised if target_variable is not provided when using the 'encode_target' method.
```python
def test_missing_target_variable_for_target_encoding(sample_data):
    with pytest.raises(ValueError, match="The 'target_variable' parameter must be provided when using the 'encode_target' method."):
        transform_cat(sample_data, ['Category'], method='encode_target')
```
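Taken together, the error-handling tests imply a sequence of input checks. The sketch below is a hypothetical validation helper. The name validate_transform_cat_args is an assumption and does not describe the real function's internals. It raises each expected exception with a message that the corresponding pytest.raises(match=...) pattern would find. The tests impose one ordering constraint: the empty-DataFrame check must run before the column-existence check. This is because test_empty_dataframe passes ['Category'] to a frame that has no columns, yet expects the "empty" error rather than the "not found" error.

```python
import pandas as pd

# Assumed set of method names: only the four used in the tests. The
# ordinal-encoding method's name is not shown, so it is omitted here.
VALID_METHODS = {"uniform_simple", "uniform_mapping", "encode_onehot", "encode_target"}


def validate_transform_cat_args(df, categorical_variables, method,
                                abbreviation_map=None, target_variable=None):
    """Hypothetical validation step raising the errors the tests expect."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("The 'df' parameter must be a pandas DataFrame.")
    if not isinstance(categorical_variables, list):
        raise TypeError("The 'categorical_variables' parameter must be a list of column names.")
    # Emptiness is checked before column names, so an empty frame reports
    # "empty" rather than "not found".
    if df.empty:
        raise ValueError("The input DataFrame is empty.")
    missing = [col for col in categorical_variables if col not in df.columns]
    if missing:
        raise ValueError(f"The following variables were not found in the DataFrame: {missing}")
    if method not in VALID_METHODS:
        raise ValueError(f"Invalid method '{method}'. Valid methods: {sorted(VALID_METHODS)}")
    # A map that is supplied must be a dict, whichever method is used.
    if abbreviation_map is not None and not isinstance(abbreviation_map, dict):
        raise TypeError("The 'abbreviation_map' parameter must be a dictionary if provided.")
    if method == "uniform_mapping" and abbreviation_map is None:
        raise ValueError("The 'abbreviation_map' parameter must be provided when using the 'uniform_mapping' method.")
    if method == "encode_target" and target_variable is None:
        raise ValueError("The 'target_variable' parameter must be provided when using the 'encode_target' method.")
```

Because pytest.raises(match=...) uses a regular-expression search, extra text after the matched prefix (such as the list of valid methods) does not break the tests.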
Functionality Tests
These tests confirm that each method performs the intended transformation correctly:
Uniform Simple Transformation:
Checks that the basic transformation is applied correctly: the result contains no NaN values, and every entry is lowercase.
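Here is a rough sketch of a transformation that would satisfy both checks. It assumes that 'uniform_simple' fills missing values with a placeholder, trims surrounding whitespace, and lowercases each entry. The placeholder "unknown" and the helper name uniform_simple are assumptions and are not taken from the original code.

```python
import pandas as pd


def uniform_simple(series: pd.Series, fill_value: str = "unknown") -> pd.Series:
    # Hypothetical behaviour: fill missing values with a placeholder, then
    # trim surrounding whitespace and lowercase every entry.
    return series.fillna(fill_value).astype(str).str.strip().str.lower()


raw = pd.Series([" Apple", "BANANA ", None, "Cherry"])
cleaned = uniform_simple(raw)

# The two properties the uniform_simple test is described as checking.
assert cleaned.notna().all()
assert (cleaned == cleaned.str.lower()).all()
print(cleaned.tolist())  # ['apple', 'banana', 'unknown', 'cherry']
```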
One-Hot Encoding:
Tests whether categories are correctly encoded into binary columns, one per category.
```python
def test_transform_cat_encode_onehot(sample_data):
    transformed_df, transformed_columns = transform_cat(sample_data, ['Category'], method='encode_onehot')
    expected_columns = ['Category_A', 'Category_B', 'Category_C']
    for col in expected_columns:
        assert col in transformed_columns.columns
```
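By default, pandas' built-in pd.get_dummies uses the same "<column>_<value>" naming scheme the test expects, so it is one plausible way to implement this encoding. Whether transform_cat actually uses it is an assumption. Here is a minimal example with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A", "B", "C", "A"], "Target": [1, 0, 1, 1]})

# get_dummies replaces 'Category' with one 0/1 column per distinct value,
# named Category_A, Category_B and Category_C; 'Target' is left untouched.
encoded = pd.get_dummies(df, columns=["Category"], dtype=int)

onehot_cols = ["Category_A", "Category_B", "Category_C"]
assert all(col in encoded.columns for col in onehot_cols)
# Each row has exactly one 1 across the one-hot columns.
assert (encoded[onehot_cols].sum(axis=1) == 1).all()
print(encoded.columns.tolist())
```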
Target Encoding:
Verifies that each category is replaced by the mean of the target variable over the rows in that category.
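You can sketch this kind of target encoding with a groupby mean followed by map. The column names and data below are hypothetical. This is the plain, unsmoothed variant. Practical implementations often add smoothing or compute the means out-of-fold to limit target leakage, and the source does not say whether transform_cat does either.

```python
import pandas as pd

# Hypothetical data; 'Target' stands in for the target_variable column.
df = pd.DataFrame({
    "Category": ["A", "B", "A", "C", "B", "A"],
    "Target":   [1.0, 0.0, 0.0, 1.0, 1.0, 1.0],
})

# Mean of the target within each category: A -> 2/3, B -> 0.5, C -> 1.0.
category_means = df.groupby("Category")["Target"].mean()

# Replace each category label with its group's mean target value.
df["Category_encoded"] = df["Category"].map(category_means)
print(df["Category_encoded"].round(3).tolist())
# [0.667, 0.5, 0.667, 1.0, 0.5, 0.667]
```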
See the full test suite here.