The explore_cat() function analyzes categorical data in a pandas DataFrame by calculating statistics such as frequency counts, percentages, unique values, and entropy. Its unit tests check that the function rejects invalid input with the appropriate exception and returns the expected output for each analysis method.
Detailed Breakdown of Tests
Error-Handling Tests
These tests verify that the function rejects invalid inputs and raises the appropriate errors:
Empty DataFrame Input
Tests the function's response to an empty DataFrame, which should raise a ValueError.
import pandas as pd
import pytest

def test_empty_dataframe():
    df = pd.DataFrame()
    with pytest.raises(ValueError, match="The input DataFrame is empty."):
        explore_cat(df, ['Category'])
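The `match` argument of `pytest.raises` is treated as a regular expression and checked with `re.search` against the string form of the raised exception, so an unescaped `.` in the expected message matches any character. The sketch below uses only the standard `re` module to illustrate the difference. The message is the one asserted in the test above; `pattern_loose`, `pattern_strict`, and `altered` are names introduced here for illustration.

```python
import re

# Message the test above expects explore_cat() to raise.
message = "The input DataFrame is empty."

# pytest.raises(..., match=pattern) performs re.search(pattern, str(exception)).
pattern_loose = "The input DataFrame is empty."  # '.' matches any character
pattern_strict = re.escape(message)              # '.' must be a literal period

# Both patterns accept the real message.
assert re.search(pattern_loose, message)
assert re.search(pattern_strict, message)

# Only the loose pattern accepts a message that ends in a different character.
altered = "The input DataFrame is empty!"
assert re.search(pattern_loose, altered)
assert re.search(pattern_strict, altered) is None
```

If a test should pin the message exactly, passing `match=re.escape(message)` is one way to do it.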
Non-DataFrame Input
Ensures that passing a non-DataFrame object raises a TypeError.
def test_non_dataframe_input():
    with pytest.raises(TypeError, match="The df parameter must be a pandas DataFrame."):
        explore_cat("not a dataframe", ['Department'])
Non-existent Column
Checks for a ValueError when provided column names do not exist in the DataFrame.
def test_non_existent_column(complex_categorical_df):
    with pytest.raises(ValueError):
        explore_cat(complex_categorical_df, ['NonExistentColumn'])
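The `complex_categorical_df` fixture used by this and the following tests is not shown in this document. A minimal sketch of what it might look like follows. The data is hypothetical and chosen only to be consistent with the assertions in the tests: a `Department` column containing HR, Tech, and Admin in equal proportions, and a numeric `Age` column. The helper name `make_complex_categorical_df` is also an assumption.

```python
import pandas as pd
import pytest


def make_complex_categorical_df() -> pd.DataFrame:
    """Build a small DataFrame with one categorical and one numeric column (hypothetical data)."""
    return pd.DataFrame(
        {
            # Three equally frequent departments, so the entropy is log2(3), about 1.585 bits.
            "Department": ["HR", "Tech", "Admin", "HR", "Tech", "Admin"],
            # Numeric column, usable for the non-categorical column test.
            "Age": [25, 32, 41, 29, 37, 50],
        }
    )


@pytest.fixture
def complex_categorical_df() -> pd.DataFrame:
    """Fixture wrapper so tests can request the DataFrame by argument name."""
    return make_complex_categorical_df()
```

Keeping the construction in a plain function makes the data reusable outside pytest, since fixture functions cannot be called directly.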
Non-categorical Column
Validates that a ValueError is raised when a non-categorical column is specified.
def test_non_categorical_column(complex_categorical_df):
    with pytest.raises(ValueError, match="The 'categorical_variables' list must contain only names of categorical variables."):
        explore_cat(complex_categorical_df, ['Age', 'Department'])
Empty Categorical Variables List
Confirms that an empty list for categorical variables triggers a ValueError.
def test_empty_categorical_variables_list(complex_categorical_df):
    with pytest.raises(ValueError, match="The 'categorical_variables' list must contain at least one column name."):
        explore_cat(complex_categorical_df, [])
Non-list Categorical Variables
Ensures that providing categorical variables as a non-list type raises a TypeError.
def test_non_list_categorical_variables(complex_categorical_df):
    with pytest.raises(TypeError, match="The categorical_variables parameter must be a list of variable names."):
        explore_cat(complex_categorical_df, 'Department')
Non-string in Categorical Variables
Checks for a TypeError when elements in the categorical variables list are not strings.
def test_non_string_in_categorical_variables(complex_categorical_df):
    with pytest.raises(TypeError, match="All items in the categorical_variables list must be strings representing column names."):
        explore_cat(complex_categorical_df, [123, 'Department'])
Invalid Method
Verifies that an invalid method name results in a ValueError.
def test_invalid_method(complex_categorical_df):
    with pytest.raises(ValueError):
        explore_cat(complex_categorical_df, ['Department'], method='invalid_method')
Invalid Output Option
Ensures that using an unrecognized output option raises a ValueError.
def test_invalid_output_option(complex_categorical_df):
    with pytest.raises(ValueError, match="Invalid output method. Choose 'print' or 'return'."):
        explore_cat(complex_categorical_df, ['Department'], output='invalid_output')
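Taken together, the error-handling tests describe a sequence of input checks. The sketch below shows one way such validation could be written so that every test above would pass. It is not the actual implementation of explore_cat(). The helper name `validate_explore_cat_inputs`, the default argument values, the order of the checks, the rule that "categorical" means "not numeric", and the two messages marked as assumed are all assumptions; the remaining messages are the ones the tests match against.

```python
import pandas as pd

VALID_METHODS = {"unique_values", "counts_percentage", "entropy", "all"}
VALID_OUTPUTS = {"print", "return"}


def validate_explore_cat_inputs(df, categorical_variables, method="all", output="print"):
    """Hypothetical validation helper that raises the errors the tests expect."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("The df parameter must be a pandas DataFrame.")
    if df.empty:
        raise ValueError("The input DataFrame is empty.")
    if not isinstance(categorical_variables, list):
        raise TypeError("The categorical_variables parameter must be a list of variable names.")
    if not categorical_variables:
        raise ValueError("The 'categorical_variables' list must contain at least one column name.")
    if not all(isinstance(name, str) for name in categorical_variables):
        raise TypeError(
            "All items in the categorical_variables list must be strings representing column names."
        )
    missing = [name for name in categorical_variables if name not in df.columns]
    if missing:
        # Assumed message: the test only checks that a ValueError is raised.
        raise ValueError(f"Columns not found in the DataFrame: {missing}")
    # Assumption: a column counts as categorical when its dtype is not numeric.
    if any(pd.api.types.is_numeric_dtype(df[name]) for name in categorical_variables):
        raise ValueError(
            "The 'categorical_variables' list must contain only names of categorical variables."
        )
    if method not in VALID_METHODS:
        # Assumed message: the test only checks that a ValueError is raised.
        raise ValueError(f"Invalid method: {method!r}")
    if output not in VALID_OUTPUTS:
        raise ValueError("Invalid output method. Choose 'print' or 'return'.")
```

The type check on `df` comes first because the later checks rely on DataFrame attributes, and the emptiness check precedes the column lookup because the empty-DataFrame test passes a column name that cannot exist.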
Functionality Tests
These tests confirm that the function performs its intended operations correctly:
Entropy Method
Confirms that the output contains an entropy entry and the expected value, 1.585.
def test_entropy_method(complex_categorical_df):
    result = explore_cat(complex_categorical_df, ['Department'], method='entropy', output='return')
    assert 'Entropy' in result
    assert '1.585' in result
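The expected value 1.585 is consistent with Shannon entropy measured in bits, H = -sum(p_i * log2(p_i)), over three equally frequent categories, since log2(3) is approximately 1.585. That explore_cat() uses base-2 logarithms is an inference from this value, not something stated here. The sketch below reproduces the number with the standard library; `shannon_entropy` and the sample data are illustrative names and values.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return the Shannon entropy of the observed categories, in bits."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Three categories, each with probability 1/3.
departments = ["HR", "Tech", "Admin", "HR", "Tech", "Admin"]
print(f"{shannon_entropy(departments):.3f}")  # prints 1.585
```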
Counts and Percentage Output
Tests the correct formatting and calculation of counts and percentages.
def test_counts_percentage_output(complex_categorical_df):
    result = explore_cat(complex_categorical_df, ['Department'], method='counts_percentage', output='return')
    assert 'Counts' in result and 'Percentages' in result
    assert 'HR' in result and 'Tech' in result and 'Admin' in result
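For reference, counts and percentages of this kind can be computed with `Series.value_counts`. The sketch below is one possible approach, not necessarily how explore_cat() does it. The `Counts` and `Percentages` labels come from the assertion above, and the sample data is hypothetical.

```python
import pandas as pd

departments = pd.Series(["HR", "Tech", "Admin", "HR", "Tech", "Admin"], name="Department")

# Absolute frequency of each category.
counts = departments.value_counts()
# Relative frequency of each category, expressed in percent.
percentages = departments.value_counts(normalize=True) * 100

# Both Series are indexed by category, so they align row by row.
summary = pd.DataFrame({"Counts": counts, "Percentages": percentages.round(2)})
print(summary)
```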
Unique Values Method
Verifies that unique values are listed accurately in the output.
def test_unique_values_method(complex_categorical_df):
    result = explore_cat(complex_categorical_df, ['Department'], method='unique_values', output='return')
    assert 'HR' in result and 'Tech' in result and 'Admin' in result
All Methods Integration
Ensures that the output of the 'all' method includes the section produced by each individual method.
def test_all_methods(complex_categorical_df):
    result = explore_cat(complex_categorical_df, ['Department'], method='all', output='return')
    assert 'UNIQUE VALUES' in result
    assert 'COUNTS & PERCENTAGE' in result
    assert 'ENTROPY' in result
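The assertions above only require that the three section titles appear in the returned text. As an illustration, a combined report could be assembled along the following lines. The `build_report` function and its layout are hypothetical; only the section titles are taken from the test.

```python
import math
from collections import Counter


def build_report(column_name, values):
    """Assemble a plain-text report with one titled section per statistic (hypothetical layout)."""
    counts = Counter(values)
    total = len(values)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())

    lines = [f"UNIQUE VALUES: {column_name}"]
    lines += [f"  {value}" for value in counts]
    lines.append(f"COUNTS & PERCENTAGE: {column_name}")
    lines += [f"  {value}: {n} ({n / total:.1%})" for value, n in counts.items()]
    lines.append(f"ENTROPY: {column_name}")
    lines.append(f"  {entropy:.3f}")
    return "\n".join(lines)


report = build_report("Department", ["HR", "Tech", "Admin", "HR", "Tech", "Admin"])
print(report)
```

Returning a single string keeps substring assertions such as `'ENTROPY' in result` straightforward.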