Construct tests for explore_cat()

Summary of Unit Tests for `explore_cat()`

The explore_cat() function analyzes categorical columns in a pandas DataFrame and reports frequency counts, percentages, unique values, and entropy. The unit tests below check that it rejects invalid input with the expected errors and returns correct results for each method.
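The tests on this page use `pd`, `pytest`, and a `complex_categorical_df` fixture, none of which are defined here. The following is a minimal sketch of that shared setup. The column values are assumptions chosen to fit the assertions shown later (three departments, one numeric `Age` column), and `build_complex_categorical_df` is a hypothetical helper. The import of `explore_cat` itself is left out because it depends on the package layout.

```python
import pandas as pd
import pytest


def build_complex_categorical_df() -> pd.DataFrame:
    """Hypothetical sample data; the real fixture is not shown on this page."""
    return pd.DataFrame({
        # Three departments, two rows each, so the column is perfectly balanced.
        "Department": ["HR", "Tech", "Admin", "HR", "Tech", "Admin"],
        # A numeric column, used by the non-categorical-column test.
        "Age": [25, 32, 47, 51, 38, 29],
    })


@pytest.fixture
def complex_categorical_df() -> pd.DataFrame:
    # pytest injects this into any test that names it as a parameter.
    return build_complex_categorical_df()
```

Keeping the data in a plain function makes it possible to inspect the sample outside of pytest, since fixture functions cannot be called directly.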

Detailed Breakdown of Tests

Error-Handling Tests

These tests verify that the function rejects invalid input and raises the appropriate error:

Empty DataFrame Input

Tests the function's response to an empty DataFrame, which should raise a ValueError.

def test_empty_dataframe():
    df = pd.DataFrame()
    with pytest.raises(ValueError, match="The input DataFrame is empty."):
        explore_cat(df, ['Category'])
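
A note on `match`: `pytest.raises` treats this argument as a regular expression and applies `re.search` to the error message, so the trailing `.` in the pattern matches any character. The sketch below uses only the standard library to show the difference. The variable names are illustrative, and whether the stricter form is worth the extra noise is a judgment call.

```python
import re

expected = "The input DataFrame is empty."

# As a raw pattern, the final "." is a wildcard, so a slightly different
# message still matches.
loose_match = re.search(expected, "The input DataFrame is empty!")

# re.escape makes every metacharacter literal.
strict_match = re.search(re.escape(expected), "The input DataFrame is empty!")
exact_match = re.search(re.escape(expected), "The input DataFrame is empty.")

print(bool(loose_match), bool(strict_match), bool(exact_match))
```

In a test, the stricter variant would read `pytest.raises(ValueError, match=re.escape("The input DataFrame is empty."))`.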

Non-DataFrame Input

Ensures that passing a non-DataFrame object raises a TypeError.

def test_non_dataframe_input():
    with pytest.raises(TypeError, match="The df parameter must be a pandas DataFrame."):
        explore_cat("not a dataframe", ['Department'])

Non-existent Column

Checks for a ValueError when provided column names do not exist in the DataFrame.

def test_non_existent_column(complex_categorical_df):
    with pytest.raises(ValueError):
        explore_cat(complex_categorical_df, ['NonExistentColumn'])

Non-categorical Column

Validates that a ValueError is raised when a non-categorical column is specified.

def test_non_categorical_column(complex_categorical_df):
    with pytest.raises(ValueError, match="The 'categorical_variables' list must contain only names of categorical variables."):
        explore_cat(complex_categorical_df, ['Age', 'Department'])

Empty Categorical Variables List

Confirms that an empty list for categorical variables triggers a ValueError.

def test_empty_categorical_variables_list(complex_categorical_df):
    with pytest.raises(ValueError, match="The 'categorical_variables' list must contain at least one column name."):
        explore_cat(complex_categorical_df, [])

Non-list Categorical Variables

Ensures that providing categorical variables as a non-list type raises a TypeError.

def test_non_list_categorical_variables(complex_categorical_df):
    with pytest.raises(TypeError, match="The categorical_variables parameter must be a list of variable names."):
        explore_cat(complex_categorical_df, 'Department')

Non-string in Categorical Variables

Checks for a TypeError when elements in the categorical variables list are not strings.

def test_non_string_in_categorical_variables(complex_categorical_df):
    with pytest.raises(TypeError, match="All items in the categorical_variables list must be strings representing column names."):
        explore_cat(complex_categorical_df, [123, 'Department'])

Invalid Method

Verifies that an invalid method name results in a ValueError.

def test_invalid_method(complex_categorical_df):
    with pytest.raises(ValueError):
        explore_cat(complex_categorical_df, ['Department'], method='invalid_method')

Invalid Output Option

Ensures that using an unrecognized output option raises a ValueError.

def test_invalid_output_option(complex_categorical_df):
    with pytest.raises(ValueError, match="Invalid output method. Choose 'print' or 'return'."):
        explore_cat(complex_categorical_df, ['Department'], output='invalid_output')

Functionality Tests

These tests confirm that the function performs its intended operations correctly:

Entropy Method

Confirms that the entropy method returns output containing the 'Entropy' label and the expected value, 1.585.

def test_entropy_method(complex_categorical_df):
    result = explore_cat(complex_categorical_df, ['Department'], method='entropy', output='return')
    assert 'Entropy' in result
    assert '1.585' in result
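
The expected string '1.585' equals log2(3) rounded to three decimals, which is what base-2 Shannon entropy gives for three equally frequent categories. Neither the fixture nor the entropy base is shown on this page, so both are assumptions in the sketch below. `shannon_entropy` is a hypothetical helper, not part of explore_cat().

```python
import math
from collections import Counter


def shannon_entropy(values) -> float:
    """Base-2 Shannon entropy of a sequence of category labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Assumed data: three departments with equal frequency.
departments = ["HR", "Tech", "Admin", "HR", "Tech", "Admin"]
print(f"{shannon_entropy(departments):.3f}")  # 1.585
```

If the real fixture were unbalanced, the expected value in the test would be lower than 1.585, so the assertion also pins down the shape of the sample data.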

Counts and Percentage Output

Checks that the output contains the 'Counts' and 'Percentages' labels and lists each department.

def test_counts_percentage_output(complex_categorical_df):
    result = explore_cat(complex_categorical_df, ['Department'], method='counts_percentage', output='return')
    assert 'Counts' in result and 'Percentages' in result
    assert 'HR' in result and 'Tech' in result and 'Admin' in result
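
For reference, counts and percentages of this kind can be computed in pandas with `value_counts()`. This is a sketch on assumed sample data, not the code inside explore_cat(), whose exact output format is not shown here.

```python
import pandas as pd

# Assumed sample column; the real fixture is not shown on this page.
departments = pd.Series(
    ["HR", "Tech", "Admin", "HR", "Tech", "HR"], name="Department"
)

# Absolute frequency of each category.
counts = departments.value_counts()
# Relative frequency, scaled from fractions to percent.
percentages = departments.value_counts(normalize=True) * 100

# Both Series share the category index, so they align row by row.
summary = pd.DataFrame({"Counts": counts, "Percentages": percentages.round(2)})
print(summary)
```

A test that compares against numbers like these, rather than only checking that labels are present, would also catch calculation errors.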

Unique Values Method

Verifies that unique values are listed accurately in the output.

def test_unique_values_method(complex_categorical_df):
    result = explore_cat(complex_categorical_df, ['Department'], method='unique_values', output='return')
    assert 'HR' in result and 'Tech' in result and 'Admin' in result

All Methods Integration

Ensures that the 'all' method includes a section for each individual method in its output.

def test_all_methods(complex_categorical_df):
    result = explore_cat(complex_categorical_df, ['Department'], method='all', output='return')
    assert 'UNIQUE VALUES' in result
    assert 'COUNTS & PERCENTAGE' in result
    assert 'ENTROPY' in result
