ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Construct tests for explore_cat() #38

Closed ETA444 closed 6 months ago

ETA444 commented 6 months ago

Summary of Unit Tests for explore_cat()

The explore_cat() function analyzes categorical columns of a pandas DataFrame, computing frequency counts, percentages, unique values, and entropy. The unit tests check that it rejects invalid input with the expected errors and returns correct output for each method.
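As a rough illustration of the count, percentage, and unique-value statistics named above, the following sketch uses plain pandas calls. The helper name summarize_categorical and the sample data are hypothetical; this is not how explore_cat() is implemented, only an indication of the kind of values the tests look for.

```python
import pandas as pd


def summarize_categorical(df: pd.DataFrame, column: str) -> dict:
    """Hypothetical helper: basic statistics for one categorical column."""
    # Absolute frequency of each category.
    counts = df[column].value_counts()
    # Relative frequency of each category, expressed in percent.
    percentages = df[column].value_counts(normalize=True) * 100
    return {
        "counts": counts.to_dict(),
        "percentages": percentages.round(2).to_dict(),
        "unique_values": sorted(df[column].dropna().unique()),
    }


# Illustrative data, not taken from the DataSafari test suite.
example_df = pd.DataFrame({"Department": ["HR", "Tech", "Tech", "Admin"]})
summary = summarize_categorical(example_df, "Department")
print(summary)
```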

Detailed Breakdown of Tests

Error-Handling Tests

These tests verify that invalid inputs raise the appropriate errors:

  1. Empty DataFrame Input

    • Tests the function's response to an empty DataFrame, which should raise a ValueError.
      def test_empty_dataframe():
          df = pd.DataFrame()
          with pytest.raises(ValueError, match="The input DataFrame is empty."):
              explore_cat(df, ['Category'])
  2. Non-DataFrame Input

    • Ensures that passing a non-DataFrame object raises a TypeError.
      def test_non_dataframe_input():
          with pytest.raises(TypeError, match="The df parameter must be a pandas DataFrame."):
              explore_cat("not a dataframe", ['Department'])
  3. Non-existent Column

    • Checks for a ValueError when provided column names do not exist in the DataFrame.
      def test_non_existent_column(complex_categorical_df):
          with pytest.raises(ValueError):
              explore_cat(complex_categorical_df, ['NonExistentColumn'])
  4. Non-categorical Column

    • Validates that a ValueError is raised when the list includes a non-categorical column.
      def test_non_categorical_column(complex_categorical_df):
          with pytest.raises(ValueError, match="The 'categorical_variables' list must contain only names of categorical variables."):
              explore_cat(complex_categorical_df, ['Age', 'Department'])
  5. Empty Categorical Variables List

    • Confirms that an empty list for categorical variables triggers a ValueError.
      def test_empty_categorical_variables_list(complex_categorical_df):
          with pytest.raises(ValueError, match="The 'categorical_variables' list must contain at least one column name."):
              explore_cat(complex_categorical_df, [])
  6. Non-list Categorical Variables

    • Ensures that providing categorical variables as a non-list type raises a TypeError.
      def test_non_list_categorical_variables(complex_categorical_df):
          with pytest.raises(TypeError, match="The categorical_variables parameter must be a list of variable names."):
              explore_cat(complex_categorical_df, 'Department')
  7. Non-string in Categorical Variables

    • Checks for a TypeError when elements in the categorical variables list are not strings.
      def test_non_string_in_categorical_variables(complex_categorical_df):
          with pytest.raises(TypeError, match="All items in the categorical_variables list must be strings representing column names."):
              explore_cat(complex_categorical_df, [123, 'Department'])
  8. Invalid Method

    • Verifies that an invalid method name results in a ValueError.
      def test_invalid_method(complex_categorical_df):
          with pytest.raises(ValueError):
              explore_cat(complex_categorical_df, ['Department'], method='invalid_method')
  9. Invalid Output Option

    • Ensures that using an unrecognized output option raises a ValueError.
      def test_invalid_output_option(complex_categorical_df):
          with pytest.raises(ValueError, match="Invalid output method. Choose 'print' or 'return'."):
              explore_cat(complex_categorical_df, ['Department'], output='invalid_output')
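Most of these tests take a complex_categorical_df fixture whose definition is not shown in this issue. A minimal sketch of what such a fixture might look like follows; the column contents and the make_complex_categorical_df helper are assumptions, chosen only so that 'Department' is categorical with the values HR, Tech, and Admin, and 'Age' is numeric.

```python
import pandas as pd
import pytest


def make_complex_categorical_df() -> pd.DataFrame:
    """Hypothetical data builder; the real fixture's contents are not shown here."""
    return pd.DataFrame({
        # Three departments with equal frequency (assumed distribution).
        "Department": ["HR", "Tech", "Admin"] * 2,
        # A numeric column, so ['Age', 'Department'] mixes numeric and categorical names.
        "Age": [25, 32, 41, 29, 38, 45],
    })


@pytest.fixture
def complex_categorical_df() -> pd.DataFrame:
    # pytest passes this DataFrame to any test that names the fixture as a parameter.
    return make_complex_categorical_df()
```

Keeping the data in a plain function makes it reusable outside pytest, since fixture functions themselves are not meant to be called directly.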

Functionality Tests

These tests confirm that each method returns the expected output:

  1. Entropy Method

    • Confirms that the returned output contains an 'Entropy' entry with the expected value (1.585).
      def test_entropy_method(complex_categorical_df):
          result = explore_cat(complex_categorical_df, ['Department'], method='entropy', output='return')
          assert 'Entropy' in result
          assert '1.585' in result
  2. Counts and Percentage Output

    • Tests the correct formatting and calculation of counts and percentages.
      def test_counts_percentage_output(complex_categorical_df):
          result = explore_cat(complex_categorical_df, ['Department'], method='counts_percentage', output='return')
          assert 'Counts' in result and 'Percentages' in result
          assert 'HR' in result and 'Tech' in result and 'Admin' in result
  3. Unique Values Method

    • Verifies that unique values are listed accurately in the output.
      def test_unique_values_method(complex_categorical_df):
          result = explore_cat(complex_categorical_df, ['Department'], method='unique_values', output='return')
          assert 'HR' in result and 'Tech' in result and 'Admin' in result
  4. All Methods Integration

    • Ensures that method='all' returns output containing the section header of every individual method.
      def test_all_methods(complex_categorical_df):
          result = explore_cat(complex_categorical_df, ['Department'], method='all', output='return')
          assert 'UNIQUE VALUES' in result
          assert 'COUNTS & PERCENTAGE' in result
          assert 'ENTROPY' in result
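The '1.585' asserted in the entropy test is consistent with log2(3), the Shannon entropy in bits of three equally frequent categories. The sketch below checks that arithmetic using only the standard library. It assumes the fixture's Department column holds HR, Tech, and Admin in equal proportions and that entropy is reported in base 2; neither is confirmed in this issue, and shannon_entropy_bits is a hypothetical name.

```python
import math
from collections import Counter


def shannon_entropy_bits(values) -> float:
    """Shannon entropy in bits: H = -sum(p * log2(p)) over category shares p."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Assumed fixture contents: three departments with equal frequency.
departments = ["HR", "Tech", "Admin"] * 2
print(f"{shannon_entropy_bits(departments):.3f}")  # prints 1.585
```

If the real fixture's departments were not evenly distributed, the entropy would be lower than 1.585 and the assertion's expected value would differ.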