ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Construct tests for transform_cat() #40

Closed by ETA444 6 months ago

ETA444 commented 6 months ago

Summary of Unit Tests for transform_cat()

The transform_cat() function is designed to apply various transformations to categorical data within a DataFrame, such as uniform transformations, encoding to ordinal or one-hot formats, and target encoding based on another variable. The provided tests ensure the function behaves as expected across different scenarios and handles errors properly.
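For readers unfamiliar with these transformations, the following is a minimal sketch in plain pandas of what each kind of output typically looks like. It does not call transform_cat() and is not DataSafari's implementation; the toy data, the variable names, and the assumption that uniform cleanup means stripping whitespace and lowercasing are all illustrative.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror those used in the tests.
df = pd.DataFrame({
    "Category": ["A", "b ", "C", "a", "B"],
    "Target": [1.0, 2.0, 3.0, 3.0, 4.0],
})

# Uniform cleanup (assumed behavior): strip whitespace and lowercase,
# so that "A" and "a" collapse into one category.
uniform = df["Category"].str.strip().str.lower()

# Ordinal encoding: map each category to an integer code.
ordinal = uniform.astype("category").cat.codes

# One-hot encoding: one binary column per category.
onehot = pd.get_dummies(uniform.str.upper(), prefix="Category")

# Target encoding: replace each category with the mean of the target
# within that category (no smoothing in this sketch).
target_encoded = df["Target"].groupby(uniform).transform("mean")
```

Here the one-hot frame has the columns Category_A, Category_B, and Category_C, and the target-encoded series is float-typed, which is the shape of result the functionality tests below check for.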

Detailed Breakdown of Tests

Error-Handling Tests

These tests confirm that transform_cat() properly manages various types of input errors:

  1. Non-DataFrame Input:

    • Ensures a TypeError is raised if the input is not a DataFrame.
      def test_invalid_df_type():
          with pytest.raises(TypeError, match="The 'df' parameter must be a pandas DataFrame."):
              transform_cat("not_a_dataframe", ['Category'], method='uniform_simple')
  2. Non-List Categorical Variables:

    • Tests for a TypeError if categorical_variables is not provided as a list.
      def test_invalid_categorical_variables_type(sample_data):
          with pytest.raises(TypeError, match="The 'categorical_variables' parameter must be a list of column names."):
              transform_cat(sample_data, 'Category', method='uniform_simple')
  3. Empty DataFrame:

    • Checks for a ValueError when the DataFrame is empty.
      def test_empty_dataframe():
          with pytest.raises(ValueError, match="The input DataFrame is empty."):
              transform_cat(pd.DataFrame(), ['Category'], method='uniform_simple')
  4. Nonexistent Categorical Variable:

    • Verifies that a ValueError is raised if the specified categorical variable does not exist in the DataFrame.
      def test_nonexistent_categorical_variable(sample_data):
          with pytest.raises(ValueError, match="The following variables were not found in the DataFrame:"):
              transform_cat(sample_data, ['Nonexistent'], method='uniform_simple')
  5. Invalid Method Input:

    • Ensures that an invalid method name raises a ValueError.
      def test_invalid_method(sample_data):
          with pytest.raises(ValueError, match="Invalid method 'not_a_valid_method'"):
              transform_cat(sample_data, ['Category'], method='not_a_valid_method')
  6. Missing Maps for Specific Methods:

    • Checks that a ValueError is raised when the 'uniform_mapping' method is used without the required abbreviation_map.
      def test_missing_abbreviation_map(sample_data):
          with pytest.raises(ValueError, match="The 'abbreviation_map' parameter must be provided when using the 'uniform_mapping' method."):
              transform_cat(sample_data, ['Category'], method='uniform_mapping')
  7. Non-Dictionary Map Input:

    • Tests for a TypeError if abbreviation_map is provided but is not a dictionary.
      def test_invalid_abbreviation_map_type(sample_data):
          with pytest.raises(TypeError, match="The 'abbreviation_map' parameter must be a dictionary if provided."):
              transform_cat(sample_data, ['Category'], method='uniform_mapping', abbreviation_map='not_a_dict')
  8. Missing Target Variable for Target Encoding:

    • Verifies that a ValueError is raised if the target_variable is missing when required.
      def test_missing_target_variable_for_target_encoding(sample_data):
          with pytest.raises(ValueError, match="The 'target_variable' parameter must be provided when using the 'encode_target' method."):
              transform_cat(sample_data, ['Category'], method='encode_target')
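A note on the match arguments used above: pytest.raises treats match as a regular expression and checks it with re.search against the exception text, so a pattern only needs to match part of the message, and characters such as "." act as wildcards. The sketch below reproduces that check with the standard library alone; the helper name matches() and the longer example message are hypothetical, not taken from DataSafari.

```python
import re


def matches(pattern: str, message: str) -> bool:
    """Mimic the check pytest.raises(match=...) performs: a regex search."""
    return re.search(pattern, message) is not None


# A pattern may cover only the start of a longer message (hypothetical text).
full_message = "Invalid method 'not_a_valid_method'. Choose a supported method."
partial_ok = matches("Invalid method 'not_a_valid_method'", full_message)

# An unescaped '.' matches any character, so this loose pattern also
# accepts a message that ends differently.
loose = "The input DataFrame is empty."
loose_accepts_variant = matches(loose, "The input DataFrame is empty!")

# re.escape makes every character literal, giving a stricter assertion.
strict = re.escape(loose)
strict_accepts_variant = matches(strict, "The input DataFrame is empty!")
strict_accepts_exact = matches(strict, "The input DataFrame is empty.")
```

The patterns in the tests above work as written because "." also matches a literal period; wrapping them in re.escape() is an optional tightening if exact wording matters.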

Functionality Tests

These tests confirm that each method performs the intended transformation correctly:

  1. Uniform Simple Transformation:

    • Checks if the basic transformation is applied correctly, ensuring no NaN values and all entries are lowercase.
      def test_transform_cat_uniform_simple(sample_data):
          transformed_df, transformed_columns = transform_cat(sample_data, ['Category'], method='uniform_simple')
          assert 'Category' in transformed_df.columns
          assert transformed_df['Category'].isna().sum() == 0
          assert transformed_df['Category'].str.islower().all()
  2. One-Hot Encoding:

    • Tests that one binary column per category (Category_A, Category_B, Category_C) is present in the returned columns.
      def test_transform_cat_encode_onehot(sample_data):
          transformed_df, transformed_columns = transform_cat(sample_data, ['Category'], method='encode_onehot')
          expected_columns = ['Category_A', 'Category_B', 'Category_C']
          for col in expected_columns:
              assert col in transformed_columns.columns
  3. Target Encoding:

    • Verifies that the encoded column is float-typed, as expected when categories are replaced by the mean of the target variable.
      def test_transform_cat_encode_target(sample_data):
          transformed_df, transformed_columns = transform_cat(sample_data, ['Category'], method='encode_target', target_variable='Target')
          assert transformed_df['Category'].dtype == float
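Most of the tests above take a sample_data fixture that is not shown in this summary. A minimal sketch of what such a fixture could look like follows; the values are assumptions chosen only to be consistent with the names the tests use (a Category column with values A, B, and C, and a numeric Target column), and make_sample_data() is a hypothetical helper.

```python
import pandas as pd
import pytest


def make_sample_data() -> pd.DataFrame:
    """Build a small frame with the columns the tests refer to (assumed contents)."""
    return pd.DataFrame({
        "Category": ["A", "B", "C", "A", "B", "C"],
        "Target": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    })


@pytest.fixture
def sample_data() -> pd.DataFrame:
    # Function-scoped by default: each test receives a freshly built frame,
    # so changes made by one test cannot leak into another.
    return make_sample_data()
```

Keeping the construction in a plain helper also lets the same data be built outside pytest, for example when debugging a failing transformation interactively.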

See the full test suite here.