ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Construct tests for transform_num() #41

Closed ETA444 closed 6 months ago

ETA444 commented 6 months ago

Summary of Unit Tests for transform_num()

The transform_num() function applies transformations to numerical columns of a DataFrame, including standardization, normalization, and quantile transformation, among others. The tests check that the function handles valid inputs correctly and raises the appropriate exception for invalid ones.

Detailed Breakdown of Tests*

* The full test suite can be accessed here.

Error-Handling Tests

These tests are designed to ensure that transform_num() correctly identifies and handles incorrect inputs and scenarios:

  1. Non-DataFrame Input:

    • Ensures a TypeError is raised when the input is not a DataFrame.
      def test_transform_num_invalid_df_type():
          with pytest.raises(TypeError):
              transform_num("not_a_dataframe", ['Feature1'], 'standardize')
  2. Non-List Numerical Variables:

    • Checks for a TypeError if numerical_variables is not provided as a list.
      def test_transform_num_invalid_numerical_variables_type():
          with pytest.raises(TypeError):
              transform_num(pd.DataFrame(), 'Feature1', 'standardize')
  3. Invalid Numerical Variable Entries:

    • Verifies that a TypeError is raised for non-string entries in the numerical_variables list.
      def test_transform_num_invalid_numerical_variable_entries(sample_data):
          with pytest.raises(TypeError):
              transform_num(sample_data, [1, 2, 3], 'standardize')
  4. Empty DataFrame:

    • Ensures a ValueError is raised when the DataFrame is empty.
      def test_transform_num_empty_dataframe():
          with pytest.raises(ValueError):
              transform_num(pd.DataFrame(), ['Feature1'], 'standardize')
  5. Nonexistent Numerical Variable:

    • Confirms a ValueError is raised if the specified numerical variable does not exist in the DataFrame.
      def test_transform_num_nonexistent_numerical_variable(sample_data):
          with pytest.raises(ValueError):
              transform_num(sample_data, ['Nonexistent'], 'standardize')
  6. Invalid Quantile Parameters:

    • Expects a TypeError when n_quantiles is not an integer and a ValueError when it is negative.
      def test_transform_num_invalid_quantile_parameters(sample_data):
          # Wrong type: n_quantiles passed as a string
          with pytest.raises(TypeError):
              transform_num(sample_data, ['Feature1'], 'quantile', n_quantiles='1000')
          # Wrong value: n_quantiles is negative
          with pytest.raises(ValueError):
              transform_num(sample_data, ['Feature1'], 'quantile', n_quantiles=-1000)
  7. Invalid Interaction Pairs Format:

    • Checks for a TypeError when the format of interaction pairs is incorrect.
      def test_transform_num_invalid_interaction_pairs(sample_data):
          with pytest.raises(TypeError):
              transform_num(sample_data, ['Feature1', 'Feature2'], 'interaction', interaction_pairs='not_a_list')
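
The error cases above imply a set of input checks. The following is a minimal sketch of such checks, not DataSafari's actual implementation: the helper name check_transform_num_inputs, the error messages, and the sample DataFrame are all hypothetical. The one ordering detail taken from the tests is that type checks run before value checks, since test 2 passes an empty DataFrame and still expects a TypeError.

```python
import pandas as pd


def check_transform_num_inputs(df, numerical_variables, n_quantiles=1000):
    """Hypothetical validation helper mirroring the error cases tested above."""
    # Type checks come first, so a wrong type is reported even for an empty frame.
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")
    if not isinstance(numerical_variables, list):
        raise TypeError("numerical_variables must be a list")
    if not all(isinstance(name, str) for name in numerical_variables):
        raise TypeError("numerical_variables must contain only strings")
    if not isinstance(n_quantiles, int):
        raise TypeError("n_quantiles must be an integer")

    # Value checks follow once the types are known to be valid.
    if df.empty:
        raise ValueError("df is empty")
    missing = [name for name in numerical_variables if name not in df.columns]
    if missing:
        raise ValueError(f"columns not found in df: {missing}")
    if n_quantiles <= 0:
        raise ValueError("n_quantiles must be positive")


# Hypothetical data; valid input passes without raising.
sample = pd.DataFrame({"Feature1": [1.0, 2.0, 3.0]})
check_transform_num_inputs(sample, ["Feature1"])
```

Keeping the checks in a fixed order makes each test deterministic: an input that is wrong in two ways always raises the same exception.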

Functionality Tests

These tests ensure that each transformation method is applied correctly and yields expected results:

  1. Standardize:

    • Confirms that data is standardized correctly, with a mean of 0 and a standard deviation of 1 (both rounded to one decimal place).
      def test_transform_num_standardize(sample_data):
          transformed_df, transformed_columns = transform_num(sample_data, ['Feature1', 'Feature2'], 'standardize')
          # Compare each column first, then reduce with .all()
          assert (transformed_columns.mean().round(1) == 0).all()
          assert (transformed_columns.std().round(1) == 1).all()
  2. Log Transformation:

    • Tests the logarithmic transformation, ensuring all transformed values are non-negative.
      def test_transform_num_log(sample_data):
          sample_data['Feature2'] += 1  # Ensure positive values
          transformed_df, transformed_columns = transform_num(sample_data, ['Feature2'], 'log')
          assert (transformed_columns >= 0).all().all()
  3. Normalize:

    • Checks normalization, ensuring the minimum is 0 and the maximum is 1.
      def test_transform_num_normalize(sample_data):
          transformed_df, transformed_columns = transform_num(sample_data, ['Feature1'], 'normalize')
          # Round to absorb floating-point error, then compare per column
          assert (transformed_columns.min().round(6) == 0).all()
          assert (transformed_columns.max().round(6) == 1).all()
  4. Quantile Transformation:

    • Verifies that a quantile transformation to a normal output distribution yields a mean of about 0 and a standard deviation of about 1.
      def test_transform_num_quantile_normal(sample_data):
          transformed_df, transformed_columns = transform_num(sample_data, ['Feature2'], 'quantile', output_distribution='normal')
          assert (transformed_columns.mean().round(1) == 0).all()
          assert (transformed_columns.std().round(1) == 1).all()
  5. Robust Scaling:

    • Tests robust scaling using IQR, ensuring the median is centred at 0.
      def test_transform_num_robust(sample_data):
          transformed_df, transformed_columns = transform_num(sample_data, ['Feature1', 'Feature3'], 'robust', quantile_range=(25.0, 75.0))
          # Round to absorb floating-point error, then compare per column
          assert (transformed_columns.median().round(6) == 0).all()
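
The properties asserted in the standardize, normalize, and robust tests can be reproduced with plain pandas. This is a sketch of the textbook formulas only, not of how transform_num() computes them; the sample data is hypothetical, since the sample_data fixture is not shown in this issue.

```python
import pandas as pd

# Hypothetical sample data; the real sample_data fixture is not shown in this issue.
df = pd.DataFrame({
    "Feature1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Feature2": [10.0, 20.0, 40.0, 80.0, 160.0],
})

# Standardize: subtract the mean and divide by the population standard deviation.
standardized = (df - df.mean()) / df.std(ddof=0)

# Normalize: rescale each column to the [0, 1] range.
normalized = (df - df.min()) / (df.max() - df.min())

# Robust scaling: centre on the median and divide by the interquartile range.
iqr = df.quantile(0.75) - df.quantile(0.25)
robust = (df - df.median()) / iqr

# Compare per column first, then reduce with .all(), so every column is checked.
assert (standardized.mean().abs() < 1e-9).all()
assert ((standardized.std(ddof=0) - 1).abs() < 1e-9).all()
assert (normalized.min() == 0).all() and (normalized.max() == 1).all()
assert (robust.median() == 0).all()
```

One detail worth keeping in mind: pandas' std() defaults to the sample standard deviation (ddof=1). If a transformation divides by the population standard deviation instead, std() of the result is slightly above 1 on small samples, which is why the sketch passes ddof=0 explicitly.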