Summary of Unit Tests for data_preprocessing_core()
The data_preprocessing_core() function prepares data for model training and evaluation in the predict_ml() machine learning workflow. It transforms data according to the specified configuration: handling missing values, encoding categorical variables, and splitting the data into training and testing subsets. The unit tests check that the function rejects invalid inputs with the appropriate exception and handles the supported data types and configurations correctly.
Detailed Test Descriptions
Type Validation Tests:
Invalid DataFrame Type: Checks if a TypeError is raised when the input is not a pandas DataFrame.
Invalid x_cols Type: Verifies that a TypeError is raised when x_cols is not a list of strings.
Invalid y_col Type: Ensures a TypeError is raised when y_col is not a string.
Invalid data_state Type: Checks for a TypeError when data_state is not a string.
Invalid test_size Type: Asserts that a TypeError is raised for non-float test_size.
Invalid random_state Type: Tests for a TypeError when random_state is not an integer.
Invalid verbose Type: Verifies that a TypeError is raised when verbose is not an integer.
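Because the argument type checks above all follow the same pattern, they could plausibly be covered by one parametrized test. The sketch below is only an illustration under stated assumptions: check_arg_types is a hypothetical stand-in for the validation inside data_preprocessing_core() (whose real signature and defaults are not shown here), the default values are invented, and the DataFrame check is left out so the stand-in does not need pandas.

```python
import pytest


def check_arg_types(x_cols, y_col, data_state, test_size=0.2, random_state=42, verbose=1):
    """Hypothetical stand-in for the argument type checks; not the real function."""
    if not isinstance(x_cols, list) or not all(isinstance(c, str) for c in x_cols):
        raise TypeError("x_cols must be a list of strings")
    if not isinstance(y_col, str):
        raise TypeError("y_col must be a string")
    if not isinstance(data_state, str):
        raise TypeError("data_state must be a string")
    if not isinstance(test_size, float):
        raise TypeError("test_size must be a float")
    # bool is a subclass of int, so it is rejected explicitly (an assumption of this sketch).
    for name, value in (("random_state", random_state), ("verbose", verbose)):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")


# Valid baseline arguments; each case below overrides exactly one of them.
VALID = {"x_cols": ["Age"], "y_col": "Salary", "data_state": "unprocessed"}

BAD_ARGS = [
    {"x_cols": "Age"},         # a bare string, not a list
    {"x_cols": ["Age", 1]},    # a list with a non-string entry
    {"y_col": 123},
    {"data_state": None},
    {"test_size": "0.2"},
    {"random_state": 1.5},
    {"verbose": "yes"},
]


@pytest.mark.parametrize("override", BAD_ARGS)
def test_invalid_argument_type(override):
    """Each wrongly typed argument should raise a TypeError."""
    with pytest.raises(TypeError):
        check_arg_types(**{**VALID, **override})
```

Keeping one valid baseline and overriding a single argument per case makes it clear which argument is expected to trigger the error.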
Value and Compatibility Checks:
Invalid test_size Value: Ensures a ValueError is raised if test_size is not between 0 and 1.
Empty DataFrame: Checks if a ValueError is raised when an empty DataFrame is processed.
Invalid Component Types: Tests that the supplied components (imputers, scalers, encoders) implement the required interface, such as fit_transform.
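The range and interface checks above might look roughly like the following. This is a sketch, not the suite's actual code: check_test_size and check_component are hypothetical helpers, treating 0 and 1 themselves as invalid is an assumption, and raising TypeError for a component without fit_transform is also an assumption, since the exception type for that case is not stated here.

```python
import pytest


class NoFitTransform:
    """Deliberately lacks fit_transform, so it should be rejected."""


class MinimalTransformer:
    """Smallest object that satisfies the assumed interface."""

    def fit_transform(self, X, y=None):
        return X


def check_test_size(test_size):
    """Hypothetical range check: test_size must lie strictly between 0 and 1."""
    if not 0.0 < test_size < 1.0:
        raise ValueError(f"test_size must be between 0 and 1, got {test_size}")


def check_component(component, name):
    """Hypothetical interface check for imputers, scalers, and encoders."""
    if not callable(getattr(component, "fit_transform", None)):
        raise TypeError(f"{name} must implement fit_transform")


@pytest.mark.parametrize("bad_size", [-0.1, 0.0, 1.0, 1.5])
def test_invalid_test_size_value(bad_size):
    with pytest.raises(ValueError):
        check_test_size(bad_size)


@pytest.mark.parametrize("name", ["imputer", "scaler", "encoder"])
def test_component_without_fit_transform(name):
    with pytest.raises(TypeError):
        check_component(NoFitTransform(), name)
```

Checking for a callable fit_transform attribute, rather than for a specific base class, accepts any object that follows the interface.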
Existence and Integrity Checks:
Missing y_col: Verifies a ValueError is raised if y_col is not found in the DataFrame.
Missing x_cols: Checks for a ValueError when specified x_cols are not found in the DataFrame.
Test Size Too Large: Ensures that a ValueError is raised if there isn’t enough data to satisfy the test_size requirement.
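The three existence and integrity checks above can be illustrated with a small DataFrame. As before, this is a hedged sketch: check_columns_and_size is a hypothetical stand-in rather than the real function, the data values are invented, and the rule that both the training and testing subsets must keep at least one row is an assumption about what "not enough data" means.

```python
import math

import pandas as pd
import pytest


def check_columns_and_size(df, x_cols, y_col, test_size):
    """Hypothetical stand-in for the existence and integrity checks."""
    if y_col not in df.columns:
        raise ValueError(f"y_col {y_col!r} not found in DataFrame")
    missing = [c for c in x_cols if c not in df.columns]
    if missing:
        raise ValueError(f"x_cols not found in DataFrame: {missing}")
    # Assumed rule: both subsets must contain at least one row.
    n_test = math.ceil(len(df) * test_size)
    if n_test < 1 or len(df) - n_test < 1:
        raise ValueError("not enough rows to satisfy test_size")


# Invented sample values, for illustration only.
DF = pd.DataFrame({"Age": [25, 32, 47, 51], "Salary": [40000, 52000, 81000, 90000]})


def test_missing_y_col():
    with pytest.raises(ValueError):
        check_columns_and_size(DF, ["Age"], "Income", 0.25)


def test_missing_x_cols():
    with pytest.raises(ValueError):
        check_columns_and_size(DF, ["Age", "Height"], "Salary", 0.25)


def test_test_size_too_large():
    # A single row cannot be split into non-empty train and test subsets.
    with pytest.raises(ValueError):
        check_columns_and_size(DF.head(1), ["Age"], "Salary", 0.5)
```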
Example Test Code
Here's an example of how a type validation test is implemented:
import pytest

def test_invalid_df_type(sample_data_dpc):
    """Test that a TypeError is raised when df is not a DataFrame."""
    with pytest.raises(TypeError):
        data_preprocessing_core("not_a_dataframe", ["Age"], "Salary", "unprocessed")
This test ensures that the function correctly identifies when the input df is not a pandas DataFrame and raises the appropriate TypeError.
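The example relies on a sample_data_dpc fixture whose definition is not shown here. A fixture of that kind might look like the sketch below. Everything in it is an assumption: the Age and Salary column names are taken from the example call, while the Department column, the missing values, and the make_sample_data helper are hypothetical additions meant to give the imputation and encoding steps something to work on.

```python
import pandas as pd
import pytest


def make_sample_data():
    """Build a small frame; all values are invented for illustration."""
    return pd.DataFrame(
        {
            "Age": [25, 32, None, 51],                  # one missing numeric value
            "Department": ["HR", "IT", "IT", None],     # categorical, one missing
            "Salary": [40000, 52000, 81000, 90000],     # target column
        }
    )


@pytest.fixture
def sample_data_dpc():
    """Return a fresh frame per test so one test cannot mutate another's input."""
    return make_sample_data()
```

Building the frame in a plain helper keeps the data reusable outside pytest, since fixture functions are not meant to be called directly.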
Full Test Suite
You can access the complete suite code at: Data Preprocessing Test Suite.