Summary of Unit Tests for evaluate_normality()
The evaluate_normality() function runs statistical tests to assess whether numerical data in a DataFrame is normally distributed, optionally grouped by a categorical variable. Its unit tests cover both functionality and error handling, checking that the function responds correctly under a range of input conditions.
Detailed Breakdown of Tests
Error-Handling Tests
Non-DataFrame Input:
Checks if a TypeError is raised when the input is not a DataFrame.
Invalid Target Variable Type:
Verifies a TypeError is raised when the target variable is not a string.
Invalid Grouping Variable Type:
Ensures a TypeError is raised for a non-string grouping variable.
Invalid Method Type:
Checks for a TypeError when the method is not a string.
Invalid Pipeline Type:
Ensures a TypeError is raised when the pipeline flag is not a boolean.
Empty DataFrame:
Tests that a ValueError is raised for an empty DataFrame.
Missing Target Variable:
Verifies handling of a missing target variable within the DataFrame.
Missing Grouping Variable:
Checks for a ValueError when the grouping variable is missing.
Non-Numerical Target Variable:
Ensures that a ValueError is raised for non-numerical target variables.
Non-Categorical Grouping Variable:
Tests that a ValueError is raised for non-categorical grouping variables.
Invalid Method Specification:
Checks for a ValueError when an unknown method is specified.
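The checks listed above can be sketched as a small validation routine plus a few pytest cases. This is only an illustration: validate_inputs, KNOWN_METHODS, and the order of the checks are hypothetical stand-ins, not the actual implementation of evaluate_normality(). The missing-target case is left out because the summary does not name the exception it expects, and the categorical-grouping check is omitted because the summary does not say how "categorical" is decided.

```python
import pandas as pd
import pytest

# Assumption: the accepted method names are inferred from the functionality
# tests in this summary; the real function may accept a different set.
KNOWN_METHODS = {"consensus", "shapiro", "anderson", "normaltest", "lilliefors"}


def validate_inputs(df, target, grouping, method="consensus", pipeline=False):
    """Hypothetical stand-in: TypeError for wrong types, ValueError for bad values."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")
    if not isinstance(target, str):
        raise TypeError("target must be a string")
    if not isinstance(grouping, str):
        raise TypeError("grouping must be a string")
    if not isinstance(method, str):
        raise TypeError("method must be a string")
    if not isinstance(pipeline, bool):
        raise TypeError("pipeline must be a boolean")
    if df.empty:
        raise ValueError("df is empty")
    if grouping not in df.columns:
        raise ValueError(f"column {grouping!r} not found")
    # Only inspect the dtype when the target column exists.
    if target in df.columns and not pd.api.types.is_numeric_dtype(df[target]):
        raise ValueError(f"column {target!r} is not numeric")
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown method {method!r}")


def test_non_dataframe_input():
    with pytest.raises(TypeError):
        validate_inputs([1, 2, 3], "NumericData", "Group")


def test_pipeline_flag_must_be_bool():
    df = pd.DataFrame({"NumericData": [1.0, 2.0], "Group": ["a", "b"]})
    with pytest.raises(TypeError):
        validate_inputs(df, "NumericData", "Group", pipeline=1)


def test_empty_dataframe():
    with pytest.raises(ValueError):
        validate_inputs(pd.DataFrame(), "NumericData", "Group")


def test_non_numerical_target():
    df = pd.DataFrame({"NumericData": ["x", "y"], "Group": ["a", "b"]})
    with pytest.raises(ValueError):
        validate_inputs(df, "NumericData", "Group")


def test_unknown_method():
    df = pd.DataFrame({"NumericData": [1.0, 2.0], "Group": ["a", "b"]})
    with pytest.raises(ValueError):
        validate_inputs(df, "NumericData", "Group", method="not_a_method")
```

Running the type checks before the value checks keeps the two exception classes cleanly separated, which is what lets each test assert on a single exception type.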
Functionality Tests
Normality Consensus Method:
Tests the consensus method for evaluating normality across multiple tests.
Specific Method - Shapiro:
Verifies the Shapiro-Wilk test's functionality within grouped data.
Specific Method - Anderson:
Tests the Anderson-Darling test's application to grouped data.
Specific Method - Normaltest:
Assesses D'Agostino's K^2 normality test across different groups.
Specific Method - Lilliefors:
Evaluates the Lilliefors test for grouped data normality assessment.
Pipeline Mode Operation:
Tests that the pipeline mode returns a simple boolean indicating overall normality.
Grouping by Categorical Variable:
Ensures normality tests are correctly applied to different categorical groups.
Method Output Differences:
Compares outputs based on the specified method to ensure correctness and completeness.
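To make the difference between the detailed output and pipeline mode concrete, here is a minimal sketch using SciPy. The function normality_by_group, the 0.05 threshold, and the "every group must pass every test" rule are assumptions made for illustration; the summary does not say how evaluate_normality() combines results. Only Shapiro-Wilk and D'Agostino's K^2 are included to keep the example short.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed significance level; not stated in the summary


def normality_by_group(df, target, grouping, pipeline=False):
    """Run two normality tests per group (hypothetical stand-in for illustration)."""
    results = {"shapiro": {}, "normaltest": {}}
    for name, values in df.groupby(grouping)[target]:
        sample = values.to_numpy()
        # A group "passes" a test when its p-value exceeds ALPHA.
        results["shapiro"][name] = bool(stats.shapiro(sample).pvalue > ALPHA)
        results["normaltest"][name] = bool(stats.normaltest(sample).pvalue > ALPHA)
    if pipeline:
        # Assumed rule: every group must pass every test.
        return all(ok for per_group in results.values() for ok in per_group.values())
    return results


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "NumericData": np.concatenate([
        rng.normal(size=100),                 # drawn from a normal distribution
        np.exp(np.linspace(0.0, 10.0, 100)),  # deterministic, heavily right-skewed
    ]),
    "Group": ["A"] * 100 + ["B"] * 100,
})

detailed = normality_by_group(df, "NumericData", "Group")
overall = normality_by_group(df, "NumericData", "Group", pipeline=True)
```

Here detailed is a dictionary keyed by test name and then by group, while overall is a single boolean. Because group "B" is strongly skewed, it fails the tests and the pipeline result is False.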
Example Code from the Suite
Here's an example test code snippet for the "Normality Consensus Method":
def test_normality_consensus_method(sample_normality_df):
    """Test the consensus method to evaluate normality across all methods."""
    results = evaluate_normality(
        sample_normality_df, 'NumericData', 'Group', method='consensus', pipeline=False
    )
    # The consensus result should be a dict with one entry per underlying test.
    assert isinstance(results, dict)
    assert 'shapiro' in results
    assert 'anderson' in results
    assert 'normaltest' in results
    assert 'lilliefors' in results
This test checks that the consensus method integrates the individual normality tests and returns a dictionary of results with one key per test used.
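The test relies on a sample_normality_df fixture that is not shown in this summary. A plausible sketch follows, assuming a two-group frame with the column names used in the test ('NumericData' and 'Group'). The group labels, sample sizes, and distribution parameters are invented for illustration, and make_sample_normality_df is a hypothetical helper.

```python
import numpy as np
import pandas as pd
import pytest


def make_sample_normality_df(n_per_group=50, seed=0):
    """Build a two-group frame with the column names used in the test above."""
    rng = np.random.default_rng(seed)  # fixed seed keeps the data reproducible
    return pd.DataFrame({
        "NumericData": np.concatenate([
            rng.normal(loc=0.0, scale=1.0, size=n_per_group),
            rng.normal(loc=5.0, scale=2.0, size=n_per_group),
        ]),
        "Group": ["A"] * n_per_group + ["B"] * n_per_group,
    })


@pytest.fixture
def sample_normality_df():
    # Thin wrapper so the builder can also be called outside pytest.
    return make_sample_normality_df()
```

Keeping the data construction in a plain function makes it easy to reuse the same frame in a notebook or a quick script, since pytest fixtures cannot be called directly.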
Full Test Suite Access
The full test suite, including the tests not shown here, is available at: Evaluate Normality Test Suite.