Construct tests for evaluate_normality()

Summary of Unit Tests for `evaluate_normality()`

The `evaluate_normality()` function runs statistical tests to assess whether numerical data in a DataFrame is normally distributed, optionally grouped by a categorical variable. The unit tests cover both error handling and core functionality, checking that the function behaves correctly for valid and invalid inputs alike.

Detailed Breakdown of Tests

Error-Handling Tests

Non-DataFrame Input:
- Checks if a TypeError is raised when the input is not a DataFrame.
Invalid Target Variable Type:
- Verifies a TypeError is raised when the target variable is not a string.
Invalid Grouping Variable Type:
- Ensures a TypeError is raised for a non-string grouping variable.
Invalid Method Type:
- Checks for a TypeError when the method is not a string.
Invalid Pipeline Type:
- Ensures a TypeError is raised when the pipeline flag is not a boolean.
Empty DataFrame:
- Tests that a ValueError is raised for an empty DataFrame.
Missing Target Variable:
- Verifies handling of a missing target variable within the DataFrame.
Missing Grouping Variable:
- Checks for a ValueError when the grouping variable is missing.
Non-Numerical Target Variable:
- Ensures that a ValueError is raised for non-numerical target variables.
Non-Categorical Grouping Variable:
- Tests that a ValueError is raised for non-categorical grouping variables.
Invalid Method Specification:
- Checks for a ValueError when an unknown method is specified.

Functionality Tests

Normality Consensus Method:
- Tests the consensus method for evaluating normality across multiple tests.
Specific Method - Shapiro:
- Verifies the Shapiro-Wilk test's functionality within grouped data.
Specific Method - Anderson:
- Tests the Anderson-Darling test's application to grouped data.
Specific Method - Normaltest:
- Assesses D'Agostino's K^2 normality test across different groups.
Specific Method - Lilliefors:
- Evaluates the Lilliefors test for grouped data normality assessment.
Pipeline Mode Operation:
- Tests that the pipeline mode returns a simple boolean indicating overall normality.
Grouping by Categorical Variable:
- Ensures normality tests are correctly applied to different categorical groups.
Method Output Differences:
- Compares outputs based on the specified method to ensure correctness and completeness.
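
To make the grouped, single-method case concrete, the sketch below runs SciPy's Shapiro-Wilk test once per group. It calls `scipy.stats.shapiro` directly rather than `evaluate_normality()`, whose return structure for a specific method is not shown in this summary. The helpers `shapiro_by_group` and `all_groups_normal`, the result keys, the 0.05 threshold, and the rule that every group must pass are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro


def shapiro_by_group(df, target, group, alpha=0.05):
    """Run Shapiro-Wilk on `target` within each level of `group`."""
    results = {}
    for level, values in df.groupby(group)[target]:
        stat, p_value = shapiro(values.to_numpy())
        results[level] = {
            "stat": float(stat),
            "p": float(p_value),
            # Fail to reject normality when p exceeds alpha.
            "normality": bool(p_value > alpha),
        }
    return results


def all_groups_normal(results):
    """Collapse per-group results to one boolean (assumed rule: all groups must pass)."""
    return all(r["normality"] for r in results.values())


# Seeded synthetic data: three groups of 20 draws from a normal distribution.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "NumericData": rng.normal(loc=50, scale=5, size=60),
    "Group": np.repeat(["A", "B", "C"], 20),
})
results = shapiro_by_group(df, "NumericData", "Group")
overall = all_groups_normal(results)
```

A functionality test along these lines would assert that every group appears in the results and that the collapsed value is a plain boolean, without asserting a particular p-value.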

Example Code from the Suite

Here's an example test code snippet for the "Normality Consensus Method":

def test_normality_consensus_method(sample_normality_df):
    """Test the consensus method to evaluate normality across all methods."""
    # sample_normality_df is a fixture supplied by the test suite.
    results = evaluate_normality(
        sample_normality_df, 'NumericData', 'Group',
        method='consensus', pipeline=False,
    )
    # With pipeline=False, the result is a dictionary keyed by test name.
    assert isinstance(results, dict)
    assert 'shapiro' in results
    assert 'anderson' in results
    assert 'normaltest' in results
    assert 'lilliefors' in results

This test checks that the consensus method, called with `pipeline=False`, returns a dictionary containing one key for each of the four normality tests: `shapiro`, `anderson`, `normaltest`, and `lilliefors`.
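
The `sample_normality_df` fixture used in the example is not shown in this summary. Below is a minimal sketch of what such a fixture could look like, assuming only the two column names that appear in the test call (`NumericData` and `Group`). The group labels, sample size, seed, and distribution parameters are made up for illustration, and `build_sample_normality_df` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
import pytest


def build_sample_normality_df(seed=0, n_per_group=50):
    """Build a DataFrame with a numeric column and a categorical grouping column."""
    rng = np.random.default_rng(seed)  # seeded so the data is reproducible
    groups = ["A", "B", "C"]
    return pd.DataFrame({
        "NumericData": rng.normal(loc=0.0, scale=1.0,
                                  size=n_per_group * len(groups)),
        "Group": np.repeat(groups, n_per_group),
    })


@pytest.fixture
def sample_normality_df():
    """Hypothetical fixture; the real suite's fixture may differ."""
    return build_sample_normality_df()
```

Keeping the data construction in a plain function makes it callable outside pytest, while the fixture stays a thin wrapper.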

Full Test Suite Access

The full test suite is available here: Evaluate Normality Test Suite.

ETA444 / datasafari