Implement error handling for evaluate_normality()

Implementation Summary

evaluate_normality() validates its input data and parameters before running any normality test. Each failed check raises an exception whose message names the offending parameter, so common input mistakes are easy to correct.

Detailed Error Handling Breakdown

Data Validation

DataFrame Validation
- Ensures that the input df is a pandas DataFrame, since every later step relies on DataFrame operations.

if not isinstance(df, pd.DataFrame):
    raise TypeError("evaluate_normality(): The 'df' parameter must be a pandas DataFrame.")
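
As a minimal sketch of how this check behaves, the snippet below wraps it in a helper named check_dataframe. That name is hypothetical and used only for illustration; in the source the check sits inside evaluate_normality() itself. It shows that a plain dict of lists is rejected until it is wrapped in a DataFrame.

```python
import pandas as pd


def check_dataframe(df):
    # Hypothetical helper: same isinstance check as in the issue text.
    if not isinstance(df, pd.DataFrame):
        raise TypeError("evaluate_normality(): The 'df' parameter must be a pandas DataFrame.")


# A dict of lists holds the same data but is not a DataFrame.
raw = {"score": [1.2, 3.4, 2.2], "group": ["a", "b", "a"]}

try:
    check_dataframe(raw)
except TypeError as exc:
    message = str(exc)

# Wrapping the dict in a DataFrame passes the check silently.
check_dataframe(pd.DataFrame(raw))
```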

Column Existence Validation
- Verifies that the specified target_variable and grouping_variable exist within the DataFrame. Without this check, accessing a missing column would fail later with a bare KeyError instead of a message that names the missing variable.

if target_variable not in df.columns:
    raise ValueError(f"evaluate_normality(): The target variable '{target_variable}' was not found in the DataFrame.")
if grouping_variable not in df.columns:
    raise ValueError(f"evaluate_normality(): The grouping variable '{grouping_variable}' was not found in the DataFrame.")
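
One common way to trip this check is a capitalization mismatch, because column lookup in pandas is case-sensitive. The sketch below assumes a hypothetical helper, check_columns, that wraps the two membership tests shown above.

```python
import pandas as pd


def check_columns(df, target_variable, grouping_variable):
    # Hypothetical helper: same membership tests as in the issue text.
    if target_variable not in df.columns:
        raise ValueError(f"evaluate_normality(): The target variable '{target_variable}' was not found in the DataFrame.")
    if grouping_variable not in df.columns:
        raise ValueError(f"evaluate_normality(): The grouping variable '{grouping_variable}' was not found in the DataFrame.")


df = pd.DataFrame({"score": [1.2, 3.4], "group": ["a", "b"]})

# "Score" differs from "score" only in case, yet it is treated as missing.
try:
    check_columns(df, "Score", "group")
except ValueError as exc:
    message = str(exc)

# The exact column names pass.
check_columns(df, "score", "group")
```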

Parameter Type Validation

String Validation
- Checks that target_variable, grouping_variable, and method are strings. The first two are used as DataFrame column names, and method selects the normality test to run.

if not isinstance(target_variable, str) or not isinstance(grouping_variable, str):
    raise TypeError("evaluate_normality(): The 'target_variable' and 'grouping_variable' parameters must be strings.")
if not isinstance(method, str):
    raise TypeError("evaluate_normality(): The 'method' parameter must be a string.")
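
A frequent mistake this guards against is passing a column name wrapped in a list. The sketch below uses a hypothetical helper, check_string_params, to show the resulting error; the helper name is an illustration and not part of the library.

```python
def check_string_params(target_variable, grouping_variable, method):
    # Hypothetical helper: same isinstance checks as in the issue text.
    if not isinstance(target_variable, str) or not isinstance(grouping_variable, str):
        raise TypeError("evaluate_normality(): The 'target_variable' and 'grouping_variable' parameters must be strings.")
    if not isinstance(method, str):
        raise TypeError("evaluate_normality(): The 'method' parameter must be a string.")


# A one-element list is not a string, so the first check fails.
try:
    check_string_params("score", ["group"], "shapiro")
except TypeError as exc:
    message = str(exc)

# Plain strings pass.
check_string_params("score", "group", "shapiro")
```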

Boolean Validation
- Confirms that pipeline is a boolean. This flag determines the return type of the function: either detailed test results or a simple boolean indicator of normality.

if not isinstance(pipeline, bool):
    raise TypeError("evaluate_normality(): The 'pipeline' parameter must be a boolean.")
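
Because the check uses isinstance rather than truthiness, values that merely behave like booleans are rejected. The sketch below illustrates this with a hypothetical helper, check_pipeline, applied to a few truthy and falsy stand-ins.

```python
def check_pipeline(pipeline):
    # Hypothetical helper: same isinstance check as in the issue text.
    if not isinstance(pipeline, bool):
        raise TypeError("evaluate_normality(): The 'pipeline' parameter must be a boolean.")


# None of these are bool instances, even though 1 == True in Python.
rejected = []
for candidate in (1, "True", None):
    try:
        check_pipeline(candidate)
    except TypeError:
        rejected.append(candidate)

# Genuine booleans pass.
check_pipeline(True)
check_pipeline(False)
```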

Content Validation

DataFrame Emptiness Check
- Ensures that the DataFrame is not empty, since no statistical test can run without data.

if df.empty:
    raise ValueError("evaluate_normality(): The input DataFrame is empty.")
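
It may help to see what df.empty does and does not catch. In pandas, a DataFrame is empty when one of its axes has length zero, so a frame with columns but no rows is empty, while a frame that contains only missing values is not. The example data below is made up for illustration.

```python
import pandas as pd

# Columns are defined, but there are no rows: this frame is empty.
no_rows = pd.DataFrame({"score": [], "group": []})

# One row of missing values: not empty as far as df.empty is concerned.
all_missing = pd.DataFrame({"score": [float("nan")], "group": [None]})

no_rows_is_empty = no_rows.empty
all_missing_is_empty = all_missing.empty
```

Under this definition the all-missing frame passes the emptiness check, so missing data would have to be handled separately.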

Variable Type Appropriateness
- Confirms that the target_variable is numerical and the grouping_variable is categorical, as these are prerequisites for the types of tests being performed.

if not evaluate_dtype(df, [target_variable], output='list_n')[0]:
    raise ValueError(f"evaluate_normality(): The target variable '{target_variable}' must be a numerical variable.")
if not evaluate_dtype(df, [grouping_variable], output='list_c')[0]:
    raise ValueError(f"evaluate_normality(): The grouping variable '{grouping_variable}' must be a categorical variable.")
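
evaluate_dtype() is the project's own helper and its internals are not shown here. As a rough stand-in only, the sketch below uses pandas' is_numeric_dtype to show the kind of per-column answer such a check produces. The function is_numerical is hypothetical and may classify some columns differently from evaluate_dtype().

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def is_numerical(df, column):
    # Hypothetical stand-in, NOT the datasafari implementation:
    # it only inspects the column's dtype.
    return bool(is_numeric_dtype(df[column]))


df = pd.DataFrame({"score": [1.2, 3.4, 2.2], "group": ["a", "b", "a"]})

# Map each column name to whether its dtype is numeric.
numeric_flags = {column: is_numerical(df, column) for column in df.columns}
```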

Method Support Check
- Validates that the specified method is one of the supported normality tests, so an unsupported or misspelled name fails immediately with a message listing the valid options.

allowed_methods = ['shapiro', 'anderson', 'normaltest', 'lilliefors', 'consensus']
if method not in allowed_methods:
    raise ValueError(f"evaluate_normality(): The method '{method}' is not supported. Allowed methods are: {', '.join(allowed_methods)}.")
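
The membership test is an exact string comparison, so a capitalized method name is rejected. The sketch below wraps the check in a hypothetical helper, check_method, to show the message a caller would see.

```python
ALLOWED_METHODS = ['shapiro', 'anderson', 'normaltest', 'lilliefors', 'consensus']


def check_method(method):
    # Hypothetical helper: same membership test as in the issue text.
    if method not in ALLOWED_METHODS:
        raise ValueError(f"evaluate_normality(): The method '{method}' is not supported. Allowed methods are: {', '.join(ALLOWED_METHODS)}.")


# "Shapiro" is not in the list because the comparison is case-sensitive.
try:
    check_method("Shapiro")
except ValueError as exc:
    message = str(exc)

# The lowercase name passes.
check_method("shapiro")
```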

ETA444 / datasafari

Implement error handling for evaluate_normality() #80
