evaluate_normality() validates its input data and parameters before running any normality test. Each failed check raises an exception with a message that identifies the problem, so common input mistakes are easy to correct.
Detailed Error Handling Breakdown
Data Validation
DataFrame Validation
Ensures that the input df is a pandas DataFrame. The function relies on DataFrame operations throughout, so any other input type is rejected with a TypeError.
if not isinstance(df, pd.DataFrame):
    raise TypeError("evaluate_normality(): The 'df' parameter must be a pandas DataFrame.")
Column Existence Validation
Verifies that the specified target_variable and grouping_variable exist within the DataFrame. This prevents runtime errors that would occur when trying to access non-existent DataFrame columns.
if target_variable not in df.columns:
    raise ValueError(f"evaluate_normality(): The target variable '{target_variable}' was not found in the DataFrame.")
if grouping_variable not in df.columns:
    raise ValueError(f"evaluate_normality(): The grouping variable '{grouping_variable}' was not found in the DataFrame.")
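The two data-validation checks above can be exercised in isolation. The following is a minimal sketch, not the function's actual code: check_frame_and_columns is a hypothetical helper that reproduces the checks, and the sample columns "score" and "group" are made up for illustration.

```python
import pandas as pd


def check_frame_and_columns(df, target_variable, grouping_variable):
    """Hypothetical helper reproducing the two data-validation checks."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("evaluate_normality(): The 'df' parameter must be a pandas DataFrame.")
    for label, name in (("target", target_variable), ("grouping", grouping_variable)):
        if name not in df.columns:
            raise ValueError(
                f"evaluate_normality(): The {label} variable '{name}' was not found in the DataFrame."
            )


# Illustrative data; the column names are assumptions, not part of the library.
df = pd.DataFrame({"score": [1.2, 3.4, 2.2], "group": ["a", "b", "a"]})

check_frame_and_columns(df, "score", "group")  # valid input passes silently

try:
    check_frame_and_columns(df, "scor", "group")  # typo in the column name
except ValueError as err:
    message = str(err)  # the message quotes the name that was not found
```

Because the message quotes the missing name, a typo such as "scor" is visible directly in the error text.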
Parameter Type Validation
String Validation
Checks that target_variable, grouping_variable, and method are strings. The first two are used to reference DataFrame columns, and the third selects the normality test.
if not isinstance(target_variable, str) or not isinstance(grouping_variable, str):
    raise TypeError("evaluate_normality(): The 'target_variable' and 'grouping_variable' parameters must be strings.")
if not isinstance(method, str):
    raise TypeError("evaluate_normality(): The 'method' parameter must be a string.")
Boolean Validation
Confirms that pipeline is a boolean. This flag determines the return type of the function: either detailed test results or a simple boolean indicator of normality.
if not isinstance(pipeline, bool):
    raise TypeError("evaluate_normality(): The 'pipeline' parameter must be a boolean.")
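To see which arguments these type checks reject, the sketch below wraps them in a hypothetical helper, check_parameter_types, and probes it with a few inputs. The helper and the rejected() probe are illustrative names, not part of the library. One point worth noting is that isinstance(pipeline, bool) is strict: the integer 1 and the string "True" are both rejected.

```python
def check_parameter_types(target_variable, grouping_variable, method, pipeline):
    """Hypothetical helper mirroring the parameter type checks."""
    if not isinstance(target_variable, str) or not isinstance(grouping_variable, str):
        raise TypeError("evaluate_normality(): The 'target_variable' and 'grouping_variable' parameters must be strings.")
    if not isinstance(method, str):
        raise TypeError("evaluate_normality(): The 'method' parameter must be a string.")
    if not isinstance(pipeline, bool):
        raise TypeError("evaluate_normality(): The 'pipeline' parameter must be a boolean.")


def rejected(**overrides):
    """Return True if the call with the overridden arguments raises TypeError."""
    args = dict(target_variable="score", grouping_variable="group",
                method="shapiro", pipeline=False)
    args.update(overrides)
    try:
        check_parameter_types(**args)
    except TypeError:
        return True
    return False


results = {
    "valid call": rejected(),                    # all types correct
    "pipeline=1": rejected(pipeline=1),          # int is not bool
    "pipeline='True'": rejected(pipeline="True"),
    "method=None": rejected(method=None),
    "target_variable=0": rejected(target_variable=0),
}
```

Only the first entry in results is False; every other probe is rejected with a TypeError.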
Content Validation
DataFrame Emptiness Check
Ensures that the DataFrame is not empty, which is essential for performing any meaningful statistical tests.
if df.empty:
    raise ValueError("evaluate_normality(): The input DataFrame is empty.")
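It helps to know what df.empty does and does not catch. In pandas, the attribute is True when the DataFrame has no rows (or no columns), but a DataFrame whose rows contain only missing values is not considered empty. A short sketch with made-up column names:

```python
import numpy as np
import pandas as pd

# Columns exist but there are no rows: this is "empty" in the pandas sense.
no_rows = pd.DataFrame({"score": [], "group": []})

# Rows exist but hold only missing values: pandas does not treat this as empty.
all_nan = pd.DataFrame({"score": [np.nan, np.nan], "group": [None, None]})

no_rows_is_empty = no_rows.empty
all_nan_is_empty = all_nan.empty
```

Here no_rows_is_empty is True and all_nan_is_empty is False, so the emptiness check alone does not guard against a target column that consists entirely of missing values.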
Variable Type Appropriateness
Confirms that the target_variable is numerical and the grouping_variable is categorical, as these are prerequisites for the types of tests being performed.
if not evaluate_dtype(df, [target_variable], output='list_n')[0]:
    raise ValueError(f"evaluate_normality(): The target variable '{target_variable}' must be a numerical variable.")
if not evaluate_dtype(df, [grouping_variable], output='list_c')[0]:
    raise ValueError(f"evaluate_normality(): The grouping variable '{grouping_variable}' must be a categorical variable.")
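The internals of evaluate_dtype are not shown here, so the following is only a rough stand-in for the idea of the check, built on the public pandas.api.types helpers. The functions looks_numerical and looks_categorical are hypothetical, and their rules (numeric dtype versus object, string, or categorical dtype) are an assumption that may differ from what evaluate_dtype actually does.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_object_dtype, is_string_dtype


def looks_numerical(series):
    """Assumed rule: any numeric pandas dtype counts as numerical."""
    return bool(is_numeric_dtype(series))


def looks_categorical(series):
    """Assumed rule: categorical, object, or string dtypes count as categorical."""
    return bool(
        isinstance(series.dtype, pd.CategoricalDtype)
        or is_object_dtype(series)
        or is_string_dtype(series)
    )


# Illustrative data; the column names are made up.
df = pd.DataFrame({"score": [1.2, 3.4, 2.2], "group": ["a", "b", "a"]})

checks = {
    "score numerical": looks_numerical(df["score"]),
    "group categorical": looks_categorical(df["group"]),
    "group numerical": looks_numerical(df["group"]),
    "score categorical": looks_categorical(df["score"]),
}
```

With this data the first two entries are True and the last two are False, which is the situation in which swapping the two arguments by mistake would be caught by the checks above.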
Method Support Check
Validates that the specified method for testing normality is one of the supported methods. This prevents errors related to attempting unsupported or nonexistent tests.
allowed_methods = ['shapiro', 'anderson', 'normaltest', 'lilliefors', 'consensus']
if method not in allowed_methods:
    raise ValueError(f"evaluate_normality(): The method '{method}' is not supported. Allowed methods are: {', '.join(allowed_methods)}.")
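Since the membership test compares strings exactly, the check is case-sensitive. The sketch below reproduces it in a hypothetical helper, check_method, and probes a few spellings. The helper names and the probe values ("Shapiro", "ks") are illustrative only.

```python
ALLOWED_METHODS = ['shapiro', 'anderson', 'normaltest', 'lilliefors', 'consensus']


def check_method(method):
    """Hypothetical helper reproducing the method support check."""
    if method not in ALLOWED_METHODS:
        raise ValueError(
            f"evaluate_normality(): The method '{method}' is not supported. "
            f"Allowed methods are: {', '.join(ALLOWED_METHODS)}."
        )


def is_accepted(method):
    """Return True if the method name passes the check."""
    try:
        check_method(method)
    except ValueError:
        return False
    return True


accepted = {m: is_accepted(m) for m in ["shapiro", "Shapiro", "ks", "consensus"]}
```

In accepted, "shapiro" and "consensus" map to True, while "Shapiro" and "ks" map to False. When the method name comes from user input, normalizing it first (for example with method.strip().lower()) may avoid a spurious rejection.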