ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Implement error handling for hypothesis_predictor_core_n() #90

Closed ETA444 closed 4 months ago

ETA444 commented 4 months ago

Implementation Summary

The function hypothesis_predictor_core_n() conducts hypothesis tests on numerical data grouped by a categorical variable. It chooses between parametric and non-parametric tests based on the normality of the data and the equality of variances across groups. Supported tests include t-tests, Mann-Whitney U tests, ANOVA, and Kruskal-Wallis tests. Before any test runs, error handling validates the types and values of all inputs, so invalid arguments fail early with a descriptive message.
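The selection logic described above can be pictured with a small decision function. This is only a sketch: the issue does not show the branching itself, so the name select_test_sketch, the n_groups parameter, and the rule that a parametric test is used only when both flags are true are assumptions, not DataSafari's actual code.

```python
def select_test_sketch(n_groups: int, normality_bool: bool, equal_variances_bool: bool) -> str:
    """Hypothetical illustration of choosing a test family; not the library's implementation."""
    if n_groups < 2:
        raise ValueError("At least two groups are needed for a comparison.")

    # Assumption: the parametric route requires both normality and equal variances.
    parametric = normality_bool and equal_variances_bool

    if n_groups == 2:
        return "independent t-test" if parametric else "Mann-Whitney U"
    # Three or more groups.
    return "one-way ANOVA" if parametric else "Kruskal-Wallis"


two_group_choice = select_test_sketch(2, True, True)
multi_group_choice = select_test_sketch(4, False, True)
```

A real implementation might treat the "normal but unequal variances" case differently, for example with Welch's t-test; the sketch simply falls back to the non-parametric test there.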

Detailed Error Handling Breakdown

Type Validations

# Reject arguments of the wrong type before inspecting any values.
if not isinstance(df, pd.DataFrame):
    raise TypeError("predictor_core_n(): The 'df' parameter must be a pandas DataFrame.")
if not isinstance(target_variable, str):
    raise TypeError("predictor_core_n(): The 'target_variable' must be a string.")
if not isinstance(grouping_variable, str):
    raise TypeError("predictor_core_n(): The 'grouping_variable' must be a string.")
if not isinstance(normality_bool, bool):
    raise TypeError("predictor_core_n(): The 'normality_bool' must be a boolean.")
if not isinstance(equal_variances_bool, bool):
    raise TypeError("predictor_core_n(): The 'equal_variances_bool' must be a boolean.")

Value Validations

if df.empty:
    raise ValueError("predictor_core_n(): The input DataFrame is empty.")
if target_variable not in df.columns:
    raise ValueError(f"predictor_core_n(): The target variable '{target_variable}' was not found in the DataFrame.")
if grouping_variable not in df.columns:
    raise ValueError(f"predictor_core_n(): The grouping variable '{grouping_variable}' was not found in the DataFrame.")
target_variable_is_numerical = evaluate_dtype(df, [target_variable], output='list_n')[0]
if not target_variable_is_numerical:
    raise ValueError(f"predictor_core_n(): The target variable '{target_variable}' must be a numerical variable.")
grouping_variable_is_categorical = evaluate_dtype(df, [grouping_variable], output='list_c')[0]
if not grouping_variable_is_categorical:
    raise ValueError(f"predictor_core_n(): The grouping variable '{grouping_variable}' must be a categorical variable.")
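
The value checks above depend on DataSafari's evaluate_dtype() helper, whose implementation is not shown in this issue. To illustrate roughly how such checks behave, the sketch below substitutes pandas' own dtype predicates for it. The function name validate_values_sketch, the shortened messages, and the definitions of "numerical" and "categorical" used here are assumptions and may differ from what evaluate_dtype() actually does.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def validate_values_sketch(df: pd.DataFrame, target_variable: str, grouping_variable: str) -> None:
    """Hypothetical stand-in for the value validations; not DataSafari's code."""
    if df.empty:
        raise ValueError("The input DataFrame is empty.")
    for name in (target_variable, grouping_variable):
        if name not in df.columns:
            raise ValueError(f"'{name}' was not found in the DataFrame.")

    # Assumption: "numerical" means a numeric, non-boolean pandas dtype.
    target = df[target_variable]
    if not is_numeric_dtype(target) or is_bool_dtype(target):
        raise ValueError(f"'{target_variable}' must be a numerical variable.")

    # Assumption: "categorical" means any non-numeric dtype (object, string, category).
    if is_numeric_dtype(df[grouping_variable]):
        raise ValueError(f"'{grouping_variable}' must be a categorical variable.")


df = pd.DataFrame({"score": [1.2, 3.4, 2.2, 4.1], "group": ["a", "a", "b", "b"]})

# Valid roles: returns None without raising.
validate_values_sketch(df, "score", "group")

# Swapped roles: the target check fails first.
try:
    validate_values_sketch(df, "group", "score")
except ValueError as err:
    message = str(err)
```

Running the checks in this order means a caller who swaps the two column names is told about the target variable first, which mirrors the order of the validations in the issue.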

Full code can be found here