Closed: ETA444 closed this issue 6 months ago.
The function `predict_hypothesis()` automates the selection and execution of hypothesis tests for two variables (columns) of a DataFrame. Based on the characteristics of those variables, it determines whether categorical or numerical hypothesis tests are appropriate, checks the assumptions those tests require, runs the tests, and reports detailed results.
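The issue does not describe how the variable types are detected. Below is a minimal sketch of one possible rule. It assumes a column counts as numerical when pandas reports a numeric, non-boolean dtype. `classify_variable` and `choose_test_family` are hypothetical names, and the real function may apply different criteria.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def classify_variable(series: pd.Series) -> str:
    """Hypothetical helper: label a column as 'numerical' or 'categorical'."""
    # pandas treats bool as numeric, but a True/False column is really two categories.
    if is_bool_dtype(series):
        return "categorical"
    return "numerical" if is_numeric_dtype(series) else "categorical"


def choose_test_family(df: pd.DataFrame, var1: str, var2: str) -> str:
    """Hypothetical dispatch rule based on the two column types."""
    kinds = {classify_variable(df[var1]), classify_variable(df[var2])}
    if kinds == {"categorical"}:
        # Both categorical: contingency-table tests (e.g. chi-square, Fisher's exact).
        return "categorical"
    # At least one numerical variable: numerical tests
    # (e.g. a t-test or Mann-Whitney U test of a numeric column across groups).
    return "numerical"


df = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "score": [1.0, 2.5, 3.1, 0.4],
    "flag": [True, False, True, True],
})
print(choose_test_family(df, "group", "flag"))   # categorical
print(choose_test_family(df, "group", "score"))  # numerical
```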
`df` must be a pandas DataFrame, since the function relies on it for all data access and manipulation:

```python
if not isinstance(df, pd.DataFrame):
    raise TypeError("predict_hypothesis(): The 'df' parameter must be a pandas DataFrame.")
```
`var1` and `var2` must be strings, because they are expected to reference column names in the DataFrame:

```python
if not isinstance(var1, str) or not isinstance(var2, str):
    raise TypeError("predict_hypothesis(): The 'var1' and 'var2' parameters must be strings.")
```
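The type check above does not confirm that the names actually exist in `df`. A possible additional check is sketched below. It assumes that a missing column should raise a `ValueError` in the same message style as the other checks. `check_columns_exist` is a hypothetical helper, not part of the issue.

```python
import pandas as pd


def check_columns_exist(df: pd.DataFrame, var1: str, var2: str) -> None:
    """Hypothetical extra check: both names must be columns of df."""
    # Collect every requested name that is not a column label.
    missing = [name for name in (var1, var2) if name not in df.columns]
    if missing:
        raise ValueError(
            f"predict_hypothesis(): Column(s) {missing} not found in the DataFrame."
        )


df = pd.DataFrame({"group": ["a", "b"], "score": [1.0, 2.0]})
check_columns_exist(df, "group", "score")  # passes silently
```

Raising `KeyError` would also be defensible. `ValueError` keeps the check consistent with the other value checks in this issue.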
`normality_method`, `variance_method`, and `exact_tests_alternative` must be strings. These parameters select the method used to evaluate the test assumptions and the direction of the hypothesis tests:

```python
if not isinstance(normality_method, str):
    raise TypeError("predict_hypothesis(): The 'normality_method' parameter must be a string.")
if not isinstance(variance_method, str):
    raise TypeError("predict_hypothesis(): The 'variance_method' parameter must be a string.")
if not isinstance(exact_tests_alternative, str):
    raise TypeError("predict_hypothesis(): The 'exact_tests_alternative' parameter must be a string.")
```
`yates_min_sample_size` must be an integer, because it determines when Yates' correction is applied:

```python
if not isinstance(yates_min_sample_size, int):
    raise TypeError("predict_hypothesis(): The 'yates_min_sample_size' parameter must be an integer.")
```
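This check has one subtlety. In Python, `bool` is a subclass of `int`, so `isinstance(True, int)` is `True` and a boolean would pass. A stricter variant is sketched below. It also uses `numbers.Integral` so that NumPy integer scalars, which do not subclass `int`, are accepted. The helper name `check_yates_min_sample_size` is hypothetical.

```python
import numbers


def check_yates_min_sample_size(value) -> int:
    """Hypothetical stricter type check for yates_min_sample_size."""
    # bool subclasses int, so it must be rejected explicitly.
    # numbers.Integral covers int as well as NumPy integer scalars such as np.int64.
    if isinstance(value, bool) or not isinstance(value, numbers.Integral):
        raise TypeError(
            "predict_hypothesis(): The 'yates_min_sample_size' parameter must be an integer."
        )
    return int(value)


print(check_yates_min_sample_size(5))  # 5
```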
The DataFrame must not be empty:

```python
if df.empty:
    raise ValueError("predict_hypothesis(): The input DataFrame is empty.")
```
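`df.empty` is only `True` when the DataFrame has no rows or no columns. The pandas documentation for `DataFrame.empty` notes that a DataFrame containing only `NaN` values is not considered empty. If the two selected columns could be entirely missing, a check along the following lines may be worth adding. `check_has_paired_data` is a hypothetical helper, and it assumes both columns exist.

```python
import numpy as np
import pandas as pd


def check_has_paired_data(df: pd.DataFrame, var1: str, var2: str) -> None:
    """Hypothetical extra check: at least one row must have both values present."""
    # dropna() removes every row where either selected column is missing.
    if df[[var1, var2]].dropna().empty:
        raise ValueError(
            "predict_hypothesis(): No rows with non-missing values in both columns."
        )


df = pd.DataFrame({"group": ["a", "b"], "score": [np.nan, np.nan]})
print(df.empty)  # False, even though 'score' holds no data
```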
`normality_method`, `variance_method`, and `exact_tests_alternative` must each be one of their valid options (compared case-insensitively). This ensures that the function uses a supported assumption-checking method and a valid test configuration:

```python
valid_normality_methods = ['shapiro', 'anderson', 'normaltest', 'lilliefors', 'consensus']
if normality_method.lower() not in valid_normality_methods:
    raise ValueError(f"predict_hypothesis(): Invalid 'normality_method' value. Expected one of {valid_normality_methods}, got '{normality_method}'.")

valid_variance_methods = ['levene', 'bartlett', 'fligner', 'consensus']
if variance_method.lower() not in valid_variance_methods:
    raise ValueError(f"predict_hypothesis(): Invalid 'variance_method' value. Expected one of {valid_variance_methods}, got '{variance_method}'.")

valid_alternatives = ['two-sided', 'less', 'greater']
if exact_tests_alternative.lower() not in valid_alternatives:
    raise ValueError(f"predict_hypothesis(): Invalid 'exact_tests_alternative' value. Expected one of {valid_alternatives}, got '{exact_tests_alternative}'.")
```
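The three option checks follow the same pattern, so one table could drive them. The sketch below assumes the option lists shown above. `VALID_OPTIONS` and `validate_option` are hypothetical names. Returning the lower-cased value is a design choice: later comparisons then see the same normalized spelling that passed validation.

```python
VALID_OPTIONS = {
    "normality_method": ["shapiro", "anderson", "normaltest", "lilliefors", "consensus"],
    "variance_method": ["levene", "bartlett", "fligner", "consensus"],
    "exact_tests_alternative": ["two-sided", "less", "greater"],
}


def validate_option(name: str, value: str) -> str:
    """Check a string option against its allowed values and return it lower-cased."""
    valid = VALID_OPTIONS[name]
    normalized = value.lower()  # matches the case-insensitive checks above
    if normalized not in valid:
        raise ValueError(
            f"predict_hypothesis(): Invalid '{name}' value. "
            f"Expected one of {valid}, got '{value}'."
        )
    return normalized


print(validate_option("variance_method", "Levene"))  # levene
```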
`yates_min_sample_size` must be greater than zero, which is a necessary condition for applying Yates' correction:

```python
if yates_min_sample_size < 1:
    raise ValueError("predict_hypothesis(): The 'yates_min_sample_size' must be at least 1.")
```
Implement error handling for each user input of the function.