Error Handling in model_recommendation_core_inference()
Type Validations
Data Structure Validation: Ensures that df is a pandas DataFrame, since every later check relies on DataFrame attributes such as df.empty and df.columns.
Formula String Validation: Checks that formula is a string, so that it can be parsed and passed on as a statistical model specification.
List and Dictionary Validations: Ensures that priority_models, when provided, is a list and that model_kwargs, when provided, is a dictionary; both may be None. The error message asks for a list of strings, but the check shown verifies only the container type, not the type of each element. These arguments are used to filter models and to apply model-specific configurations.
if not isinstance(df, pd.DataFrame):
    raise TypeError("model_recommendation_core_inference(): 'df' must be a pandas DataFrame.")
if not isinstance(formula, str):
    raise TypeError("model_recommendation_core_inference(): 'formula' must be a string.")
if priority_models is not None and not isinstance(priority_models, list):
    raise TypeError("model_recommendation_core_inference(): 'priority_models' must be a list of strings.")
if model_kwargs is not None and not isinstance(model_kwargs, dict):
    raise TypeError("model_recommendation_core_inference(): 'model_kwargs' must be a dictionary.")
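The description asks for a list of strings, while the check above stops at the list type. A minimal sketch of how the element types could also be verified follows; the helper name check_priority_models is hypothetical and is not part of the original function.

```python
def check_priority_models(priority_models):
    """Hypothetical helper: accept None or a list whose items are all str."""
    if priority_models is None:
        return  # the argument is optional
    if not isinstance(priority_models, list):
        raise TypeError(
            "model_recommendation_core_inference(): "
            "'priority_models' must be a list of strings."
        )
    # Collect every entry that is not a string so the message can name them.
    bad_items = [item for item in priority_models if not isinstance(item, str)]
    if bad_items:
        raise TypeError(
            "model_recommendation_core_inference(): 'priority_models' "
            f"must contain only strings; got {bad_items!r}."
        )
```

With this sketch, check_priority_models(["ols", 3]) raises TypeError, while None and ["ols", "logit"] pass silently. The model names here are placeholders chosen for illustration.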
Value Validations
DataFrame Content Check: Confirms that the DataFrame is not empty (df.empty is False), since an empty DataFrame leaves nothing to fit.
Formula Structure Check: Validates that formula contains exactly one '~', which separates the dependent variable (left side) from the independent variables (right side).
Variable Presence Check: Ensures all variables named in formula are present in df, which model fitting requires. The code for this check appears under Additional Verifications.
if df.empty:
    raise ValueError("model_recommendation_core_inference(): The input DataFrame is empty.")
if formula.count('~') != 1:
    raise ValueError("model_recommendation_core_inference(): 'formula' must include exactly one '~' to separate dependent and independent variables.")
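As an illustration of the formula structure check, the sketch below splits a formula into its two sides after confirming the single '~'. The helper name split_formula is hypothetical, and the additional requirement that both sides be non-empty is an assumption not stated in the original checks.

```python
def split_formula(formula):
    """Hypothetical helper: return (target, right_hand_side) of a formula."""
    if formula.count("~") != 1:
        raise ValueError(
            "model_recommendation_core_inference(): 'formula' must include "
            "exactly one '~' to separate dependent and independent variables."
        )
    target, rhs = (part.strip() for part in formula.split("~"))
    # Assumption: a usable formula names something on both sides of '~'.
    if not target or not rhs:
        raise ValueError(
            "model_recommendation_core_inference(): both sides of '~' "
            "in 'formula' must be non-empty."
        )
    return target, rhs


print(split_formula("y ~ x1 + x2"))  # ('y', 'x1 + x2')
```

Strings such as "y x1" (no '~') or "y ~ x1 ~ x2" (two '~') raise ValueError in this sketch.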
Additional Verifications
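Variable Validation: Checks that the target variable (the text left of '~') and each independent variable (the '+'-separated names right of '~') are columns in the DataFrame. The parsing shown handles only names joined by '+'; other formula syntax, such as interactions or function calls, would be reported as missing columns.
n_top_models Validation: Ensures that n_top_models is a positive integer, since it sets the number of top models to return.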
# The target variable is everything left of '~'.
y_col = formula.split('~')[0].strip()
if y_col not in df.columns:
    raise ValueError(f"model_recommendation_core_inference(): Specified target variable '{y_col}' is not in DataFrame.")
# Independent variables are the '+'-separated names right of '~'.
independent_vars = formula.split('~')[1]
missing_vars = [var for var in independent_vars.replace('+', ' ').split() if var not in df.columns]
if missing_vars:
    raise ValueError(f"model_recommendation_core_inference(): The following independent variables are not in DataFrame: {', '.join(missing_vars)}.")
# n_top_models must be an integer of at least 1.
if not isinstance(n_top_models, int) or n_top_models < 1:
    raise ValueError("model_recommendation_core_inference(): 'n_top_models' must be an integer greater than 0.")
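The two verifications above can be sketched as standalone helpers. Both names, missing_columns and check_n_top_models, are hypothetical. The first accepts any collection of column names (df.columns would work) so that it runs without pandas. The second adds one assumption beyond the original check: it rejects booleans, because bool is a subclass of int in Python and True would otherwise pass as 1.

```python
def missing_columns(formula, columns):
    """Hypothetical helper: names used in the formula that are not columns.

    Assumes the formula has exactly one '~' and only '+'-joined terms.
    """
    target, rhs = (part.strip() for part in formula.split("~"))
    names = [target] + rhs.replace("+", " ").split()
    return [name for name in names if name not in columns]


def check_n_top_models(n_top_models):
    """Hypothetical helper: require an int >= 1, excluding bool."""
    is_int = isinstance(n_top_models, int) and not isinstance(n_top_models, bool)
    if not is_int or n_top_models < 1:
        raise ValueError(
            "model_recommendation_core_inference(): "
            "'n_top_models' must be an integer greater than 0."
        )


print(missing_columns("y ~ x1 + x2", ["y", "x1"]))  # ['x2']
check_n_top_models(3)  # passes silently
```

In this sketch, check_n_top_models(0), check_n_top_models(2.0), and check_n_top_models(True) all raise ValueError.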