ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Implement error handling for model_recommendation_core_inference() #118

Closed ETA444 closed 5 months ago

ETA444 commented 5 months ago

Error Handling in model_recommendation_core_inference()

Type Validations

if not isinstance(df, pd.DataFrame):
    raise TypeError("model_recommendation_core_inference(): 'df' must be a pandas DataFrame.")
if not isinstance(formula, str):
    raise TypeError("model_recommendation_core_inference(): 'formula' must be a string.")
if priority_models is not None and (
    not isinstance(priority_models, list)
    or not all(isinstance(model, str) for model in priority_models)
):
    raise TypeError("model_recommendation_core_inference(): 'priority_models' must be a list of strings.")
if model_kwargs is not None and not isinstance(model_kwargs, dict):
    raise TypeError("model_recommendation_core_inference(): 'model_kwargs' must be a dictionary.")
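
The checks above share one shape: skip the value if it is an allowed None, otherwise compare it against an expected type and raise a TypeError prefixed with the function name. A minimal sketch of that pattern as a table-driven helper is below. The name `check_types` and its tuple layout are hypothetical, chosen for illustration, and are not part of DataSafari.

```python
def check_types(func_name, checks):
    """Raise TypeError for the first value that fails its isinstance test.

    Each entry is (value, name, expected_type, description, allow_none).
    Hypothetical helper for illustration, not part of DataSafari.
    """
    for value, name, expected_type, description, allow_none in checks:
        if value is None and allow_none:
            continue  # optional argument left at its default
        if not isinstance(value, expected_type):
            raise TypeError(f"{func_name}(): '{name}' must be {description}.")


# Usage: 'priority_models' is passed as a string instead of a list.
try:
    check_types(
        "model_recommendation_core_inference",
        [
            ("y ~ x1 + x2", "formula", str, "a string", False),
            ("ols", "priority_models", list, "a list of strings", True),
            (None, "model_kwargs", dict, "a dictionary", True),
        ],
    )
except TypeError as exc:
    message = str(exc)
```

Keeping the checks in one table makes the order of validation explicit and keeps the error wording uniform, at the cost of one extra level of indirection.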

Value Validations

if df.empty:
    raise ValueError("model_recommendation_core_inference(): The input DataFrame is empty.")
if formula.count('~') != 1:
    raise ValueError("model_recommendation_core_inference(): 'formula' must include exactly one '~' to separate dependent and independent variables.")
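
One case the '~' count does not cover is a formula with an empty side, such as 'y ~' or '~ x1'. A hedged sketch of a stricter split is below. The helper name `split_formula` and the wording of its second error message are assumptions for illustration, not part of DataSafari.

```python
from typing import Tuple


def split_formula(formula: str) -> Tuple[str, str]:
    """Split 'y ~ x1 + x2' into its left and right sides.

    Hypothetical helper: it repeats the single-'~' check and also
    rejects formulas where either side is empty.
    """
    if formula.count('~') != 1:
        raise ValueError("'formula' must include exactly one '~'.")
    lhs, rhs = (part.strip() for part in formula.split('~'))
    if not lhs or not rhs:
        raise ValueError("'formula' must have a non-empty term on both sides of '~'.")
    return lhs, rhs


# Usage
target, predictors = split_formula("y ~ x1 + x2")
```

With the original checks, 'y ~' would pass silently because the list of missing variables is empty, so this extra test may be worth considering.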

Additional Verifications

# The formula contains exactly one '~' at this point, so it splits into two parts.
y_col, independent_vars = (part.strip() for part in formula.split('~'))
if y_col not in df.columns:
    raise ValueError(f"model_recommendation_core_inference(): Specified target variable '{y_col}' is not in DataFrame.")
# Terms are separated by '+'; each one must match a column name exactly.
missing_vars = [var for var in independent_vars.replace('+', ' ').split() if var not in df.columns]
if missing_vars:
    raise ValueError(f"model_recommendation_core_inference(): The following independent variables are not in DataFrame: {', '.join(missing_vars)}.")
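# bool is a subclass of int, so exclude it explicitly; otherwise True would pass as 1.
if isinstance(n_top_models, bool) or not isinstance(n_top_models, int) or n_top_models < 1:
    raise ValueError("model_recommendation_core_inference(): 'n_top_models' must be an integer greater than 0.")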
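
The independent-variable check above separates terms on '+' only, so a formula that uses other operators (for example 'x2*x3' or 'x1:x4') would report the whole term as a missing column. A rough sketch of a more tolerant split is below. The function name `find_missing_terms` is hypothetical, and the sketch assumes column names contain none of the characters + * : -. Function-wrapped terms such as 'C(x)' are not handled.

```python
import re


def find_missing_terms(rhs, columns):
    """Return right-hand-side variable names that are not in `columns`.

    Hypothetical helper: splits on the operators +, *, : and -, and
    skips bare integers such as the intercept markers 0 and 1.
    """
    tokens = re.split(r"[+*:\-]", rhs)
    names = [token.strip() for token in tokens if token.strip()]
    return [name for name in names if not name.isdigit() and name not in columns]


# Usage: only 'x4' is absent from the available columns.
missing = find_missing_terms("x1 + x2*x3 + x1:x4 - 1", ["x1", "x2", "x3"])
```

This is still a string-level approximation. If the function accepts the full formula syntax, delegating the parsing to the formula library itself is likely the more reliable option.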