Implement error handling for model_recommendation_core()

Error Handling in `model_recommendation_core()`

Type Validations

Type Check for Training Data (x_train) and Target Data (y_train)
- Ensures that x_train is either a pandas DataFrame or a NumPy ndarray, and y_train is either a pandas Series or a NumPy ndarray. This check is important to ensure the input data is in a format that the function can work with.

if not isinstance(x_train, (pd.DataFrame, np.ndarray)):
    raise TypeError("model_recommendation_core(): 'x_train' must be a pandas DataFrame or NumPy ndarray.")
if not isinstance(y_train, (pd.Series, np.ndarray)):
    raise TypeError("model_recommendation_core(): 'y_train' must be a pandas Series or NumPy ndarray.")

String Type Check for Task Type
- Validates that task_type is a string equal to either 'classification' or 'regression'. This parameter determines which models and metrics are used.

if not isinstance(task_type, str) or task_type not in ['classification', 'regression']:
    raise ValueError("model_recommendation_core(): 'task_type' must be either 'classification' or 'regression'.")

Type Checks for Additional Parameters
- Confirms that priority_metrics is a list, cv is an integer, n_top_models is a positive integer, and verbose is an integer. These checks catch misconfigured parameters before any model is trained.

if not isinstance(priority_metrics, list):
    raise TypeError("model_recommendation_core(): 'priority_metrics' must be a list of scoring metric names.")
if not isinstance(cv, int):
    raise TypeError("model_recommendation_core(): 'cv' must be an integer.")
if not isinstance(n_top_models, int) or n_top_models <= 0:
    raise ValueError("model_recommendation_core(): 'n_top_models' must be an integer greater than 0.")
if not isinstance(verbose, int):
    raise TypeError("model_recommendation_core(): 'verbose' must be an integer value.")
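
One caveat with `isinstance(..., int)`: in Python, `bool` is a subclass of `int`, so `cv=True` or `verbose=False` would pass the checks above. If that is undesirable, a stricter test could look like the following sketch. Both `is_strict_int` and `check_cv` are hypothetical helpers introduced for illustration; they are not part of the existing function.

```python
def is_strict_int(value):
    """Return True for int values, but False for bool (a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def check_cv(cv):
    """Hypothetical stricter variant of the 'cv' type check."""
    if not is_strict_int(cv):
        raise TypeError("model_recommendation_core(): 'cv' must be an integer.")
    return cv
```

Whether booleans should be rejected is a design choice; the sketch only shows how to do it if the stricter behavior is wanted.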

Value Validations

Empty Data Checks
- Checks that neither x_train nor y_train is empty, since models cannot be trained or evaluated without data.

if x_train.size == 0:
    raise ValueError("model_recommendation_core(): 'x_train' cannot be empty.")
if y_train.size == 0:
    raise ValueError("model_recommendation_core(): 'y_train' cannot be empty.")

Matching Data Sizes
- Ensures x_train and y_train have the same number of rows. This is essential for training machine learning models, as each feature set (row in x_train) must correspond to a target output (entry in y_train).

if x_train.shape[0] != y_train.shape[0]:
    raise ValueError("model_recommendation_core(): 'x_train' and 'y_train' must have the same number of rows.")
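
To illustrate how the emptiness and row-count checks behave together, here is a minimal sketch using NumPy arrays only. `check_data_shapes` is a hypothetical wrapper introduced for the example. The `.size` and `.shape` attributes also exist on pandas DataFrames and Series, so the same expressions apply to either input type.

```python
import numpy as np


def check_data_shapes(x_train, y_train):
    """Hypothetical wrapper around the emptiness and row-count checks."""
    if x_train.size == 0:
        raise ValueError("model_recommendation_core(): 'x_train' cannot be empty.")
    if y_train.size == 0:
        raise ValueError("model_recommendation_core(): 'y_train' cannot be empty.")
    if x_train.shape[0] != y_train.shape[0]:
        raise ValueError(
            "model_recommendation_core(): 'x_train' and 'y_train' must have the same number of rows."
        )


x_ok = np.zeros((4, 2))  # 4 rows, 2 features
y_ok = np.zeros(4)       # 4 targets
check_data_shapes(x_ok, y_ok)  # passes silently

# A (0, 2) array has columns but no rows, so .size is 0 and it counts as empty.
x_empty = np.empty((0, 2))
```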

Validity of Priority Metrics
- Ensures all entries in priority_metrics are strings and that the list contains no duplicates. Whether each name is a supported metric is validated in the next check.

# Check element types first: set() raises its own TypeError on unhashable entries.
if not all(isinstance(metric, str) for metric in priority_metrics):
    raise ValueError("model_recommendation_core(): All items in 'priority_metrics' must be strings representing metric names.")
if len(priority_metrics) != len(set(priority_metrics)):
    raise ValueError("model_recommendation_core(): 'priority_metrics' should not contain duplicate values.")
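
A subtlety in the order of these two checks: `set()` requires hashable elements, so a non-string entry such as a nested list makes the duplicate check fail with Python's own `TypeError` (unhashable type) if it runs before the string check. The sketch below validates element types first, so the caller always sees the function's own message. `check_priority_metrics` is a hypothetical helper name.

```python
def check_priority_metrics(priority_metrics):
    """Hypothetical helper: validate element types before looking for duplicates."""
    # Type check first, so unhashable entries never reach set().
    if not all(isinstance(metric, str) for metric in priority_metrics):
        raise ValueError(
            "model_recommendation_core(): All items in 'priority_metrics' must be strings representing metric names."
        )
    if len(priority_metrics) != len(set(priority_metrics)):
        raise ValueError(
            "model_recommendation_core(): 'priority_metrics' should not contain duplicate values."
        )
```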

Consistency of Metrics with Task Type
- Validates that the metric names provided in priority_metrics are appropriate for the specified task_type (classification or regression). This ensures that the evaluation is performed using relevant and supported metrics.

# Assumes scoring_classification and scoring_regression hold the valid metric names for each task type
scoring = scoring_classification if task_type == 'classification' else scoring_regression
valid_metrics = set(scoring.values())
invalid_metrics = [metric for metric in priority_metrics if metric not in valid_metrics]
if invalid_metrics:
    valid_metric_list = ", ".join(sorted(valid_metrics))
    raise ValueError(f"model_recommendation_core(): Invalid metric(s) in 'priority_metrics': {', '.join(invalid_metrics)}.\n\nValid metrics are: {valid_metric_list}.")
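
To show what a task-specific version of this check reports, here is a self-contained sketch that restricts the lookup to the metrics for the given `task_type`. The dictionary contents are stand-ins: the real `scoring_classification` and `scoring_regression` are not shown in this issue, so the entries below are assumptions, as is the helper name `check_metrics_for_task`.

```python
# Stand-in dictionaries: the real contents of scoring_classification and
# scoring_regression are not shown here, so these entries are assumptions.
scoring_classification = {"Accuracy": "accuracy", "F1": "f1"}
scoring_regression = {"R2": "r2", "MSE": "neg_mean_squared_error"}


def check_metrics_for_task(priority_metrics, task_type):
    """Hypothetical helper: reject metrics that do not belong to task_type."""
    scoring = scoring_classification if task_type == "classification" else scoring_regression
    valid_metrics = set(scoring.values())
    invalid_metrics = [m for m in priority_metrics if m not in valid_metrics]
    if invalid_metrics:
        raise ValueError(
            f"model_recommendation_core(): Invalid metric(s) in 'priority_metrics' "
            f"for task_type '{task_type}': {', '.join(invalid_metrics)}.\n\n"
            f"Valid metrics are: {', '.join(sorted(valid_metrics))}."
        )


check_metrics_for_task(["accuracy"], "classification")  # passes silently
```

With these stand-in values, passing a regression metric such as `"r2"` together with `task_type="classification"` raises a `ValueError` that names the offending metric and lists the classification metrics.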

Handling Excessive Number of Top Models
- Ensures that n_top_models does not exceed the number of available models for the given task type. This prevents errors in scenarios where there are fewer models than requested top models.

if task_type == 'classification' and n_top_models > len(models_classification):
    raise ValueError(f"model_recommendation_core(): 'n_top_models' cannot exceed the number of available classification models ({len(models_classification)}).")
if task_type == 'regression' and n_top_models > len(models_regression):
    raise ValueError(f"model_recommendation_core(): 'n_top_models' cannot exceed the number of available regression models ({len(models_regression)}).")
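
The two branches differ only in which model collection they consult, so they could be collapsed into a single lookup. The sketch below assumes that `models_classification` and `models_regression` are sized collections and that `task_type` has already been validated. The dictionaries shown are placeholders, not the project's real model lists, and `check_n_top_models` is a hypothetical helper name.

```python
# Placeholder collections; the real model dictionaries are defined elsewhere.
models_classification = {"LogisticRegression": None, "SVC": None, "RandomForestClassifier": None}
models_regression = {"LinearRegression": None, "Ridge": None}


def check_n_top_models(n_top_models, task_type):
    """Hypothetical helper: n_top_models must not exceed the available models."""
    available = {
        "classification": models_classification,
        "regression": models_regression,
    }[task_type]  # assumes task_type was validated earlier
    if n_top_models > len(available):
        raise ValueError(
            f"model_recommendation_core(): 'n_top_models' cannot exceed the number "
            f"of available {task_type} models ({len(available)})."
        )
```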

ETA444 / datasafari

Implement error handling for model_recommendation_core() #109
