This solution addresses the issue "Write NumPy docstring of model_recommendation_core()" by providing a detailed NumPy-style docstring for the model_recommendation_core() function.
Summary:
The function model_recommendation_core() recommends the top N machine learning models based on composite scores derived from multiple evaluation metrics. It forms part of a broader machine learning pipeline, assisting in model selection by automatically evaluating models against a set of performance metrics. The docstring follows the NumPy format and includes details on the parameters, return values, exceptions, and examples.
Docstring Sections Preview:
Description
"""
Recommends top N machine learning models based on composite scores derived from multiple evaluation metrics.
This function is part of a broader machine learning pipeline, designed to facilitate model selection by automatically evaluating a range of models against a set of performance metrics, tailored to the specific needs of the analysis.
"""
Parameters
"""
Parameters
----------
x_train : Union[pd.DataFrame, np.ndarray]
Training feature dataset.
y_train : Union[pd.Series, np.ndarray]
Training target variable.
task_type : str
Specifies the type of machine learning task: 'classification' or 'regression'.
priority_metrics : List[str], optional
List of metric names given priority in model scoring. Default is an empty list.
cv : int, optional
Determines the cross-validation splitting strategy. Default is 5, which uses 5-fold cross-validation.
n_top_models : int, optional
Number of top models to recommend. Default is 3.
verbose : int, optional
Verbosity level: the higher the value, the more output and information the user receives. Default is 1.
"""
Returns
"""
Returns
-------
Dict[str, Any]
Dictionary of the top N recommended models, keyed by model name, with the model object as the value.
"""
Raises
"""
Raises
------
TypeError
- If 'x_train' is not a pandas DataFrame or NumPy ndarray.
- If 'y_train' is not a pandas Series or NumPy ndarray.
- If 'priority_metrics' is not a list.
- If 'verbose' is not an integer.
ValueError
- If 'task_type' is not 'classification' or 'regression'.
- If 'n_top_models' is not an integer greater than 0.
- If 'x_train' and 'y_train' do not have the same number of rows.
- If 'x_train' or 'y_train' is empty.
- If 'priority_metrics' contains duplicate values or items that are not metric names given as strings.
- If metric names in 'priority_metrics' are invalid or not supported. The error message lists the valid metric names for reference.
- If metric names in 'priority_metrics' are not suitable for the given 'task_type'. The error message lists the valid metric names for reference.
- If 'n_top_models' exceeds the number of available models for the specified 'task_type'.
"""
Examples
"""
Examples
--------
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> X, y = load_iris(return_X_y=True)
>>> x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> recommended_models = model_recommendation_core(
...     x_train, y_train, task_type='classification',
...     priority_metrics=['Accuracy'], n_top_models=2)
>>> print(list(recommended_models.keys()))
"""
Notes
"""
Notes
-----
The function ranks models by a composite score that combines scores across multiple metrics, weighted by the specified priorities. Models are therefore compared on several aspects of performance at once rather than on a single metric.
- Priority Metrics: Assigning weights (default: 5 for prioritized metrics, 1 for others) allows users to emphasize metrics they find most relevant, affecting the composite score calculation.
- Composite Score: Calculated as a weighted average of metric scores, normalized by the total weight. This score serves as a basis for ranking models.
- Tips and Guidance: Optional tips provide insights on interpreting and leveraging different metrics, enhancing informed decision-making in model selection.
- Ensuring 'Higher is Better' Across All Metrics: For metrics where traditionally a lower score is better (e.g., RMSE), scores are transformed to align with the 'higher is better' principle used in composite score calculation. This transformation is inherent to the scoring configurations and does not require manual adjustment.
"""