ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Write NumPy docstring for model_recommendation_core_inference() #116

Closed ETA444 closed 5 months ago

ETA444 commented 5 months ago

Written and accessible:

help(model_recommendation_core_inference)

This issue adds a NumPy-style docstring to model_recommendation_core_inference(), covering its purpose, parameters, return values, exceptions, examples, and additional notes.

Summary:

The function model_recommendation_core_inference() recommends the top statistical models for inference based on user-specified preferences and a formula. It evaluates a range of statsmodels models and decides dynamically, from the nature of the target variable, whether to treat the task as regression or classification.

Docstring Sections Preview:

Description

"""
Recommends top statistical models for inference based on user-specified preferences and formula.
This function evaluates various statistical models from statsmodels, each suitable for either
regression or classification tasks determined dynamically by the nature of the target variable.
"""

Parameters

"""
Parameters
----------
df : pd.DataFrame
    DataFrame containing the data to fit the models.
formula : str
    A patsy formula specifying the model. The target variable is on the left of '~'.
priority_models : List[str], optional
    A list of model names that restricts evaluation to those models. If omitted, all applicable models are evaluated.
n_top_models : int, optional
    Number of top-performing models to return based on sorted metrics. Defaults to 3.
model_kwargs : dict, optional
    Dictionary mapping model names to dictionaries of additional keyword arguments to pass to the model constructors.
    This can be used to pass additional parameters required by specific models.
verbose : int, optional
    The verbosity level: 0 means silent, 1 outputs summary results, 2 includes detailed model summaries.
"""

Raises

"""
Raises
------
TypeError
    - If 'df' is not a pandas DataFrame, ensuring that the input data structure is correct for model fitting.
    - If 'formula' is not a string, verifying that the model formula is correctly specified as a string.
    - If 'priority_models' is provided and is not a list of strings, ensuring the user specifies a proper list of model names.
    - If 'model_kwargs' is provided and is not a dictionary, ensuring the correct format for passing additional keyword arguments to model constructors.
    - If 'verbose' is not an integer, verifying that the verbosity level is specified as an integer.

ValueError
    - If the input DataFrame is empty, ensuring that there is data available for model fitting.
    - If 'formula' does not contain exactly one '~', which is necessary to separate the dependent and independent variables in the model specification.
    - If the specified target variable from 'formula' is not found in the DataFrame, ensuring the formula correctly references a column in the DataFrame.
    - If any variables specified in the 'formula' for independent variables are not found in the DataFrame, checking for the presence of all required variables in the DataFrame.
    - If 'n_top_models' is not a positive integer, ensuring that the number of models to return is specified correctly.
"""

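The formula-related ValueError checks listed above can be sketched in plain Python. This is a minimal illustration, not the library's actual implementation: the helper name `check_formula` is hypothetical, it takes an iterable of column names (for example `df.columns`) instead of a DataFrame, and it assumes the right-hand side contains only plain additive terms such as `Age + Experience`.

```python
def check_formula(formula, columns):
    """Return (target, predictors), raising the errors the docstring describes.

    Hypothetical helper for illustration; not part of datasafari.
    """
    if not isinstance(formula, str):
        raise TypeError("'formula' must be a string.")
    if formula.count("~") != 1:
        raise ValueError("'formula' must contain exactly one '~'.")

    # The target variable sits on the left of '~', the predictors on the right.
    target, rhs = (part.strip() for part in formula.split("~"))
    available = set(columns)
    if target not in available:
        raise ValueError(f"Target variable {target!r} not found in the DataFrame.")

    # Assumption: only plain additive terms; interactions such as 'a:b' and
    # transforms such as 'np.log(a)' would need real formula parsing.
    predictors = [term.strip() for term in rhs.split("+") if term.strip()]
    missing = [name for name in predictors if name not in available]
    if missing:
        raise ValueError(f"Independent variables not found in the DataFrame: {missing}")
    return target, predictors
```

With the columns from the docstring's example, `check_formula('Salary ~ Age + Experience', ['Age', 'Salary', 'Experience'])` passes every check and returns the target name together with the list of predictor names.
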
Returns

"""
Returns
-------
Dict[str, Any]
    A dictionary with model names as keys and dictionaries as values. Each dictionary contains the 'model' object,
    'metrics' dictionary with performance metrics, and potentially 'summary' if verbose > 1.
"""

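To make the shape of the return value concrete, the sketch below walks a dictionary laid out as described above. The helper `metric_rows`, the model names, and the numbers are all placeholders invented for illustration; only the 'model', 'metrics', and 'summary' keys and the 'AIC' metric name come from the docstring.

```python
def metric_rows(results, metric="AIC"):
    """Return (model_name, metric_value, has_summary) rows in the dictionary's order."""
    return [
        (name, entry["metrics"][metric], "summary" in entry)
        for name, entry in results.items()
    ]


# Placeholder standing in for a real return value; names and numbers are made up.
example_results = {
    "model_a": {"model": object(), "metrics": {"AIC": 2210.4}},
    "model_b": {"model": object(), "metrics": {"AIC": 2215.9}, "summary": "..."},
}

for name, aic, has_summary in metric_rows(example_results):
    print(f"{name}: AIC = {aic:.2f}, summary included: {has_summary}")
```

Checking for the 'summary' key before using it is the cautious choice here, since the docstring says it is only potentially present.
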
Examples

"""
Examples
--------
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'Age': np.random.randint(18, 70, size=100),
...     'Salary': np.random.normal(50000, 15000, size=100),
...     'Experience': np.random.randint(1, 30, size=100)
... })
>>> formula = 'Salary ~ Age + Experience'
>>> best_inference_models = model_recommendation_core_inference(
...     df,
...     formula,
...     verbose=2
... )
>>> # Accessing the best model's object
>>> best_model_name = list(best_inference_models.keys())[0]
>>> best_model = best_inference_models[best_model_name]['model']
>>> # Viewing the summary of the best model
>>> print(best_model.summary())
>>> # Extracting AIC of the best model
>>> best_model_aic = best_inference_models[best_model_name]['metrics']['AIC']
>>> print(f"The best model according to AIC is {best_model_name} with an AIC of {best_model_aic:.2f}")
"""

Notes

"""
Notes
-----
- **Dynamic Model Evaluation**: Depending on the datatype of the target variable specified in the formula,
  the function dynamically decides whether to treat the problem as a regression or classification task,
  using appropriate metrics and models for each.

- **Handling Model Specific Requirements**: This function allows passing custom arguments to model constructors
  to handle models that require specific parameters via `model_kwargs`.

- **Metric Adjustments**: For metrics where a lower value is better (e.g., AIC, BIC), these are adjusted
  to be compared directly alongside higher-is-better metrics like R-squared, by negating their values
  during sorting.

- **Verbose Output**: The function provides different levels of output detail which can help in diagnosing model fit
  or understanding model performance.

- **Error Handling**: The function will report and skip models that encounter errors during fitting, allowing for
  robust execution even if some models are not applicable to the provided data or formula.
"""
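
The metric adjustment and error-handling notes above can be illustrated with a short sketch. It is written under stated assumptions and does not reproduce the library's code: `fit_candidates`, `rank_models`, `LOWER_IS_BETTER`, and the "R2" metric key are hypothetical names, and each candidate is modelled as a zero-argument callable that either returns a fitted result or raises.

```python
LOWER_IS_BETTER = {"AIC", "BIC"}  # assumption: metrics where smaller is better


def fit_candidates(candidates):
    """Call each candidate; report and skip any that raise during fitting."""
    fitted, skipped = {}, {}
    for name, fit in candidates.items():
        try:
            fitted[name] = fit()
        except Exception as exc:  # broad on purpose: any failure means "skip"
            print(f"Skipping {name}: {exc}")
            skipped[name] = exc
    return fitted, skipped


def rank_models(metrics_by_model, metric, n_top_models=3):
    """Return the best model names, negating lower-is-better metrics for sorting."""
    sign = -1 if metric in LOWER_IS_BETTER else 1
    ranked = sorted(
        metrics_by_model,
        key=lambda name: sign * metrics_by_model[name][metric],
        reverse=True,  # after negation, larger is always better
    )
    return ranked[:n_top_models]
```

Negating the lower-is-better values means a single descending sort serves both kinds of metric, and collecting failures in `skipped` keeps one inapplicable model from stopping the whole run.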