Implement error handling for explore_num()

Implementation Summary:

The error handling in explore_num() validates every argument before any analysis runs. Type checks raise TypeError and value checks raise ValueError, and each message is prefixed with the function name so the source of the failure is clear.

Purpose:

Validating the inputs to explore_num() up front means a bad argument fails immediately with a descriptive message, rather than surfacing later as an unrelated runtime error.

Code Breakdown:

Check df Type:

Purpose: To ensure df is a pandas DataFrame.

if not isinstance(df, pd.DataFrame):
    raise TypeError("explore_num(): The df parameter must be a pandas DataFrame.")

Check numerical_variables Type:

Purpose: To ensure numerical_variables is a list of strings representing column names.

if not isinstance(numerical_variables, list):
    raise TypeError("explore_num(): The numerical_variables parameter must be a list of strings.")
# Reached only when numerical_variables is a list, so no else branch is needed
if not all(isinstance(var, str) for var in numerical_variables):
    raise TypeError("explore_num(): All items in the numerical_variables list must be strings representing column names.")
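To illustrate which inputs the two-step check accepts and rejects, here is a minimal sketch. The helper names check_numerical_variables and rejected are hypothetical and are not part of datasafari; the messages are shortened.

```python
def check_numerical_variables(numerical_variables):
    """Hypothetical helper mirroring the two-step type check above."""
    if not isinstance(numerical_variables, list):
        raise TypeError("numerical_variables must be a list of strings.")
    if not all(isinstance(var, str) for var in numerical_variables):
        raise TypeError("All items in numerical_variables must be strings.")


def rejected(value):
    """Return True if the check raises TypeError for this value."""
    try:
        check_numerical_variables(value)
    except TypeError:
        return True
    return False


print(rejected("age"))              # a bare string is not a list
print(rejected(("age", "income")))  # a tuple is not a list
print(rejected(["age", 3]))         # a list with a non-string item
print(rejected(["age", "income"]))  # a valid list of strings
print(rejected([]))                 # an empty list passes the type check
```

An empty list passes here because all() over an empty iterable is True, which is why the separate non-empty check further down is still needed.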

Check method Type:

Purpose: To ensure method is a string.

if not isinstance(method, str):
    raise TypeError("explore_num(): The method parameter must be a string.")

Check output Type:

Purpose: To ensure output is a string.

if not isinstance(output, str):
    raise TypeError("explore_num(): The output parameter must be a string.")

Check threshold_z Type:

Purpose: To ensure threshold_z is a float or an int.

if not isinstance(threshold_z, (float, int)):
    raise TypeError("explore_num(): The value of threshold_z must be a float or int.")
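One caveat with this check: bool is a subclass of int in Python, so threshold_z=True would pass it. Whether explore_num() should reject booleans is a design decision this issue does not state. If it should, a stricter check could look like the sketch below, where is_real_number is a hypothetical helper name.

```python
def is_real_number(value):
    """Return True for int or float values, but not for bool."""
    # bool is a subclass of int, so it has to be excluded explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


print(isinstance(True, (float, int)))  # the original check accepts True
print(is_real_number(True))            # the stricter check does not
print(is_real_number(3), is_real_number(2.5))
```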

Check Empty DataFrame:

Purpose: To ensure the DataFrame df is not empty.

if df.empty:
    raise ValueError("explore_num(): The input DataFrame is empty.")
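For reference, df.empty is True whenever either axis has length zero, and it does not treat a DataFrame that holds only NaN values as empty. The sample frames below are made up for illustration.

```python
import pandas as pd

no_rows = pd.DataFrame({"age": []})              # one column, zero rows
no_columns = pd.DataFrame(index=[0, 1, 2])       # three rows, zero columns
all_nan = pd.DataFrame({"age": [float("nan")]})  # one row holding only NaN

print(no_rows.empty)     # True
print(no_columns.empty)  # True
print(all_nan.empty)     # False
```

A frame of only NaN values therefore passes this check and has to be handled by the analysis code itself.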

Check Valid method:

Purpose: To ensure method is one of the specified valid methods.

valid_methods = [
    'correlation_analysis', 'distribution_analysis', 'outliers_zscore',
    'outliers_iqr', 'outliers_mahalanobis', 'multicollinearity', 'all',
]
if method.lower() not in valid_methods:
    raise ValueError(f"explore_num(): Invalid method '{method}'. Valid options are: {', '.join(valid_methods)}")
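Because the check lowercases method only for the comparison, a value such as 'ALL' passes validation while the variable still holds the original casing. Whether the rest of explore_num() lowercases it again is not shown here. One way to avoid the question is to normalize once and reuse the result, as in this sketch; normalize_method is a hypothetical helper, not an existing datasafari function.

```python
VALID_METHODS = (
    "correlation_analysis", "distribution_analysis", "outliers_zscore",
    "outliers_iqr", "outliers_mahalanobis", "multicollinearity", "all",
)


def normalize_method(method):
    """Hypothetical helper: validate method and return its lowercase form."""
    normalized = method.lower()
    if normalized not in VALID_METHODS:
        raise ValueError(
            f"explore_num(): Invalid method '{method}'. "
            f"Valid options are: {', '.join(VALID_METHODS)}"
        )
    return normalized


print(normalize_method("ALL"))
print(normalize_method("Outliers_IQR"))
```

Later branches can then compare against the returned value without repeating the .lower() call.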

Check Valid output:

Purpose: To ensure output is either 'print' or 'return'.

if output.lower() not in ['print', 'return']:
    raise ValueError("explore_num(): Invalid output method. Choose 'print' or 'return'.")

Check Non-empty numerical_variables:

Purpose: To ensure numerical_variables list is not empty.

if not numerical_variables:
    raise ValueError("explore_num(): The 'numerical_variables' list must contain at least one column name.")

Check Variables Exist:

Purpose: To ensure all specified variables in numerical_variables are found in the DataFrame's columns. This check belongs before the dtype check, so that a missing column is reported by name rather than possibly failing inside the dtype lookup.

missing_vars = [var for var in numerical_variables if var not in df.columns]
if missing_vars:
    raise ValueError(f"explore_num(): The following variables were not found in the DataFrame: {', '.join(missing_vars)}")

Check Numerical Variables:

Purpose: To ensure the variables in numerical_variables are numerical.

# evaluate_dtype() is the project's own helper; with output='list_n' it is assumed to return one boolean per variable
numerical_types = evaluate_dtype(df, numerical_variables, output='list_n')
if not all(numerical_types):
    raise ValueError("explore_num(): The 'numerical_variables' list must contain only names of numerical variables.")
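The existence and dtype checks can be sketched together as follows. This is an illustration under assumptions, not the project's implementation: it uses pandas' is_numeric_dtype in place of datasafari's evaluate_dtype(), whose internals are not shown here, and check_columns is a hypothetical helper name. The sketch runs the existence check first so that a missing column is reported as missing rather than raising a KeyError during column lookup.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_columns(df, numerical_variables):
    """Hypothetical helper: existence check first, then dtype check."""
    missing_vars = [var for var in numerical_variables if var not in df.columns]
    if missing_vars:
        raise ValueError(
            f"Variables not found in the DataFrame: {', '.join(missing_vars)}"
        )
    # Safe to index now: every requested column is known to exist
    non_numeric = [var for var in numerical_variables if not is_numeric_dtype(df[var])]
    if non_numeric:
        raise ValueError(f"Non-numerical variables: {', '.join(non_numeric)}")


df = pd.DataFrame({"age": [31, 45], "city": ["Oslo", "Lima"]})

check_columns(df, ["age"])  # passes silently

for variables in (["age", "height"], ["city"]):
    try:
        check_columns(df, variables)
    except ValueError as exc:
        print(exc)
```

Naming the offending columns in the dtype error, as the sketch does, is an optional refinement; the original message only states that the list must contain numerical variables.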

ETA444 / datasafari

Implement error handling for explore_num() #46