Closed ETA444 closed 6 months ago
Implementation Summary:
The error handling within explore_num() validates every user input before the analysis runs, using type and value checks to catch common mistakes early and make the function more robust.
Purpose:
The purpose of this error handling is to validate the inputs to the explore_num() function, ensuring they conform to the expected types and values. This prevents runtime errors deeper in the function and gives users a clear message about what to fix.
Code Breakdown:
Check df Type: df is a pandas DataFrame.
    if not isinstance(df, pd.DataFrame):
        raise TypeError("explore_num(): The df parameter must be a pandas DataFrame.")
Check numerical_variables Type: numerical_variables is a list of strings representing column names.
    if not isinstance(numerical_variables, list):
        raise TypeError("explore_num(): The numerical_variables parameter must be a list of strings.")
    if not all(isinstance(var, str) for var in numerical_variables):
        raise TypeError("explore_num(): All items in the numerical_variables list must be strings representing column names.")
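A brief sketch of how this two-step check behaves on common mistakes, such as passing a bare column name or a tuple. The helper names check_string_list and raises_type_error are hypothetical and only wrap the two conditions shown above; they are not part of explore_num().

```python
def check_string_list(value, name="numerical_variables"):
    """Hypothetical helper mirroring the two type checks above."""
    # Step 1: the container itself must be a list (a str or tuple is rejected).
    if not isinstance(value, list):
        raise TypeError(f"The {name} parameter must be a list of strings.")
    # Step 2: every item must be a string (a column name).
    if not all(isinstance(item, str) for item in value):
        raise TypeError(f"All items in the {name} list must be strings.")
    return value


def raises_type_error(value):
    """Return True if check_string_list rejects the value."""
    try:
        check_string_list(value)
    except TypeError:
        return True
    return False
```

Note that an empty list passes both conditions, because all() over an empty iterable is True. That case is left to the separate non-empty check.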
Check method Type: method is a string.
    if not isinstance(method, str):
        raise TypeError("explore_num(): The method parameter must be a string.")
Check output Type: output is a string.
    if not isinstance(output, str):
        raise TypeError("explore_num(): The output parameter must be a string.")
Check threshold_z Type: threshold_z is a float or an int.
    if not isinstance(threshold_z, (float, int)):
        raise TypeError("explore_num(): The value of threshold_z must be a float or int.")
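One subtlety worth noting: in Python, bool is a subclass of int, so the isinstance check above also accepts True and False. If that is not desired, a stricter variant could look like the sketch below. The function name is_valid_threshold is an assumption for illustration and is not part of the original code.

```python
def is_valid_threshold(value):
    """Hypothetical stricter check: accept float or int, but not bool."""
    # bool is a subclass of int, so it has to be excluded explicitly.
    return isinstance(value, (float, int)) and not isinstance(value, bool)
```

Whether rejecting booleans is worth the extra condition is a design choice; the original check is simpler and only lets through values that still behave as the numbers 1 and 0.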
Check Empty DataFrame: df is not empty.
    if df.empty:
        raise ValueError("explore_num(): The input DataFrame is empty.")
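For reference, a small sketch of what DataFrame.empty reports, since it determines when this check fires. It is True when either axis has length zero, and False for a frame that contains only missing values. The sample frames are made up for illustration.

```python
import pandas as pd

no_rows = pd.DataFrame(columns=["age", "income"])  # columns but zero rows
no_cols = pd.DataFrame(index=[0, 1, 2])            # rows but zero columns
only_nan = pd.DataFrame({"age": [float("nan")]})   # one row holding NaN

print(no_rows.empty)   # True
print(no_cols.empty)   # True
print(only_nan.empty)  # False
```

So a DataFrame whose values are all NaN passes this check and would have to be caught elsewhere, if at all.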
Check Valid method: method is one of the specified valid methods.
    valid_methods = ['correlation_analysis', 'distribution_analysis', 'outliers_zscore',
                     'outliers_iqr', 'outliers_mahalanobis', 'multicollinearity', 'all']
    if method.lower() not in valid_methods:
        raise ValueError(f"explore_num(): Invalid method '{method}'. Valid options are: {', '.join(valid_methods)}")
Check Valid output: output is either 'print' or 'return'.
    if output.lower() not in ['print', 'return']:
        raise ValueError("explore_num(): Invalid output method. Choose 'print' or 'return'.")
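Both value checks compare a lower-cased copy of the argument, so inputs such as 'ALL' or 'Print' pass validation. A minimal sketch of that pattern follows. The helper normalize_choice is hypothetical, and returning the lower-cased value (so later code can compare against it directly) is an assumption about how a caller might use it, not something the original code is shown to do.

```python
VALID_METHODS = [
    "correlation_analysis", "distribution_analysis", "outliers_zscore",
    "outliers_iqr", "outliers_mahalanobis", "multicollinearity", "all",
]
VALID_OUTPUTS = ["print", "return"]


def normalize_choice(value, valid_options, name):
    """Hypothetical helper: validate case-insensitively, return the lower-cased value."""
    normalized = value.lower()
    if normalized not in valid_options:
        raise ValueError(
            f"Invalid {name} '{value}'. Valid options are: {', '.join(valid_options)}"
        )
    return normalized


method = normalize_choice("ALL", VALID_METHODS, "method")
output = normalize_choice("Print", VALID_OUTPUTS, "output")
```

Normalizing once at validation time avoids repeating .lower() wherever the value is compared later.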
Check Non-empty numerical_variables: numerical_variables list is not empty.
    if len(numerical_variables) == 0:
        raise ValueError("explore_num(): The 'numerical_variables' list must contain at least one column name.")
Check Numerical Variables: numerical_variables are numerical.
    numerical_types = evaluate_dtype(df, numerical_variables, output='list_n')
    if not all(numerical_types):
        raise ValueError("explore_num(): The 'numerical_variables' list must contain only names of numerical variables.")
Check Variables Exist: numerical_variables are found in the DataFrame's columns.
    missing_vars = [var for var in numerical_variables if var not in df.columns]
    if missing_vars:
        raise ValueError(f"explore_num(): The following variables were not found in the DataFrame: {', '.join(missing_vars)}")
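Note that the dtype check is listed before the existence check. If evaluate_dtype() looks the columns up in df (an assumption; its implementation is not shown here), a misspelled column name could fail inside that helper before the clearer "not found" message is reached. Below is a sketch of the alternative ordering. The function name check_columns is hypothetical, and pandas.api.types.is_numeric_dtype is used as a stand-in for evaluate_dtype().

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_columns(df, numerical_variables):
    """Hypothetical helper: existence check first, then dtype check."""
    missing_vars = [var for var in numerical_variables if var not in df.columns]
    if missing_vars:
        raise ValueError(
            "The following variables were not found in the DataFrame: "
            + ", ".join(missing_vars)
        )
    # Stand-in for evaluate_dtype(); assumed, not the original implementation.
    non_numeric = [var for var in numerical_variables if not is_numeric_dtype(df[var])]
    if non_numeric:
        raise ValueError(
            "The following variables are not numerical: " + ", ".join(non_numeric)
        )


# Made-up sample data for illustration.
sample_df = pd.DataFrame({"age": [31, 45], "city": ["Oslo", "Lima"]})
check_columns(sample_df, ["age"])  # passes silently
```

With this ordering, a typo in a column name is always reported as a missing variable, and the dtype check only ever sees columns that exist.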
Implement error handling for each user input of the function.