ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Implement error handling for explore_num() #46

Closed by ETA444 6 months ago

ETA444 commented 9 months ago

Implement error handling for each user input of the function.

ETA444 commented 6 months ago

Implementation Summary:

explore_num() now validates every user-supplied argument before any analysis runs. Type checks raise TypeError and value checks raise ValueError, each with a message prefixed by the function name.

Purpose:

Validating df, numerical_variables, method, output, and threshold_z up front lets invalid input fail immediately with a descriptive message instead of surfacing later as an obscure runtime error.

Code Breakdown:

  1. Check df Type:

    • Purpose: To ensure df is a pandas DataFrame.
    if not isinstance(df, pd.DataFrame):
       raise TypeError("explore_num(): The df parameter must be a pandas DataFrame.")
  2. Check numerical_variables Type:

    • Purpose: To ensure numerical_variables is a list of strings representing column names.
    if not isinstance(numerical_variables, list):
        raise TypeError("explore_num(): The numerical_variables parameter must be a list of strings.")
    if not all(isinstance(var, str) for var in numerical_variables):
        raise TypeError("explore_num(): All items in the numerical_variables list must be strings representing column names.")
  3. Check method Type:

    • Purpose: To ensure method is a string.
    if not isinstance(method, str):
       raise TypeError("explore_num(): The method parameter must be a string.")
  4. Check output Type:

    • Purpose: To ensure output is a string.
    if not isinstance(output, str):
       raise TypeError("explore_num(): The output parameter must be a string.")
  5. Check threshold_z Type:

    • Purpose: To ensure threshold_z is a float or an int.
    if not isinstance(threshold_z, (float, int)):
       raise TypeError("explore_num(): The value of threshold_z must be a float or int.")
  6. Check Empty DataFrame:

    • Purpose: To ensure the DataFrame df is not empty.
    if df.empty:
       raise ValueError("explore_num(): The input DataFrame is empty.")
  7. Check Valid method:

    • Purpose: To ensure method is one of the specified valid methods.
    valid_methods = ['correlation_analysis', 'distribution_analysis', 'outliers_zscore', 'outliers_iqr', 'outliers_mahalanobis', 'multicollinearity', 'all']
    if method.lower() not in valid_methods:
       raise ValueError(f"explore_num(): Invalid method '{method}'. Valid options are: {', '.join(valid_methods)}")
  8. Check Valid output:

    • Purpose: To ensure output is either 'print' or 'return'.
    if output.lower() not in ['print', 'return']:
       raise ValueError("explore_num(): Invalid output method. Choose 'print' or 'return'.")
  9. Check Non-empty numerical_variables:

    • Purpose: To ensure numerical_variables list is not empty.
    if len(numerical_variables) == 0:
       raise ValueError("explore_num(): The 'numerical_variables' list must contain at least one column name.")
  10. Check Numerical Variables:

    • Purpose: To ensure the variables in numerical_variables are numerical.
    numerical_types = evaluate_dtype(df, numerical_variables, output='list_n')
    if not all(numerical_types):
        raise ValueError("explore_num(): The 'numerical_variables' list must contain only names of numerical variables.")
  11. Check Variables Exist:

    • Purpose: To ensure all specified variables in numerical_variables are found in the DataFrame's columns.
    missing_vars = [var for var in numerical_variables if var not in df.columns]
    if missing_vars:
        raise ValueError(f"explore_num(): The following variables were not found in the DataFrame: {', '.join(missing_vars)}")
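
For reference, the checks above can be collected into one standalone helper. This is only a sketch under stated assumptions, not the code that shipped: the helper name validate_explore_num_inputs and the default values for method, output, and threshold_z are hypothetical, and pandas' is_numeric_dtype stands in for DataSafari's evaluate_dtype so that the snippet runs on its own. The sketch also runs the existence check (step 11) before the numeric check (step 10), because looking up the dtype of a missing column fails in plain pandas.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

VALID_METHODS = [
    'correlation_analysis', 'distribution_analysis', 'outliers_zscore',
    'outliers_iqr', 'outliers_mahalanobis', 'multicollinearity', 'all',
]
VALID_OUTPUTS = ['print', 'return']


def validate_explore_num_inputs(df, numerical_variables, method='all',
                                output='print', threshold_z=3):
    """Hypothetical helper: raise TypeError or ValueError on invalid input.

    The default argument values are assumptions made for this sketch.
    """
    # Type checks run first so the value checks below can rely on the types.
    if not isinstance(df, pd.DataFrame):
        raise TypeError("explore_num(): The df parameter must be a pandas DataFrame.")
    if not isinstance(numerical_variables, list):
        raise TypeError("explore_num(): The numerical_variables parameter must be a list of strings.")
    if not all(isinstance(var, str) for var in numerical_variables):
        raise TypeError("explore_num(): All items in the numerical_variables list must be strings representing column names.")
    if not isinstance(method, str):
        raise TypeError("explore_num(): The method parameter must be a string.")
    if not isinstance(output, str):
        raise TypeError("explore_num(): The output parameter must be a string.")
    if not isinstance(threshold_z, (float, int)):
        raise TypeError("explore_num(): The value of threshold_z must be a float or int.")

    # Value checks.
    if df.empty:
        raise ValueError("explore_num(): The input DataFrame is empty.")
    if method.lower() not in VALID_METHODS:
        raise ValueError(f"explore_num(): Invalid method '{method}'. Valid options are: {', '.join(VALID_METHODS)}")
    if output.lower() not in VALID_OUTPUTS:
        raise ValueError("explore_num(): Invalid output method. Choose 'print' or 'return'.")
    if len(numerical_variables) == 0:
        raise ValueError("explore_num(): The 'numerical_variables' list must contain at least one column name.")

    # Existence is checked before dtype so a missing column gets the clearer message.
    missing_vars = [var for var in numerical_variables if var not in df.columns]
    if missing_vars:
        raise ValueError(f"explore_num(): The following variables were not found in the DataFrame: {', '.join(missing_vars)}")

    # Stand-in for evaluate_dtype(df, numerical_variables, output='list_n').
    if not all(is_numeric_dtype(df[var]) for var in numerical_variables):
        raise ValueError("explore_num(): The 'numerical_variables' list must contain only names of numerical variables.")


if __name__ == "__main__":
    df = pd.DataFrame({'age': [23, 35, 41], 'name': ['a', 'b', 'c']})

    # Valid input passes silently.
    validate_explore_num_inputs(df, ['age'], method='outliers_zscore')

    # A column that is not in df is reported by name.
    try:
        validate_explore_num_inputs(df, ['age', 'height'])
    except ValueError as err:
        print(err)
```

Keeping the type checks ahead of the value checks means calls such as method.lower() are only reached once method is known to be a string.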