ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Implement new evaluate_variance() method: 'bartlett' #68

Closed: ETA444 closed this issue 6 months ago

ETA444 commented 7 months ago

Title: Implementing Bartlett's Test for Variance Homogeneity with Normality Assumption

Description: This update introduces Bartlett's test for evaluating the homogeneity of variances across groups defined by a categorical variable in a dataset. Bartlett's test assumes the data in each group are normally distributed, so it is best suited to variables for which that assumption holds. The addition broadens our variance homogeneity evaluation methods to cover this case.
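
For reference, here is a minimal sketch of the statistic behind the test. It assumes that the `bartlett` routine used in the implementation is SciPy's `scipy.stats.bartlett`, which this issue does not state explicitly. The helper `bartlett_by_hand` is hypothetical and exists only to show the textbook formula next to the library call.

```python
import numpy as np
from scipy.stats import bartlett, chi2


def bartlett_by_hand(*samples):
    """Textbook Bartlett statistic and chi-square p-value (illustrative helper)."""
    k = len(samples)                                        # number of groups
    n = np.array([len(s) for s in samples], dtype=float)    # group sizes
    var = np.array([np.var(s, ddof=1) for s in samples])    # sample variances
    total = n.sum()

    # Pooled variance: group variances weighted by their degrees of freedom
    pooled = np.sum((n - 1) * var) / (total - k)

    numerator = (total - k) * np.log(pooled) - np.sum((n - 1) * np.log(var))
    correction = 1 + (np.sum(1 / (n - 1)) - 1 / (total - k)) / (3 * (k - 1))
    stat = numerator / correction

    # Under H0 (equal variances) the statistic is approximately chi-square, k - 1 df
    pval = chi2.sf(stat, k - 1)
    return stat, pval


rng = np.random.default_rng(0)
groups = [rng.normal(0, 1, 40), rng.normal(0, 1, 35), rng.normal(0, 1, 25)]

manual_stat, manual_pval = bartlett_by_hand(*groups)
scipy_stat, scipy_pval = bartlett(*groups)
print(manual_stat, scipy_stat)
print(manual_pval, scipy_pval)
```

The two pairs of values should agree up to floating-point error, which is a quick way to check which quantity the library call reports.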

Example Usage:

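import pandas as pd
import numpy as np
# Assumed import path; adjust it to match your installed DataSafari version
from datasafari.evaluator import evaluate_variance

# Build a synthetic example dataset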
data = {
    'Group': np.random.choice(['A', 'B', 'C'], 100),
    'Data': np.random.normal(0, 1, 100)
}
df = pd.DataFrame(data)

# Evaluate variance homogeneity using Bartlett's test with normality assumption
variance_info_bartlett = evaluate_variance(df, 'Data', 'Group', method='bartlett', normality_info=True)

Expected Outcome: With method='bartlett' and normality_info=True, evaluate_variance() runs Bartlett's test across the groups and reports the test statistic, the p-value, and whether equal variances can be assumed (p > 0.05). The result is reliable when the variable is normally distributed within each group; under non-normality the test may reject too often.

Additional Context: Bartlett's test adds a method tailored to normally distributed data to our variance homogeneity evaluation module. Users can choose it to assess variance homogeneity whenever the normality assumption holds.

ETA444 commented 6 months ago

Implementation Summary:

The 'bartlett' method assesses the homogeneity of variances across groups using Bartlett's test. The test is most reliable when the data in each group follow a normal distribution. Because it is sensitive to departures from normality, it should be used only when normality can reasonably be assumed, which is why the implementation runs it only when normality_info is truthy.
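
The sensitivity to non-normality can be illustrated with a small simulation. This is a sketch under assumptions, not part of DataSafari: it calls `scipy.stats.bartlett` directly, and the helper `rejection_rate` and the choice of a heavy-tailed Student's t distribution with 3 degrees of freedom are purely illustrative. In every trial all groups come from the same distribution, so the null hypothesis of equal variances is true and each rejection is a false positive.

```python
import numpy as np
from scipy.stats import bartlett


def rejection_rate(draw, n_trials=400, n_groups=3, n_per_group=50, alpha=0.05):
    """Share of trials in which Bartlett's test rejects H0 (illustrative helper)."""
    rejections = 0
    for _ in range(n_trials):
        # Every group is drawn from the same distribution, so H0 holds
        samples = [draw(n_per_group) for _ in range(n_groups)]
        if bartlett(*samples).pvalue <= alpha:
            rejections += 1
    return rejections / n_trials


rng = np.random.default_rng(42)

# Normally distributed groups: the rejection rate should sit near alpha
normal_rate = rejection_rate(lambda n: rng.normal(0, 1, n))

# Heavy-tailed groups with identical spread: rejections tend to be far more frequent
heavy_rate = rejection_rate(lambda n: rng.standard_t(3, n))

print(f"false-positive rate, normal data:       {normal_rate:.3f}")
print(f"false-positive rate, heavy-tailed data: {heavy_rate:.3f}")
```

When the heavy-tailed rate is clearly above the nominal level, a rejection from Bartlett's test may reflect the shape of the distribution rather than a real difference in variances.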

Code Breakdown:

  1. Method Header:

    • Purpose: Run Bartlett's test when the method is 'bartlett' or 'consensus' and normality_info is truthy, then flag the variances as equal when the p-value exceeds 0.05.
    if method in ['bartlett', 'consensus'] and normality_info:
       # run the test once and unpack the statistic and p-value
       bartlett_stat, bartlett_pval = bartlett(*samples)
       variance_info['bartlett'] = bartlett_pval > 0.05
  2. Save Results:

    • Purpose: Store the results of Bartlett's test in a dictionary for further use.
    # save the info in return object
    output_info['bartlett'] = {'stat': bartlett_stat, 'p': bartlett_pval, 'equal_variances': variance_info['bartlett']}
  3. Construct Console Output:

    • Purpose: Build the title, results text, and tip that will be printed to the console.
    # construct console output
    bartlett_text = f"Results for samples in groups of '{grouping_variable}' for ['{target_variable}'] target variable:\n  ➡ statistic: {bartlett_stat}\n  ➡ p-value: {bartlett_pval}\n{('  ∴ Equal variances: Yes (H0 cannot be rejected)' if variance_info['bartlett'] else '  ∴ Equal variances: No (H0 rejected)')}\n\n"
    bartlett_title = "< EQUAL VARIANCES TESTING: BARTLETT >\n\n"
    bartlett_tip = "☻ Tip: The Bartlett test is sensitive to departures from normality, making it most suitable when normality can be reasonably assumed. It provides a useful measure for evaluating variance homogeneity under these conditions.\n"
  4. Output and Return:

    • Purpose: Print the results, then return the full results dictionary, or only the boolean equal-variances flag when the pipeline flag is set.
    # output & return
    if method == 'bartlett':
       print(bartlett_title, bartlett_text, bartlett_tip)
       return output_info if not pipeline else variance_info['bartlett']
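
The four steps above can be put together in a self-contained sketch. It is not the DataSafari implementation: the function name `bartlett_check`, the `alpha` parameter, and the way `samples` is built with a pandas groupby are all assumptions made for illustration, and `bartlett` is assumed to be `scipy.stats.bartlett`. Console output is omitted to keep the example short.

```python
import numpy as np
import pandas as pd
from scipy.stats import bartlett


def bartlett_check(df, target_variable, grouping_variable, alpha=0.05, pipeline=False):
    """Hypothetical stand-in for the 'bartlett' branch of evaluate_variance()."""
    # Assumed construction of `samples`: one array of target values per group
    samples = [
        group[target_variable].dropna().to_numpy()
        for _, group in df.groupby(grouping_variable)
    ]

    # Step 1: run the test and flag equal variances when H0 is not rejected
    stat, pval = bartlett(*samples)
    equal_variances = bool(pval > alpha)

    # Step 2: store the results using the same keys as the snippet above
    output_info = {
        'bartlett': {'stat': float(stat), 'p': float(pval), 'equal_variances': equal_variances}
    }

    # Step 4: full dictionary by default, boolean flag only in pipeline mode
    return equal_variances if pipeline else output_info


rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Group': rng.choice(['A', 'B', 'C'], 100),
    'Data': rng.normal(0, 1, 100),
})

result = bartlett_check(df, 'Data', 'Group')
flag = bartlett_check(df, 'Data', 'Group', pipeline=True)
print(result)
print(flag)
```

Returning a bare boolean in pipeline mode lets a caller branch on the result directly, for example to choose between a test that assumes equal variances and one that does not.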

Link to Full Code: evaluate_variance.py