Implement new evaluate_variance() method: 'bartlett'

ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.

GNU General Public License v3.0


import pandas as pd
import numpy as np

# Load example dataset
data = {
    'Group': np.random.choice(['A', 'B', 'C'], 100),
    'Data': np.random.normal(0, 1, 100)
}
df = pd.DataFrame(data)

# Evaluate variance homogeneity using Bartlett's test with normality assumption
variance_info_bartlett = evaluate_variance(df, 'Data', 'Group', method='bartlett', normality_info=True)

Implementation Summary:

The 'bartlett' method assesses the homogeneity of variances across groups using Bartlett's test. The test assumes that the data in each group is normally distributed and is sensitive to departures from normality, so it should only be used when normality can be reasonably assumed. For that reason, the method runs only when normality_info is truthy.
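To make the test itself concrete, here is a minimal sketch of the textbook Bartlett statistic using only the standard library. It is not the code used in this issue, which calls a ready-made `bartlett` function (presumably `scipy.stats.bartlett`). The function name `bartlett_statistic`, the toy samples, and the hard-coded critical value are assumptions made for illustration.

```python
import math
from statistics import variance


def bartlett_statistic(*samples):
    """Return Bartlett's T statistic for two or more samples (illustrative only)."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    variances = [variance(s) for s in samples]  # unbiased sample variances

    # Pooled variance: group variances weighted by their degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)

    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


# 95th percentile of the chi-square distribution with k - 1 = 2 degrees of freedom.
CHI2_CRIT_DF2 = 5.991

# Three groups with identical spread (only the means differ).
similar = bartlett_statistic(
    [1.0, 2.0, 3.0, 4.0, 5.0], [11.0, 12.0, 13.0, 14.0, 15.0], [6.0, 7.0, 8.0, 9.0, 10.0]
)
# Same groups, but the second one has a much larger spread.
different = bartlett_statistic(
    [1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0], [6.0, 7.0, 8.0, 9.0, 10.0]
)

print(similar <= CHI2_CRIT_DF2)   # equal variances are not rejected
print(different > CHI2_CRIT_DF2)  # equal variances are rejected at the 5% level
```

Under the null hypothesis the statistic approximately follows a chi-square distribution with k - 1 degrees of freedom, which is where the p-value used in the implementation comes from.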

Code Breakdown:

Method Header:

Purpose: Run Bartlett's test when the requested method is 'bartlett' or 'consensus' and normality_info is truthy, then record whether equal variances can be assumed (p-value above 0.05).

if method in ['bartlett', 'consensus'] and normality_info:
   # run the test once and unpack the (statistic, p-value) result
   bartlett_stat, bartlett_pval = bartlett(*samples)
   variance_info['bartlett'] = bartlett_pval > 0.05

Save Results:

Purpose: Store the results of Bartlett's test in a dictionary for further use.

# save the info in return object
output_info['bartlett'] = {'stat': bartlett_stat, 'p': bartlett_pval, 'equal_variances': variance_info['bartlett']}
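The gating condition and the saved result can be exercised in isolation. The sketch below is a hypothetical stand-alone helper, not part of datasafari: `bartlett_branch`, `fake_test`, the `alpha` parameter, and the fixed numbers are all assumptions for illustration. The real code passes `samples` to `bartlett` directly.

```python
def bartlett_branch(method, normality_info, samples, test_fn, alpha=0.05):
    """Hypothetical helper mirroring the gated branch and the saved result."""
    output_info, variance_info = {}, {}
    if method in ('bartlett', 'consensus') and normality_info:
        stat, pval = test_fn(*samples)  # a single call, unpacked into two names
        variance_info['bartlett'] = pval > alpha
        output_info['bartlett'] = {
            'stat': stat,
            'p': pval,
            'equal_variances': variance_info['bartlett'],
        }
    return output_info, variance_info


def fake_test(*samples):
    # Stand-in for the real test function; returns a made-up (statistic, p-value) pair.
    return 1.23, 0.54


groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
ran, _ = bartlett_branch('bartlett', True, groups, fake_test)
skipped, _ = bartlett_branch('bartlett', False, groups, fake_test)

print(ran)      # contains a 'bartlett' entry
print(skipped)  # empty: the branch is skipped when normality_info is falsy
```

The second call shows the practical consequence of the condition: requesting 'bartlett' without a truthy normality_info produces no Bartlett entry at all.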

Construct Console Output:

Purpose: Build the title, the result text, and the usage tip that are later printed to the console.

# construct console output
bartlett_conclusion = '  ∴ Equal variances: Yes (H0 cannot be rejected)' if variance_info['bartlett'] else '  ∴ Equal variances: No (H0 rejected)'
bartlett_text = (
   f"Results for samples in groups of '{grouping_variable}' for ['{target_variable}'] target variable:\n"
   f"  ➡ statistic: {bartlett_stat}\n"
   f"  ➡ p-value: {bartlett_pval}\n"
   f"{bartlett_conclusion}\n\n"
)
bartlett_title = "< EQUAL VARIANCES TESTING: BARTLETT >\n\n"
bartlett_tip = "☻ Tip: The Bartlett test is sensitive to departures from normality, making it most suitable when normality can be reasonably assumed. It provides a useful measure for evaluating variance homogeneity under these conditions.\n"

Output and Return:

Purpose: Print the results and return either the full output_info dictionary or, when the pipeline flag is set, only the boolean equal-variances result.

# output & return
if method == 'bartlett':
   print(bartlett_title, bartlett_text, bartlett_tip)
   return output_info if not pipeline else variance_info['bartlett']
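The effect of the pipeline flag on the return value can be seen with a small sketch. `select_return` is a hypothetical helper that isolates the return expression above, and the dictionary contents are placeholder values rather than real test results.

```python
def select_return(output_info, variance_info, pipeline=False):
    """Hypothetical helper isolating the return expression of the 'bartlett' branch."""
    return output_info if not pipeline else variance_info['bartlett']


# Placeholder values standing in for a finished Bartlett run.
variance_info = {'bartlett': True}
output_info = {'bartlett': {'stat': 1.23, 'p': 0.54, 'equal_variances': True}}

full = select_return(output_info, variance_info)                 # detailed dictionary
flag = select_return(output_info, variance_info, pipeline=True)  # bare boolean

print(type(full).__name__, type(flag).__name__)
```

Returning a bare boolean in pipeline mode lets a caller branch on the result directly, while the default mode keeps the statistic and p-value available for inspection.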

Link to Full Code: evaluate_variance.py
