Implement new evaluate_variance() method: 'fligner'

Title: Implementing Fligner-Killeen Test for Variance Homogeneity

Description: This task implements the Fligner-Killeen test for evaluating the homogeneity of variances of the numerical (target) variable across groups defined by a categorical (grouping) variable. The test makes our variance homogeneity evaluation more robust, particularly for datasets with non-normally distributed variables.

Example Usage:

import pandas as pd
import numpy as np

# Build an example dataset with random group labels and normally distributed values
data = {
    'Group': np.random.choice(['A', 'B', 'C'], 100),
    'Data': np.random.normal(0, 1, 100)
}
df = pd.DataFrame(data)

# Evaluate variance homogeneity using Fligner-Killeen test
variance_info_fligner = evaluate_variance(df, 'Data', 'Group', method='fligner')

# Evaluate variance homogeneity using consensus method
variance_info_consensus = evaluate_variance(df, 'Data', 'Group', method='consensus')

Expected Outcome: With method='fligner', evaluate_variance() reports whether variances are homogeneous across the groups in the dataset. Because the test is resilient to departures from normality, the result remains reliable for non-normally distributed variables.

Additional Context: The Fligner-Killeen test adds a non-parametric alternative to our variance homogeneity evaluation module, so the toolkit covers a wider range of data distributions and analysis requirements. The detailed console output and the accompanying tip help users interpret the result.

Implementation Summary:

The 'fligner' method is implemented to evaluate the homogeneity of variances across groups using the Fligner-Killeen test. This method is non-parametric and less sensitive to departures from normality compared to the Bartlett test. It is suitable for data that might not be normally distributed, offering robustness similar to the Levene test but through a different statistical approach.
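
As a rough illustration of the test itself, the sketch below calls scipy.stats.fligner directly on hand-built samples. The sample values and variable names are illustrative assumptions and are not taken from evaluate_variance.py; only the p > 0.05 decision rule follows this issue.

```python
import numpy as np
from scipy.stats import fligner

# Three groups with the same spread but different locations (illustrative data).
similar = [np.linspace(-1, 1, 30), np.linspace(4, 6, 30), np.linspace(9, 11, 30)]

# Same groups, except the second one is ten times more spread out.
different = [np.linspace(-1, 1, 30), np.linspace(-10, 10, 30), np.linspace(9, 11, 30)]

# fligner() centers each sample on its median by default,
# so differences in location alone do not trigger a rejection.
stat_similar, p_similar = fligner(*similar)
stat_different, p_different = fligner(*different)

# Decision rule used in this issue: p > 0.05 means H0 (equal variances) is not rejected.
equal_similar = bool(p_similar > 0.05)
equal_different = bool(p_different > 0.05)

print(f"similar spreads:   stat={stat_similar:.3f}, p={p_similar:.3f}, equal={equal_similar}")
print(f"different spreads: stat={stat_different:.3f}, p={p_different:.3g}, equal={equal_different}")
```

The first call should not reject H0, while the second should, since one group is far more dispersed than the others.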

Code Breakdown:

Method Header:

Purpose: Run the Fligner-Killeen test when the selected method is 'fligner' or 'consensus', and record whether the p-value exceeds 0.05 (that is, whether equal variances cannot be rejected).

if method in ['fligner', 'consensus']:
    # run the test once and unpack the (statistic, p-value) result
    fligner_stat, fligner_pval = fligner(*samples)
    variance_info['fligner'] = fligner_pval > 0.05
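
The snippet above relies on a samples variable that this issue does not show. A minimal sketch of one way it could be built, assuming samples holds one array of target values per level of the grouping variable (the groupby approach and the data values here are assumptions, not the code in evaluate_variance.py):

```python
import pandas as pd
from scipy.stats import fligner

# Small deterministic dataset (illustrative values only).
df = pd.DataFrame({
    'Group': ['A'] * 6 + ['B'] * 6,
    'Data': [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 5.0, -3.0, 9.0, -7.0, 2.0, 0.0],
})

# Assumption: one array of target values per level of the grouping variable.
samples = [group['Data'].to_numpy() for _, group in df.groupby('Group')]

# The result unpacks into (statistic, p-value), so the test only runs once.
fligner_stat, fligner_pval = fligner(*samples)

# Same decision rule as above: True means H0 (equal variances) is not rejected.
equal_variances = bool(fligner_pval > 0.05)

print(f"groups: {len(samples)}, statistic: {fligner_stat:.3f}, p-value: {fligner_pval:.3f}")
```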

Save Results:

Purpose: Store the results of the Fligner-Killeen test in a dictionary for further use.

# save the info in return object
output_info['fligner'] = {'stat': fligner_stat, 'p': fligner_pval, 'equal_variances': variance_info['fligner']}

Construct Console Output:

Purpose: Display the results of the test and provide insights.

# construct console output
fligner_text = f"Results for samples in groups of '{grouping_variable}' for ['{target_variable}'] target variable:\n  ➡ statistic: {fligner_stat}\n  ➡ p-value: {fligner_pval}\n{('  ∴ Equal variances: Yes (H0 cannot be rejected)' if variance_info['fligner'] else '  ∴ Equal variances: No (H0 rejected)')}\n\n"
fligner_title = "< EQUAL VARIANCES TESTING: FLIGNER-KILLEEN >\n\n"
fligner_tip = "☻ Tip: The Fligner-Killeen test is a non-parametric alternative that is less sensitive to departures from normality compared to the Bartlett test. It's a good choice when dealing with data that might not be normally distributed, offering robustness similar to the Levene test but through a different statistical approach.\n"

Output and Return:

Purpose: Print the results, then return the full output dictionary, or only the boolean equal-variances verdict when the pipeline flag is set.

# output & return
if method == 'fligner':
   print(fligner_title, fligner_text, fligner_tip)
   return output_info if not pipeline else variance_info['fligner']
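
The return logic above can be sketched in isolation. The helper name package_fligner_result, its alpha parameter, and the input numbers are hypothetical and exist only for this illustration; the dictionary keys follow the "Save Results" step.

```python
def package_fligner_result(stat, pval, pipeline=False, alpha=0.05):
    """Hypothetical helper mirroring the return logic described in this issue."""
    # True means H0 (equal variances) cannot be rejected at the given alpha.
    equal_variances = bool(pval > alpha)
    output_info = {
        'fligner': {'stat': stat, 'p': pval, 'equal_variances': equal_variances}
    }
    # In pipeline mode only the boolean verdict is returned.
    return equal_variances if pipeline else output_info


# Made-up statistic and p-value, used only to show both return shapes.
full_result = package_fligner_result(1.23, 0.54)
pipeline_result = package_fligner_result(1.23, 0.54, pipeline=True)

print(full_result)
print(pipeline_result)
```

Returning a bare boolean in pipeline mode lets a downstream step branch on the verdict without unpacking the dictionary.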

Link to Full Code: evaluate_variance.py

ETA444 / datasafari

Implement new evaluate_variance() method: 'fligner' #67