ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Implement new evaluate_variance() method: 'fligner' #67

Closed: ETA444 closed this issue 6 months ago

ETA444 commented 7 months ago

Title: Implementing Fligner-Killeen Test for Variance Homogeneity

Description: In this task, we aim to implement the Fligner-Killeen test for evaluating the homogeneity of variances in the numerical/target variable across groups defined by a grouping/categorical variable in a dataset. This addition enhances the robustness of our variance homogeneity evaluation methods, particularly for datasets with non-normally distributed variables.

Example Usage:

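import numpy as np
import pandas as pd
# evaluate_variance ships with DataSafari; this import path is an assumption:
# from datasafari.evaluator import evaluate_variance

# Build an example dataset (seeded so the results are reproducible)
rng = np.random.default_rng(0)
data = {
    'Group': rng.choice(['A', 'B', 'C'], 100),
    'Data': rng.normal(0, 1, 100)
}
df = pd.DataFrame(data)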

# Evaluate variance homogeneity using Fligner-Killeen test
variance_info_fligner = evaluate_variance(df, 'Data', 'Group', method='fligner')

# Evaluate variance homogeneity using consensus method
variance_info_consensus = evaluate_variance(df, 'Data', 'Group', method='consensus')

Expected Outcome: Once the Fligner-Killeen test is implemented, we anticipate a robust evaluation of variance homogeneity across the groups in the dataset. Because the test works on ranks of absolute deviations from the group medians, it is resilient to departures from normality and should give reliable results for non-normally distributed variables.
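To make the robustness claim concrete, here is a small Monte Carlo sketch. It is not part of DataSafari, and the settings (300 repetitions, 3 groups of 30, Student's t with 3 degrees of freedom) are arbitrary assumptions. All groups share the same heavy-tailed distribution, so H0 is true and every rejection is a false positive; Bartlett's test is expected to reject well above the nominal 5%, while Fligner-Killeen should stay much closer to it.

```python
import numpy as np
from scipy.stats import bartlett, fligner

rng = np.random.default_rng(123)
n_reps, n_per_group, alpha = 300, 30, 0.05
bartlett_rejections = 0
fligner_rejections = 0

for _ in range(n_reps):
    # three groups drawn from the SAME heavy-tailed distribution (Student's t, 3 df),
    # so variances are truly equal and any rejection is a false positive
    groups = [rng.standard_t(3, n_per_group) for _ in range(3)]
    bartlett_rejections += int(bartlett(*groups).pvalue <= alpha)
    fligner_rejections += int(fligner(*groups).pvalue <= alpha)

bartlett_rate = bartlett_rejections / n_reps
fligner_rate = fligner_rejections / n_reps
print(f"False-positive rate  Bartlett: {bartlett_rate:.3f}  Fligner-Killeen: {fligner_rate:.3f}")
```

The exact rates depend on the seed and settings; the point is the gap between the two tests on non-normal data.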

Additional Context: The Fligner-Killeen test enriches our variance homogeneity evaluation module with an alternative non-parametric approach, making the toolkit more versatile across different data distributions and analysis requirements. Detailed console output and a short tip on when to use the test will also be provided to help users interpret the results.

ETA444 commented 6 months ago

Implementation Summary:

The 'fligner' method evaluates the homogeneity of variances across groups using the Fligner-Killeen test. This test is non-parametric and less sensitive to departures from normality than the Bartlett test. It is suitable for data that might not be normally distributed, offering robustness similar to the Levene test but through a different statistical approach.
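The summary does not spell out how the statistic is computed. As a rough sketch, assuming SciPy's median-centred variant (the default of `scipy.stats.fligner`, which the implementation calls), the test can be written out step by step and cross-checked against SciPy. The function name `fligner_killeen_from_scratch` is hypothetical and not part of DataSafari.

```python
import numpy as np
from scipy.stats import chi2, fligner, norm, rankdata


def fligner_killeen_from_scratch(*samples):
    """Hypothetical helper: median-centred Fligner-Killeen statistic and p-value."""
    samples = [np.asarray(s, dtype=float) for s in samples]
    k = len(samples)
    n_i = np.array([len(s) for s in samples])
    n_total = n_i.sum()
    # 1. absolute deviation of each observation from its own group's median
    deviations = np.concatenate([np.abs(s - np.median(s)) for s in samples])
    # 2. rank the pooled deviations (ties get average ranks) and map ranks to
    #    normal scores: a = Phi^-1(1/2 + rank / (2 * (N + 1)))
    scores = norm.ppf(0.5 + rankdata(deviations) / (2.0 * (n_total + 1)))
    # 3. mean score per group, grand mean, and sample variance of all scores
    group_means = np.array([g.mean() for g in np.split(scores, np.cumsum(n_i)[:-1])])
    grand_mean = scores.mean()
    score_var = scores.var(ddof=1)
    # 4. statistic is referred to a chi-square distribution with k - 1 degrees of freedom
    stat = np.sum(n_i * (group_means - grand_mean) ** 2) / score_var
    pval = chi2.sf(stat, k - 1)
    return stat, pval


rng = np.random.default_rng(42)
a = rng.normal(0, 1, 30)
b = rng.normal(0, 1.5, 40)
c = rng.standard_t(3, 35)
ours = fligner_killeen_from_scratch(a, b, c)
ref = fligner(a, b, c)
print(ours, (ref.statistic, ref.pvalue))
```

Working on ranks of absolute deviations is what makes the test insensitive to the shape of the distribution: extreme values only affect their rank, not the size of a squared deviation.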

Code Breakdown:

  1. Method Header:

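    • Purpose: Mark the start of the 'fligner' branch (which also runs when method='consensus'), run the Fligner-Killeen test on the per-group samples, and record whether equal variances hold at the 0.05 significance level.
    # assumes `samples` (one array of target values per group) and `from scipy.stats import fligner` exist at this point
    if method in ['fligner', 'consensus']:
       # run the test once; SciPy's result unpacks into (statistic, pvalue)
       fligner_stat, fligner_pval = fligner(*samples)
       # p > 0.05: H0 (equal variances) cannot be rejected
       variance_info['fligner'] = fligner_pval > 0.05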
  2. Save Results:

    • Purpose: Store the statistic, p-value, and equal-variances verdict in output_info['fligner'] so they can be returned to the caller.
    # save the info in return object
    output_info['fligner'] = {'stat': fligner_stat, 'p': fligner_pval, 'equal_variances': variance_info['fligner']}
  3. Construct Console Output:

    • Purpose: Build the title, the results text (statistic, p-value, and verdict), and a tip explaining when the test is appropriate.
    # construct console output
    fligner_text = f"Results for samples in groups of '{grouping_variable}' for ['{target_variable}'] target variable:\n  ➡ statistic: {fligner_stat}\n  ➡ p-value: {fligner_pval}\n{('  ∴ Equal variances: Yes (H0 cannot be rejected)' if variance_info['fligner'] else '  ∴ Equal variances: No (H0 rejected)')}\n\n"
    fligner_title = "< EQUAL VARIANCES TESTING: FLIGNER-KILLEEN >\n\n"
    fligner_tip = "☻ Tip: The Fligner-Killeen test is a non-parametric alternative that is less sensitive to departures from normality than the Bartlett test. It's a good choice when dealing with data that might not be normally distributed, offering robustness similar to the Levene test but through a different statistical approach.\n"
  4. Output and Return:

    • Purpose: When 'fligner' is the selected method, print the title, results, and tip, then return the full output_info dictionary, or only the boolean verdict when pipeline=True.
    # output & return
    if method == 'fligner':
       print(fligner_title, fligner_text, fligner_tip)
       return output_info if not pipeline else variance_info['fligner']
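The four steps above can be combined into one standalone sketch of the 'fligner' branch. It is an approximation under stated assumptions: the helper name `fligner_branch_sketch` and the `alpha` parameter are hypothetical, and building `samples` with `groupby` is one plausible way to produce the per-group arrays that the snippets take as given, not necessarily how evaluate_variance.py does it.

```python
import numpy as np
import pandas as pd
from scipy.stats import fligner


def fligner_branch_sketch(df, target_variable, grouping_variable, pipeline=False, alpha=0.05):
    """Hypothetical standalone version of the 'fligner' branch of evaluate_variance()."""
    # assumption: one array of (non-missing) target values per group
    samples = [
        group[target_variable].dropna().to_numpy()
        for _, group in df.groupby(grouping_variable)
    ]
    # step 1: run the test once; p > alpha means H0 (equal variances) is not rejected
    fligner_stat, fligner_pval = fligner(*samples)
    equal_variances = bool(fligner_pval > alpha)
    # step 2: save the results in the return object
    output_info = {'fligner': {'stat': fligner_stat, 'p': fligner_pval, 'equal_variances': equal_variances}}
    # step 3: console output in the same layout as the issue's snippet
    title = "< EQUAL VARIANCES TESTING: FLIGNER-KILLEEN >\n\n"
    verdict = "Yes (H0 cannot be rejected)" if equal_variances else "No (H0 rejected)"
    text = (
        f"Results for samples in groups of '{grouping_variable}' for ['{target_variable}'] target variable:\n"
        f"  ➡ statistic: {fligner_stat}\n"
        f"  ➡ p-value: {fligner_pval}\n"
        f"  ∴ Equal variances: {verdict}\n\n"
    )
    print(title, text)
    # step 4: pipeline mode returns only the boolean verdict
    return equal_variances if pipeline else output_info


rng = np.random.default_rng(0)
df = pd.DataFrame({'Group': rng.choice(['A', 'B', 'C'], 100), 'Data': rng.normal(0, 1, 100)})
info = fligner_branch_sketch(df, 'Data', 'Group')
verdict = fligner_branch_sketch(df, 'Data', 'Group', pipeline=True)

# group 'B' has five times the spread of the others, so H0 should be rejected
df_unequal = pd.DataFrame({
    'Group': np.repeat(['A', 'B', 'C'], 100),
    'Data': np.concatenate([rng.normal(0, 1, 100), rng.normal(0, 5, 100), rng.normal(0, 1, 100)]),
})
unequal_verdict = fligner_branch_sketch(df_unequal, 'Data', 'Group', pipeline=True)
```

Returning a bare boolean in pipeline mode lets other DataSafari functions branch on the verdict without unpacking the dictionary.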

Link to Full Code: evaluate_variance.py