ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Implement new evaluate_variance() method: 'consensus' #69

Closed ETA444 closed 6 months ago

ETA444 commented 7 months ago

Title: Implementing Consensus Method for Variance Homogeneity Testing

Description: This update adds a 'consensus' method to evaluate_variance() for testing homogeneity of variance across the groups in a dataset. The method runs several variance tests (Levene's, Fligner-Killeen's, and Bartlett's) and bases its conclusion on their combined outcome. This makes the assessment more reliable in cases where the individual tests disagree.

Example Usage:

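import pandas as pd
import numpy as np
# Import path is an assumption based on the package layout; adjust if it differs
from datasafari.evaluator import evaluate_variance

# Build an example dataset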
data = {
    'Group': np.random.choice(['A', 'B', 'C'], 100),
    'Data': np.random.normal(0, 1, 100)
}
df = pd.DataFrame(data)

# Evaluate variance homogeneity using the consensus method
variance_homogeneity = evaluate_variance(df, 'Data', 'Group', method='consensus')

Expected Outcome: With the consensus method, users get a single assessment of variance homogeneity across the groups in the dataset. The verdict reflects the collective results of several variance tests rather than the outcome of any one test.

Additional Context: The consensus method gives the variance homogeneity module one consolidated way to interpret the results of its variance tests. This supports better-informed decisions in later statistical analysis and hypothesis testing.

ETA444 commented 6 months ago

Implementation Summary:

The 'consensus' method evaluates variance homogeneity by combining the results of the Levene, Fligner-Killeen, and Bartlett tests. Bartlett's test is included only when normality info is provided, so the consensus rests on 2-3 tests. A majority rule decides the outcome: more than half of the tests must agree for a consensus to be reached. Because it yields a single verdict, the method is especially useful in automated analysis pipelines.
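The three tests named above are all available in scipy.stats. The following is a minimal sketch of how per-test booleans could be produced before any majority vote, not DataSafari's actual implementation. The helper name `run_variance_tests`, the `include_bartlett` flag, the dictionary keys other than 'levene', and the significance level of 0.05 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import bartlett, fligner, levene


def run_variance_tests(df, target, grouping, alpha=0.05, include_bartlett=True):
    """Hypothetical helper: map each test name to True if equal variances is not rejected."""
    # One array of target values per group
    samples = [group[target].to_numpy() for _, group in df.groupby(grouping)]
    tests = {'levene': levene, 'fligner': fligner}
    if include_bartlett:
        # Bartlett's test is sensitive to non-normal data, so it is optional here
        tests['bartlett'] = bartlett
    # p > alpha: fail to reject the null hypothesis of equal variances
    return {name: bool(test(*samples).pvalue > alpha) for name, test in tests.items()}


rng = np.random.default_rng(42)
df = pd.DataFrame({
    'Group': np.repeat(['A', 'B', 'C'], 40),
    'Data': rng.normal(0, 1, 120),
})
variance_info = run_variance_tests(df, 'Data', 'Group')
print(variance_info)
```

Each value is True when the corresponding test does not reject equal variances, which is the form the majority rule in the breakdown below operates on.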

Code Breakdown:

  1. Method Header:

    • Purpose: Open the 'consensus' branch and collect each test's boolean outcome (True means equal variances).
    if method == 'consensus':
       # more than half of the tests must return True for the consensus to be True
       variance_results = list(variance_info.values())
  2. Count True and False Results:

    • Purpose: Count how many tests conclude equal and unequal variances, and set the majority threshold.
    true_count = variance_results.count(True)
    false_count = variance_results.count(False)
    # 1.5 when three tests were run, 1 when only two were run
    half_point = len(variance_results) / 2
  3. Consensus Result Calculation:

    • Purpose: Determine the consensus result based on the majority of test outcomes.
    if true_count > half_point:
       consensus_percent = (true_count / len(variance_results)) * 100
       variance_consensus_text = f"  ➡ Result: Consensus is reached.\n  ➡ {consensus_percent:.0f}% of tests suggest equal variance between samples. *\n\n* Detailed results of each test are provided below:\n"
       variance_consensus = True
    
    elif true_count < half_point:
       consensus_percent = (false_count / len(variance_results)) * 100
       variance_consensus_text = f"  ➡ Result: Consensus is reached.\n  ➡ {consensus_percent:.0f}% of tests suggest unequal variance between samples. *\n\n* Detailed results of each test are provided below:\n"
       variance_consensus = False
    
    else:
       # tie (only possible when two tests were run): fall back to Levene's result
       variance_consensus_text = "  ➡ Result: Consensus is not reached.\n\n∴ Please refer to the results of each test below:\n"
       variance_consensus = variance_info['levene']
  4. Output Results:

    • Purpose: Print the console report, then return the consensus boolean when the pipeline flag is set and the full output_info otherwise.
    print(f"< VARIANCE TESTING: CONSENSUS >\nThe consensus method bases its conclusion on 2-3 tests: Levene test, Fligner-Killeen test, Bartlett test. (Note: More than 50% must have the same outcome to reach consensus.)\n\n{variance_consensus_text}")
    print(levene_title, levene_text, levene_tip)
    print(fligner_title, fligner_text, fligner_tip)
    if normality_info:
       print(bartlett_title, bartlett_text, bartlett_tip)
    else:
       # the note must be printed explicitly; a bare string in a conditional expression is discarded
       print("\n\n< NOTE ON BARTLETT >\nBartlett was not used in consensus as no normality info has been provided or data is non-normal. Accuracy of Bartlett's test results relies heavily on normality.")
    
    return output_info if not pipeline else variance_consensus
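
The decision logic in steps 1-3 can also be read in isolation. The sketch below restates it as a stand-alone function under stated assumptions: the function name `majority_consensus`, its `tie_breaker` parameter, and the `(consensus, reached)` return shape are hypothetical and not part of evaluate_variance().

```python
def majority_consensus(variance_info, tie_breaker='levene'):
    """Hypothetical stand-alone version of the majority rule.

    variance_info maps test names to booleans (True = equal variances).
    Returns (consensus, reached), where reached is False on a tie.
    """
    results = list(variance_info.values())
    true_count = sum(results)
    half_point = len(results) / 2  # 1.5 for three tests, 1 for two

    if true_count > half_point:
        return True, True
    if true_count < half_point:
        return False, True
    # Tie (only possible with an even number of tests): defer to one test
    return variance_info[tie_breaker], False


print(majority_consensus({'levene': True, 'fligner': True, 'bartlett': False}))  # (True, True)
print(majority_consensus({'levene': False, 'fligner': True}))  # (False, False)
```

With three tests a tie cannot occur, so the fallback to Levene's result only matters when Bartlett's test is left out and the remaining two tests disagree.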

Link to Full Code: evaluate_variance.py