ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Implement evaluate_contingency_table() enhanced functionality #85

Closed ETA444 closed 4 months ago

ETA444 commented 5 months ago

Title: Enhance Contingency Table Evaluation Function for Statistical Test Viability Assessment

Description: The evaluate_contingency_table function currently assesses contingency tables to determine the viability of various statistical tests, including chi-square tests and exact tests (Barnard's, Boschloo's, and Fisher's). This enhancement aims to improve the function's capability to guide the choice of appropriate statistical tests based on the characteristics of the contingency table, such as expected and observed frequencies, sample size, and table shape.

Proposed Changes:

Expected Outcome: With this enhancement, users will have a more comprehensive and informative tool for evaluating contingency tables and selecting appropriate statistical tests for hypothesis testing. It will streamline the process of assessing the viability of different tests based on the specific characteristics of the data.

Additional Context: Enhancing the functionality of the evaluate_contingency_table function contributes to our goal of providing robust and user-friendly statistical analysis tools. This improvement addresses a common need in data analysis workflows, enabling more informed decision-making and hypothesis testing in categorical data analysis scenarios.

ETA444 commented 4 months ago

Implementation Summary:

The evaluate_contingency_table() function evaluates a contingency table and determines which statistical tests can be appropriately applied based on the table's characteristics. The function examines expected and observed frequencies, sample size, and table shape to guide the choice of tests like the chi-square tests (with or without Yates' correction), Barnard's test, Boschloo's test, and Fisher's exact test.

Code Breakdown:

  1. Initial Setup and Error Handling:

    • Purpose: Check for proper input types and values.
    # Error Handling
    # TypeErrors
    if not isinstance(contingency_table, pd.DataFrame):
       raise TypeError("evaluate_contingency_table(): The 'contingency_table' parameter must be a pandas DataFrame.")
    
    if not isinstance(min_sample_size_yates, int):
       raise TypeError("evaluate_contingency_table(): The 'min_sample_size_yates' parameter must be an integer.")
    
    if not isinstance(pipeline, bool):
       raise TypeError("evaluate_contingency_table(): The 'pipeline' parameter must be a boolean.")
    
    if not isinstance(quiet, bool):
       raise TypeError("evaluate_contingency_table(): The 'quiet' parameter must be a boolean.")
    
    # ValueErrors
    if contingency_table.empty:
       raise ValueError("evaluate_contingency_table(): The 'contingency_table' parameter must not be empty.")
    
    if min_sample_size_yates <= 0:
       raise ValueError("evaluate_contingency_table(): The 'min_sample_size_yates' parameter must be a positive integer.")
    • Explanation:
      • The function first checks if the input types are correct.
      • It then checks that the contingency table is not empty and that min_sample_size_yates is positive.
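To see how these checks surface to a caller, here is a minimal sketch. `validate_inputs` and `error_name` are hypothetical helpers written for this illustration (they are not part of DataSafari) and only mirror the checks listed above. One detail worth knowing: because `bool` is a subclass of `int` in Python, a value such as `True` passes the integer check.

```python
import pandas as pd


def validate_inputs(contingency_table, min_sample_size_yates, pipeline, quiet):
    """Hypothetical helper mirroring the checks above (illustration only)."""
    # Type checks run first
    if not isinstance(contingency_table, pd.DataFrame):
        raise TypeError("'contingency_table' must be a pandas DataFrame.")
    if not isinstance(min_sample_size_yates, int):
        raise TypeError("'min_sample_size_yates' must be an integer.")
    if not isinstance(pipeline, bool):
        raise TypeError("'pipeline' must be a boolean.")
    if not isinstance(quiet, bool):
        raise TypeError("'quiet' must be a boolean.")
    # Value checks run only once all types are valid
    if contingency_table.empty:
        raise ValueError("'contingency_table' must not be empty.")
    if min_sample_size_yates <= 0:
        raise ValueError("'min_sample_size_yates' must be a positive integer.")


def error_name(*args):
    """Return the name of the exception raised, or None if validation passes."""
    try:
        validate_inputs(*args)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


table = pd.DataFrame([[10, 20], [30, 40]])

not_a_frame = error_name([[10, 20], [30, 40]], 40, False, True)  # list, not DataFrame
empty_frame = error_name(pd.DataFrame(), 40, False, True)        # empty table
zero_minimum = error_name(table, 0, False, True)                 # non-positive threshold
bool_minimum = error_name(table, True, False, True)              # bool passes the int check
```

If a boolean threshold should be rejected, an explicit `isinstance(value, bool)` guard before the integer check would be one way to do it.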
  2. Calculation of Test Viability:

    • Purpose: Determine which tests can be applied based on the characteristics of the contingency table.
    # Main Function
    test_viability = {}  # non-pipeline output
    
    # compute objects for checks
    min_expected_frequency = expected_freq(contingency_table).min()
    min_observed_frequency = contingency_table.min().min()
    sample_size = np.sum(contingency_table.values)
    table_shape = contingency_table.shape
    • Explanation:
      • The function computes key characteristics of the contingency table to evaluate test viability.
      • expected_freq() is used to determine the minimum expected frequency.
      • The minimum observed frequency, total sample size, and table shape are also computed.
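As a worked example of these quantities, the sketch below computes them for a small made-up 2x2 table. It assumes that `expected_freq` refers to `scipy.stats.contingency.expected_freq`; the table values and labels are invented for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats.contingency import expected_freq

# Made-up 2x2 table: rows and columns are arbitrary category labels
table = pd.DataFrame([[10, 20], [30, 40]], index=["a", "b"], columns=["x", "y"])

# Expected counts under independence: row_total * column_total / grand_total
expected = expected_freq(table)

min_expected_frequency = expected.min()     # smallest expected cell count
min_observed_frequency = table.min().min()  # smallest observed cell count
sample_size = np.sum(table.values)          # grand total of all cells
table_shape = table.shape                   # (rows, columns)
```

For this table the row totals are 30 and 70 and the column totals are 40 and 60, so the expected counts are 12, 18, 28, and 42. The smallest expected count is therefore 12, while the smallest observed count is 10.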
    # assumption check for chi2_contingency test
    chi2_viability = bool(min_expected_frequency >= 5 and min_observed_frequency >= 5)
    test_viability['chi2_contingency'] = chi2_viability
    
    # assumption check for chi2_contingency with Yates' correction
    yates_correction_viability = bool(table_shape == (2, 2) and sample_size < min_sample_size_yates)
    test_viability['yates_correction'] = yates_correction_viability
    
    # assumption check for all exact tests (defined for 2x2 tables only)
    barnard_viability = boschloo_viability = fisher_viability = table_shape == (2, 2)
    test_viability['barnard_exact'] = barnard_viability
    test_viability['boschloo_exact'] = boschloo_viability
    test_viability['fisher_exact'] = fisher_viability
    • Explanation:
      • The function checks the assumptions for the various tests.
      • Chi-square test (chi2_contingency): viable if the minimum expected and observed frequencies are both at least 5.
      • Yates' correction (yates_correction): viable for a 2x2 table with a sample size less than min_sample_size_yates.
      • Exact tests (barnard_exact, boschloo_exact, fisher_exact): viable only for 2x2 tables.
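The rules above can be sketched as a small standalone function and applied to two made-up tables. `viability_flags` is a hypothetical name used only for this illustration, the threshold of 40 is an assumed value (the thread does not state the default of `min_sample_size_yates`), and `expected_freq` is again assumed to be SciPy's.

```python
import pandas as pd
from scipy.stats.contingency import expected_freq


def viability_flags(table, min_sample_size_yates):
    """Hypothetical re-statement of the rules above (illustration only)."""
    min_expected = expected_freq(table).min()
    min_observed = table.min().min()
    sample_size = table.values.sum()
    is_2x2 = table.shape == (2, 2)
    return {
        # chi-square: every expected and observed count must be at least 5
        "chi2_contingency": bool(min_expected >= 5 and min_observed >= 5),
        # Yates' correction: 2x2 table with a sample below the threshold
        "yates_correction": bool(is_2x2 and sample_size < min_sample_size_yates),
        # exact tests: 2x2 tables only
        "barnard_exact": is_2x2,
        "boschloo_exact": is_2x2,
        "fisher_exact": is_2x2,
    }


# Small 2x2 table: 20 observations, smallest expected count is 4.5
small = pd.DataFrame([[3, 7], [6, 4]])
# Larger 3x2 table: 150 observations, every expected count is 25
large = pd.DataFrame([[20, 30], [25, 25], [30, 20]])

small_flags = viability_flags(small, min_sample_size_yates=40)
large_flags = viability_flags(large, min_sample_size_yates=40)
```

The small table fails the chi-square check (its smallest expected count is 4.5) but qualifies for Yates' correction and all three exact tests. The 3x2 table passes only the chi-square check.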
  3. Console Output and Return Value:

    • Purpose: Display the viability of each test and return the results.
    # console output
    title = "< CONTINGENCY TABLE EVALUATION >\n"
    on_chi2 = f"Based on minimum expected freq. ({min_expected_frequency}) & minimum observed freq. ({min_observed_frequency}):\n  ➡ chi2_contingency() viability: {'✔' if chi2_viability else '✘'}\n\n"
    on_yates = f"Based on table shape ({table_shape[0]}x{table_shape[1]}) & sample size ({sample_size}):\n  ➡ chi2_contingency() Yates' correction viability: {'✔' if yates_correction_viability else '✘'}\n\n"
    on_exact = f"Based on table shape ({table_shape[0]}x{table_shape[1]}):\n  ➡ barnard_exact() viability: {'✔' if barnard_viability else '✘'}\n  ➡ boschloo_exact() viability: {'✔' if boschloo_viability else '✘'}\n  ➡ fisher_exact() viability: {'✔' if fisher_viability else '✘'}\n\n\n"
    if not quiet:
        print(title, on_chi2, on_yates, on_exact)
    • Explanation:
      • The function prepares a detailed output message indicating the viability of each test.
      • The results are displayed unless quiet is set to True.
    if pipeline:
       return chi2_viability, yates_correction_viability, barnard_viability, boschloo_viability, fisher_viability
    else:
       return test_viability
    • Explanation:
      • The function returns the viability results either as a tuple (for pipeline=True) or as a dictionary (for pipeline=False).
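One way a caller might consume the dictionary form is sketched below. `run_viable_test` and its selection order (chi-square first, then Fisher's exact test) are assumptions made for this illustration, not DataSafari's behaviour. The viability dictionaries are written by hand to match the keys shown above, and the tests themselves come from `scipy.stats`.

```python
import pandas as pd
from scipy.stats import chi2_contingency, fisher_exact


def run_viable_test(table, viability):
    """Hypothetical selection policy: prefer chi-square, fall back to Fisher."""
    if viability["chi2_contingency"]:
        # Apply Yates' continuity correction only when it was flagged as viable
        result = chi2_contingency(table.values, correction=viability["yates_correction"])
        return "chi2_contingency", result[1]  # index 1 holds the p-value
    if viability["fisher_exact"]:
        _, p_value = fisher_exact(table.values)
        return "fisher_exact", p_value
    return None, None


# Hand-written dictionaries in the shape returned with pipeline=False
small = pd.DataFrame([[3, 7], [6, 4]])
small_viability = {"chi2_contingency": False, "yates_correction": True,
                   "barnard_exact": True, "boschloo_exact": True, "fisher_exact": True}

large = pd.DataFrame([[20, 30], [30, 20]])
large_viability = {"chi2_contingency": True, "yates_correction": False,
                   "barnard_exact": True, "boschloo_exact": True, "fisher_exact": True}

small_test, small_p = run_viable_test(small, small_viability)
large_test, large_p = run_viable_test(large, large_viability)
```

With `pipeline=True` the same five flags arrive as a tuple in the order chi2, Yates, Barnard, Boschloo, Fisher, so they can be unpacked positionally instead of looked up by key.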

Link to Full Code: evaluate_contingency_table.py.