Implement evaluate_contingency_table() enhanced functionality

Title: Enhance Contingency Table Evaluation Function for Statistical Test Viability Assessment

Description: The evaluate_contingency_table function currently assesses contingency tables to determine the viability of various statistical tests, including chi-square tests and exact tests (Barnard's, Boschloo's, and Fisher's). This enhancement aims to improve the function's capability to guide the choice of appropriate statistical tests based on the characteristics of the contingency table, such as expected and observed frequencies, sample size, and table shape.

Proposed Changes:

- Introduce logic to assess the suitability of chi-square tests and exact tests based on minimum expected and observed frequencies.
- Implement a parameter to consider Yates' correction within the chi-square test for contingency tables of specific shapes and sample sizes.
- Enhance error handling to provide informative messages for invalid inputs or edge cases.
- Improve console output to summarize the viability of each statistical test based on the contingency table characteristics.

Expected Outcome: With this enhancement, users will have a more comprehensive and informative tool for evaluating contingency tables and selecting appropriate statistical tests for hypothesis testing. It will streamline the process of assessing the viability of different tests based on the specific characteristics of the data.

Additional Context: Enhancing the functionality of the evaluate_contingency_table function contributes to our goal of providing robust and user-friendly statistical analysis tools. This improvement addresses a common need in data analysis workflows, enabling more informed decision-making and hypothesis testing in categorical data analysis scenarios.

Implementation Summary:

The evaluate_contingency_table() function evaluates a contingency table and determines which statistical tests can be appropriately applied based on the table's characteristics. The function examines expected and observed frequencies, sample size, and table shape to guide the choice of tests like the chi-square tests (with or without Yates' correction), Barnard's test, Boschloo's test, and Fisher's exact test.

Code Breakdown:

Initial Setup and Error Handling:

Purpose: Check for proper input types and values.

# Error Handling
# TypeErrors
if not isinstance(contingency_table, pd.DataFrame):
   raise TypeError("evaluate_contingency_table(): The 'contingency_table' parameter must be a pandas DataFrame.")

if not isinstance(min_sample_size_yates, int):
   raise TypeError("evaluate_contingency_table(): The 'min_sample_size_yates' parameter must be an integer.")

if not isinstance(pipeline, bool):
   raise TypeError("evaluate_contingency_table(): The 'pipeline' parameter must be a boolean.")

if not isinstance(quiet, bool):
   raise TypeError("evaluate_contingency_table(): The 'quiet' parameter must be a boolean.")

# ValueErrors
if contingency_table.empty:
   raise ValueError("evaluate_contingency_table(): The 'contingency_table' parameter must not be empty.")

if min_sample_size_yates <= 0:
   raise ValueError("evaluate_contingency_table(): The 'min_sample_size_yates' parameter must be a positive integer.")

Explanation:
- The function first checks if the input types are correct.
- It then checks that the contingency table is not empty and that min_sample_size_yates is positive.
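
The checks above can be exercised in isolation. The following is a minimal sketch, not the library's code: validate_inputs() and error_name() are hypothetical helpers, and the default of 40 for min_sample_size_yates is an assumption made only for illustration.

```python
import pandas as pd


def validate_inputs(contingency_table, min_sample_size_yates=40, pipeline=False, quiet=False):
    """Hypothetical helper mirroring the type and value checks described above."""
    # Type checks come first, so the value checks below can rely on the types.
    if not isinstance(contingency_table, pd.DataFrame):
        raise TypeError("The 'contingency_table' parameter must be a pandas DataFrame.")
    if not isinstance(min_sample_size_yates, int):
        raise TypeError("The 'min_sample_size_yates' parameter must be an integer.")
    if not isinstance(pipeline, bool):
        raise TypeError("The 'pipeline' parameter must be a boolean.")
    if not isinstance(quiet, bool):
        raise TypeError("The 'quiet' parameter must be a boolean.")

    # Value checks.
    if contingency_table.empty:
        raise ValueError("The 'contingency_table' parameter must not be empty.")
    if min_sample_size_yates <= 0:
        raise ValueError("The 'min_sample_size_yates' parameter must be a positive integer.")


def error_name(**kwargs):
    """Return the name of the exception raised by validate_inputs(), or None."""
    try:
        validate_inputs(**kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


# Made-up 2x2 table of counts.
table = pd.DataFrame([[12, 7], [9, 14]])

print(error_name(contingency_table=table))                           # valid input
print(error_name(contingency_table=[[12, 7], [9, 14]]))              # not a DataFrame
print(error_name(contingency_table=pd.DataFrame()))                  # empty table
print(error_name(contingency_table=table, min_sample_size_yates=0))  # non-positive
```

One detail worth knowing: in Python, isinstance(True, int) is True, so an integer check written this way also accepts booleans.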

Calculation of Test Viability:

Purpose: Determine which tests can be applied based on the characteristics of the contingency table.

# Main Function
test_viability = {}  # non-pipeline output

# compute objects for checks
min_expected_frequency = expected_freq(contingency_table).min()
min_observed_frequency = contingency_table.min().min()
sample_size = np.sum(contingency_table.values)
table_shape = contingency_table.shape

Explanation:
- The function computes key characteristics of the contingency table to evaluate test viability.
- expected_freq() is used to determine the minimum expected frequency.
- The minimum observed frequency and sample size are also calculated.
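
To make these quantities concrete, here is a small sketch on a made-up 2x3 table. It computes the expected frequencies by hand, assuming the usual independence formula (row total times column total, divided by the sample size); expected_freq() in the snippet above is presumably scipy.stats.contingency.expected_freq, which applies the same formula to a two-way table.

```python
import numpy as np
import pandas as pd

# Illustrative observed counts (made-up numbers).
table = pd.DataFrame(
    [[10, 20, 30], [20, 20, 20]],
    index=["group_a", "group_b"],
    columns=["low", "mid", "high"],
)

observed = table.to_numpy()
row_totals = observed.sum(axis=1)  # one total per row
col_totals = observed.sum(axis=0)  # one total per column
sample_size = observed.sum()       # grand total of all counts

# Expected count per cell under independence:
# (row total * column total) / sample size.
expected = np.outer(row_totals, col_totals) / sample_size

min_expected_frequency = expected.min()
min_observed_frequency = observed.min()
table_shape = table.shape

print(expected)
print(min_expected_frequency, min_observed_frequency, sample_size, table_shape)
```

For this table the row totals are 60 and 60 and the column totals are 30, 40, and 50, so every row of the expected table is 15, 20, 25.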

# assumption check for chi2_contingency test
chi2_viability = bool(min_expected_frequency >= 5 and min_observed_frequency >= 5)
test_viability['chi2_contingency'] = chi2_viability

# assumption check for chi2_contingency Yates' correction
yates_correction_viability = bool(table_shape == (2, 2) and sample_size < min_sample_size_yates)
test_viability['yates_correction'] = yates_correction_viability

# assumption check for all exact tests (each requires a 2x2 table)
barnard_viability = boschloo_viability = fisher_viability = table_shape == (2, 2)
test_viability['barnard_exact'] = barnard_viability
test_viability['boschloo_exact'] = boschloo_viability
test_viability['fisher_exact'] = fisher_viability

Explanation:
- The function checks the assumptions for the various tests.
- Chi-square test (chi2_contingency): viable if the minimum expected and minimum observed frequencies are both at least 5.
- Yates' correction (yates_correction): viable for a 2x2 table with a sample size less than min_sample_size_yates.
- Exact tests (barnard_exact, boschloo_exact, fisher_exact): viable only for 2x2 tables.
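
The three rules can be tried on sample data with the sketch below. It is an illustration under stated assumptions rather than the function itself: assess_viability() is a hypothetical name, the threshold default of 40 is assumed, the tables are made up, and expected_freq is taken from scipy.stats.contingency.

```python
import pandas as pd
from scipy.stats.contingency import expected_freq


def assess_viability(table: pd.DataFrame, min_sample_size_yates: int = 40) -> dict:
    """Hypothetical helper applying the viability rules listed above."""
    observed = table.to_numpy()
    min_expected = expected_freq(observed).min()
    min_observed = observed.min()
    sample_size = int(observed.sum())
    is_2x2 = table.shape == (2, 2)

    return {
        # Both the smallest expected and the smallest observed count must reach 5.
        "chi2_contingency": bool(min_expected >= 5 and min_observed >= 5),
        # Yates' correction: 2x2 tables below the sample-size threshold.
        "yates_correction": is_2x2 and sample_size < min_sample_size_yates,
        # The exact tests are only checked against the 2x2 shape.
        "barnard_exact": is_2x2,
        "boschloo_exact": is_2x2,
        "fisher_exact": is_2x2,
    }


small_2x2 = pd.DataFrame([[3, 9], [8, 4]])                # 24 observations, one cell below 5
large_2x3 = pd.DataFrame([[20, 30, 25], [25, 20, 30]])    # 150 observations

print(assess_viability(small_2x2))
print(assess_viability(large_2x3))
```

In the small table the minimum expected count is 5.5 but one observed cell holds only 3, so the chi-square check fails while Yates' correction and the exact tests pass. The larger 2x3 table passes the chi-square check and fails the rest because of its shape.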

Console Output and Return Value:

Purpose: Display the viability of each test and return the results.

# console output
title = "< CONTINGENCY TABLE EVALUATION >\n"
on_chi2 = f"Based on minimum expected freq. ({min_expected_frequency}) & minimum observed freq. ({min_observed_frequency}):\n  ➡ chi2_contingency() viability: {'✔' if chi2_viability else '✘'}\n\n"
on_yates = f"Based on table shape ({table_shape[0]}x{table_shape[1]}) & sample size ({sample_size}):\n  ➡ chi2_contingency() Yates' correction viability: {'✔' if yates_correction_viability else '✘'}\n\n"
on_exact = f"Based on table shape ({table_shape[0]}x{table_shape[1]}):\n  ➡ barnard_exact() viability: {'✔' if barnard_viability else '✘'}\n  ➡ boschloo_exact() viability: {'✔' if boschloo_viability else '✘'}\n  ➡ fisher_exact() viability: {'✔' if fisher_viability else '✘'}\n\n\n"
if not quiet:
   print(title, on_chi2, on_yates, on_exact)

Explanation:
- The function prepares a detailed output message indicating the viability of each test.
- The results are displayed unless quiet is set to True.
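
One possible way to keep the message easy to test is to build the text separately from printing it. The sketch below assumes that split; format_viability_report() and report() are hypothetical helpers and the input values are made up.

```python
def format_viability_report(min_expected, min_observed, table_shape, sample_size, viability):
    """Hypothetical helper: build the summary text without printing it."""
    def mark(ok):
        return "✔" if ok else "✘"

    rows, cols = table_shape
    lines = [
        "< CONTINGENCY TABLE EVALUATION >",
        f"Based on minimum expected freq. ({min_expected}) & minimum observed freq. ({min_observed}):",
        f"  ➡ chi2_contingency() viability: {mark(viability['chi2_contingency'])}",
        "",
        f"Based on table shape ({rows}x{cols}) & sample size ({sample_size}):",
        f"  ➡ chi2_contingency() Yates' correction viability: {mark(viability['yates_correction'])}",
        "",
        f"Based on table shape ({rows}x{cols}):",
        f"  ➡ barnard_exact() viability: {mark(viability['barnard_exact'])}",
        f"  ➡ boschloo_exact() viability: {mark(viability['boschloo_exact'])}",
        f"  ➡ fisher_exact() viability: {mark(viability['fisher_exact'])}",
    ]
    return "\n".join(lines)


def report(quiet, **report_inputs):
    """Print the summary unless quiet is True."""
    if not quiet:
        print(format_viability_report(**report_inputs))


example = {
    "min_expected": 5.5,
    "min_observed": 3,
    "table_shape": (2, 2),
    "sample_size": 24,
    "viability": {
        "chi2_contingency": False,
        "yates_correction": True,
        "barnard_exact": True,
        "boschloo_exact": True,
        "fisher_exact": True,
    },
}

report(quiet=False, **example)  # prints the summary
report(quiet=True, **example)   # prints nothing
```

Joining the lines into a single string also avoids a side effect of passing several strings to print(): print() puts a space between its arguments, so every segment after the first would begin with a stray space.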

if pipeline:
   return chi2_viability, yates_correction_viability, barnard_viability, boschloo_viability, fisher_viability
else:
   return test_viability

Explanation:
- The function returns the viability results either as a tuple (for pipeline=True) or as a dictionary (for pipeline=False).
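
The two return shapes suit different callers: the tuple unpacks directly into variables inside a pipeline, while the dictionary is easier to inspect by name. A minimal sketch of that difference, using a hypothetical package_results() helper and made-up viability values:

```python
# Order of the tuple returned when pipeline=True, as listed in the snippet above.
KEYS = (
    "chi2_contingency",
    "yates_correction",
    "barnard_exact",
    "boschloo_exact",
    "fisher_exact",
)


def package_results(test_viability, pipeline):
    """Hypothetical helper: return a tuple for pipelines, else the dictionary."""
    if pipeline:
        return tuple(test_viability[key] for key in KEYS)
    return test_viability


# Made-up results for a small 2x2 table.
viability = {
    "chi2_contingency": False,
    "yates_correction": True,
    "barnard_exact": True,
    "boschloo_exact": True,
    "fisher_exact": True,
}

# pipeline=True: positional unpacking, so the order of the names matters.
chi2_ok, yates_ok, barnard_ok, boschloo_ok, fisher_ok = package_results(viability, pipeline=True)

# pipeline=False: lookup by test name.
as_dict = package_results(viability, pipeline=False)

print(chi2_ok, fisher_ok)
print(as_dict["yates_correction"])
```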

Link to Full Code: evaluate_contingency_table.py.

ETA444 / datasafari

Implement evaluate_contingency_table() enhanced functionality #85