ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Write NumPy docstring for evaluate_contingency_table() #87

Closed ETA444 closed 4 months ago

ETA444 commented 4 months ago

The docstring is written and accessible via:

help(evaluate_contingency_table)

This closes the issue "Write NumPy docstring for evaluate_contingency_table()" by adding a detailed NumPy-style docstring to the evaluate_contingency_table() function.

Summary:

The function evaluate_contingency_table() evaluates a contingency table to determine the viability of various statistical tests based on the table's characteristics. It assesses the table's suitability for chi-square tests, exact tests (Barnard's, Boschloo's, and Fisher's), and the application of Yates' correction within the chi-square test. The docstring follows the NumPy format and includes details on the parameters, return values, exceptions, and examples.

Docstring Sections Preview:

Description

"""
Evaluates a contingency table to determine the viability of various statistical tests based on the table's characteristics.

This function assesses the contingency table's suitability for chi-square tests, exact tests (Barnard's, Boschloo's, and Fisher's), and the application of Yates' correction within the chi-square test. It examines expected and observed frequencies, sample size, and table shape to guide the choice of appropriate statistical tests for hypothesis testing.
"""

Parameters

"""
Parameters
----------
contingency_table : pd.DataFrame
    A contingency table generated from two categorical variables.
min_sample_size_yates : int, optional
    The minimum sample size below which Yates' correction should be considered. Default is 40.
pipeline : bool, optional
    Determines the format of the output. If True, outputs a tuple of boolean values representing the viability of each test. If False, outputs a dictionary with the test names as keys and their viabilities as boolean values. Default is False.
quiet : bool, optional
    A parameter to control verbosity. Default is False.
"""

Returns

"""
Returns
-------
test_viability : dict or tuple
    Depending on the 'pipeline' parameter:
        - If `pipeline` is False, returns a dictionary with keys as test names ('chi2_contingency', 'yates_correction', 'barnard_exact', 'boschloo_exact', 'fisher_exact') and values as boolean indicators of their viability.
        - If `pipeline` is True, returns a tuple of boolean values in the order: (chi2_viability, yates_correction_viability, barnard_viability, boschloo_viability, fisher_viability).
"""

Raises

"""
Raises
------
TypeError
    - If `contingency_table` is not a pandas DataFrame, indicating the wrong data type has been passed.
    - If `min_sample_size_yates` is not an integer, indicating the parameter is of the wrong type.
    - If `pipeline` or `quiet` is not a boolean, indicating incorrect data types for these parameters.
ValueError
    - If the `contingency_table` is empty, indicating that there's no data to evaluate.
    - If `min_sample_size_yates` is not a positive integer, indicating an invalid parameter value.
"""

Examples

"""
Examples
--------
>>> import pandas as pd
>>> import numpy as np
>>> data = {
...     'Gender': np.random.choice(['Male', 'Female'], 100),
...     'Preference': np.random.choice(['Option A', 'Option B'], 100)
... }
>>> df_example = pd.DataFrame(data)
>>> contingency_table = pd.crosstab(df_example['Gender'], df_example['Preference'])
>>> test_viability = evaluate_contingency_table(contingency_table)
>>> print(test_viability)
>>> chi2, yates, barnard, boschloo, fisher = evaluate_contingency_table(contingency_table, pipeline=True, quiet=True)
>>> if chi2:
...     pass  # ...
"""