ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Write NumPy docstring for evaluate_normality() #79

Closed ETA444 closed 7 months ago

ETA444 commented 7 months ago

Written and accessible:

help(evaluate_normality)

This resolves the issue by adding a detailed NumPy-style docstring to the evaluate_normality() function.

Summary:

The function evaluate_normality() tests the normality of a numeric variable within groups defined by a grouping variable. The updated docstring follows the NumPy format and includes details on the parameters, return values, exceptions, and examples.

Docstring Sections Preview:

Description

"""
Evaluates the normality of a numeric variable within groups defined by a grouping variable.

This function offers a comprehensive approach to testing the normality of a distribution within subsets of data. It supports multiple statistical tests and a consensus method that combines the results of all tests to determine normality. It's designed to be flexible for use both as a standalone function and as part of a larger pipeline for hypothesis testing.
"""
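As a rough illustration of what testing "within groups" means, the sketch below runs one test per group. It is not DataSafari's implementation: `shapiro_by_group`, its `alpha` threshold of 0.05, and the result keys are hypothetical names chosen for this example. Only `scipy.stats.shapiro` and pandas `groupby` are real library calls.

```python
import numpy as np
import pandas as pd
from scipy import stats


def shapiro_by_group(df, target_variable, grouping_variable, alpha=0.05):
    """Hypothetical helper: run Shapiro-Wilk separately on each group."""
    results = {}
    for group_name, values in df.groupby(grouping_variable)[target_variable]:
        statistic, p_value = stats.shapiro(values.dropna())
        results[group_name] = {
            "statistic": float(statistic),
            "p_value": float(p_value),
            # Assumption: failing to reject H0 is read as "consistent with normality".
            "normal": bool(p_value > alpha),
        }
    return results


rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Group": np.repeat(["A", "B", "C"], 40),  # 40 rows per group
    "Data": rng.normal(0, 1, 120),
})
per_group = shapiro_by_group(df, "Data", "Group")
for group_name, outcome in per_group.items():
    print(group_name, outcome)
```

Each group gets its own statistic and p-value, so one non-normal group can be spotted even when the pooled data looks fine.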

Parameters

"""
Parameters
----------
df : pd.DataFrame
    The DataFrame containing the data to be tested.
target_variable : str
    The name of the numeric variable to test for normality.
grouping_variable : str
    The name of the categorical variable used to create subsets of data for normality testing.
method : str, optional
    The method to use for testing normality. Options include:
        - 'shapiro': Shapiro-Wilk test
        - 'anderson': Anderson-Darling test
        - 'normaltest': D'Agostino and Pearson's test
        - 'lilliefors': Lilliefors test
        - 'consensus': Runs all of the above tests and concludes normality if the majority of them indicate it.
    Default is 'consensus'.
pipeline : bool, optional
    If True, the function returns a simple boolean indicating normality instead of detailed test results. Useful for integrating with other testing pipelines. Default is False.
"""
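The majority rule behind 'consensus' could be sketched as follows. This is an assumption-laden illustration, not the library's code: `majority_consensus` is a hypothetical helper, and the docstring does not say how ties are handled, so the sketch arbitrarily requires a strict majority.

```python
def majority_consensus(test_outcomes):
    """Hypothetical helper: True when more than half of the tests indicate normality."""
    if not test_outcomes:
        raise ValueError("at least one test outcome is required")
    votes_for_normality = sum(bool(v) for v in test_outcomes.values())
    # Assumption: a tie (e.g. 2 of 4) does not count as a majority.
    return votes_for_normality > len(test_outcomes) / 2


outcomes = {"shapiro": True, "anderson": True, "normaltest": False, "lilliefors": True}
consensus = majority_consensus(outcomes)  # 3 of 4 tests agree

tied = {"shapiro": True, "anderson": True, "normaltest": False, "lilliefors": False}
tie = majority_consensus(tied)  # 2 of 4 tests agree
```

With four tests a tie is possible, so whichever tie-breaking rule is used is worth stating explicitly in the docstring.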

Returns

"""
Returns
-------
output_info : dict or bool
    - If `pipeline` is False, returns a dictionary with test names as keys and test results, including statistics, p-values, and normality conclusions, as values.
    - If `pipeline` is True, returns a boolean: the consensus on normality across all tests when `method` is 'consensus', otherwise the result of the single selected test.
"""
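Because the return type depends on `pipeline`, calling code may need to branch on it. The sketch below shows one way to do that without assuming anything about the inner structure of the dictionary values. `describe_output` is a hypothetical helper, and the dictionary passed to it is a placeholder, not real output.

```python
def describe_output(output_info):
    """Hypothetical helper: turn either return type into printable lines."""
    # Check bool first: this is the `pipeline=True` case.
    if isinstance(output_info, bool):
        return [f"normality: {output_info}"]
    # Otherwise expect the detailed per-test dictionary.
    if isinstance(output_info, dict):
        return [f"{test_name}: {test_result}" for test_name, test_result in output_info.items()]
    raise TypeError(f"expected dict or bool, got {type(output_info).__name__}")


bool_lines = describe_output(True)
dict_lines = describe_output({"shapiro": "<test result>"})  # placeholder value
```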

Raises

"""
Raises
------
TypeError
    - If `df` is not a pandas DataFrame.
    - If `target_variable` or `grouping_variable` is not a string.
    - If `method` is not a string.
    - If `pipeline` is not a boolean.
ValueError
    - If `df` is empty, meaning there is no data to evaluate.
    - If the `target_variable` or `grouping_variable` does not exist in the DataFrame.
    - If the `method` specified is not supported. Allowed methods are: 'shapiro', 'anderson', 'normaltest', 'lilliefors', 'consensus'.
    - If the `target_variable` is not numerical.
    - If the `grouping_variable` is not categorical.
"""
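The checks listed under Raises could be implemented along these lines. This is a minimal sketch under stated assumptions, not the function's actual validation code: `validate_inputs` and `ALLOWED_METHODS` are hypothetical names, the error messages are invented, and "categorical" is approximated as "not numeric".

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

ALLOWED_METHODS = ("shapiro", "anderson", "normaltest", "lilliefors", "consensus")


def validate_inputs(df, target_variable, grouping_variable, method="consensus", pipeline=False):
    """Hypothetical helper: raise TypeError/ValueError as described in the docstring."""
    # Type checks come first so later checks can rely on the types.
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")
    if not isinstance(target_variable, str) or not isinstance(grouping_variable, str):
        raise TypeError("target_variable and grouping_variable must be strings")
    if not isinstance(method, str):
        raise TypeError("method must be a string")
    if not isinstance(pipeline, bool):
        raise TypeError("pipeline must be a boolean")
    # Value checks.
    if df.empty:
        raise ValueError("df is empty")
    for name in (target_variable, grouping_variable):
        if name not in df.columns:
            raise ValueError(f"column {name!r} not found in df")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"method must be one of {ALLOWED_METHODS}")
    if not is_numeric_dtype(df[target_variable]):
        raise ValueError("target_variable must be numerical")
    # Assumption: any non-numeric column is accepted as categorical.
    if is_numeric_dtype(df[grouping_variable]):
        raise ValueError("grouping_variable must be categorical")


sample = pd.DataFrame({"Group": ["A", "B"], "Data": [1.0, 2.0]})
validate_inputs(sample, "Data", "Group")  # passes silently
```

Running the type checks before the value checks keeps each error message specific to the first problem encountered.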

Examples

"""
Examples
--------
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'Group': np.random.choice(['A', 'B', 'C'], 100), 'Data': np.random.normal(0, 1, 100)})
>>> # Using the most robust method, 'consensus'
>>> result_dictionary = evaluate_normality(df, 'Data', 'Group')
>>> # Using only the Shapiro-Wilk test
>>> evaluate_normality(df, 'Data', 'Group', method='shapiro')
>>> # Integrating the function into your own pipeline
>>> normality = evaluate_normality(df, 'Data', 'Group', pipeline=True)
>>> if normality:
...     pass  # ...
"""