ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Write NumPy docstring for calculate_entropy() #16

Closed ETA444 closed 7 months ago

ETA444 commented 7 months ago

The docstring is written and accessible via:

help(calculate_entropy)

This solution addresses the issue "Write NumPy docstring for calculate_entropy()" by providing a detailed NumPy-style docstring for the calculate_entropy() function.

Summary:

The function calculate_entropy() calculates the entropy of a pandas Series (categorical variable) and provides an interpretation by comparing it to the maximum possible entropy for the given number of unique categories. The updated docstring follows the NumPy format and includes details on the parameters, return values, and examples.

Docstring Sections Preview:

Description

"""
Calculate the entropy of a pandas Series (categorical variable) and provide an interpretation by comparing it to
the maximum possible entropy for the given number of unique categories. This function quantifies the
unpredictability or diversity of the data within the variable.

**Calculation:**
    - Entropy is computed as *H = -sum(p_i * log2(p_i))*, where p_i is the proportion of the series belonging to the i-th category.
    - Higher entropy indicates a more uniform distribution of categories, suggesting a higher degree of randomness or diversity. Conversely, entropy is lower when one or a few categories dominate the distribution.

**Interpretation:**
    The maximum possible entropy for k unique categories is log2(k), reached when all categories are
    equally frequent. The function also expresses the actual entropy as a percentage of this maximum,
    providing context for how diverse the categorical data is relative to the total number of categories.
"""

Parameters

"""
Parameters
----------
series : pd.Series
    The series for which to calculate entropy. Should be a categorical variable with discrete values.
"""

Returns

"""
Returns
-------
tuple
    A tuple containing two elements:
        - A float representing the entropy of the series. Higher values indicate greater entropy (diversity); values close to zero indicate low diversity, with one or a few categories dominating.
        - A string providing an interpretation of the entropy value in the context of the maximum possible entropy for the given number of unique categories.
"""

Examples

"""
Examples
--------
>>> import pandas as pd
>>> example_series = pd.Series(['apple', 'orange', 'apple', 'banana', 'orange', 'banana'])
>>> entropy_val_example, interpretation_example = calculate_entropy(example_series)
>>> print(entropy_val_example)
1.584962500721156
>>> print(interpretation_example)
"=> Moderate diversity [max. entropy for this variable = 3.584962500721156]"
"""