This solution addresses the issue "Write NumPy docstring for explore_cat()" by providing a detailed NumPy-style docstring for the explore_cat() function.
Summary:
The function explore_cat() explores categorical variables within a DataFrame, providing insights through various methods. The updated docstring follows the NumPy format and includes details on the parameters, return values, exceptions, and examples.
Docstring Sections Preview:
Description
"""
Explores categorical variables within a DataFrame, providing insights through various methods. The exploration
can yield unique values, counts and percentages of those values, and the entropy to quantify data diversity.
"""
Parameters
"""
Parameters
----------
df : pd.DataFrame
The DataFrame containing the data to be explored.
categorical_variables : list of str
Names of the categorical columns to explore.
method : str, default 'all'
Specifies the method of exploration to apply. Options include:
- 'unique_values': Lists unique values for each specified categorical variable.
- 'counts_percentage': Shows counts and percentages for the unique values of each variable.
- 'entropy': Calculates the entropy for each variable, providing a measure of data diversity. See the 'calculate_entropy' function for more details on entropy calculation.
- 'all': Applies all the above methods sequentially.
output : str, default 'print'
Determines how the exploration results are outputted. Options are:
- 'print': Prints the results to the console.
- 'return': Returns the results as a single formatted string.
"""
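To make the 'unique_values' and 'counts_percentage' options concrete, here is a minimal sketch of how such results could be computed with pandas. The helper name `counts_percentage_table` and the two-decimal rounding are assumptions for illustration; the actual `explore_cat()` implementation and its output format are not shown in this PR and may differ.

```python
import pandas as pd


def counts_percentage_table(series: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of each unique value."""
    # value_counts() ignores missing values by default.
    counts = series.value_counts()
    # Percentages are relative to the number of non-missing entries.
    percentages = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({"count": counts, "percentage": percentages})


df = pd.DataFrame({"Category2": ["Yes", "No", "Yes", "Yes"]})

# 'unique_values' style result: the distinct labels in the column.
unique_values = sorted(df["Category2"].unique())

# 'counts_percentage' style result: one row per distinct label.
table = counts_percentage_table(df["Category2"])

print(unique_values)
print(table)
```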
Returns
"""
Returns
-------
str or None
- If output='return', a string containing the formatted exploration results is returned.
- If output='print', results are printed to the console, and the function returns None.
"""
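The print-or-return contract described above can be sketched as a small dispatch step. `emit_results` is a hypothetical name used only for this illustration, not a function from the codebase.

```python
from typing import Optional


def emit_results(report: str, output: str = "print") -> Optional[str]:
    """Hypothetical helper mirroring the documented `output` behaviour."""
    if output == "print":
        # Results go to the console and nothing is returned.
        print(report)
        return None
    if output == "return":
        # Results are handed back as one formatted string.
        return report
    raise ValueError("output must be 'print' or 'return'")
```

Callers that need to post-process the report (for example, write it to a file) would use output='return', while interactive sessions can rely on the default.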
Raises
"""
Raises
------
TypeError
- If `df` is not a pandas DataFrame.
- If `categorical_variables` is not a list or contains non-string elements.
- If `method` or `output` is not a string.
ValueError
- If `df` is empty, so there is no data to evaluate.
- If `method` is not one of the valid options ('unique_values', 'counts_percentage', 'entropy', 'all').
- If `output` is not one of the valid options ('print', 'return').
- If `categorical_variables` is an empty list.
- If any column named in `categorical_variables` is not a categorical variable.
- If any of the specified categorical variables are not found in the DataFrame.
"""
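The checks listed under Raises could be implemented roughly as follows. This is a sketch under stated assumptions: `validate_explore_cat_args` is a hypothetical name, the order of the checks is a guess, and "categorical" is assumed to mean object, string, or category dtype, which the PR does not specify.

```python
import pandas as pd

VALID_METHODS = ("unique_values", "counts_percentage", "entropy", "all")
VALID_OUTPUTS = ("print", "return")


def validate_explore_cat_args(df, categorical_variables, method="all", output="print"):
    """Hypothetical validator raising the exceptions documented above."""
    # Type checks come first so later checks can rely on the types.
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame.")
    if not isinstance(categorical_variables, list) or not all(
        isinstance(name, str) for name in categorical_variables
    ):
        raise TypeError("categorical_variables must be a list of strings.")
    if not isinstance(method, str) or not isinstance(output, str):
        raise TypeError("method and output must be strings.")

    # Value checks.
    if df.empty:
        raise ValueError("df is empty; there is no data to evaluate.")
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}.")
    if output not in VALID_OUTPUTS:
        raise ValueError(f"output must be one of {VALID_OUTPUTS}.")
    if not categorical_variables:
        raise ValueError("categorical_variables must not be empty.")

    missing = [name for name in categorical_variables if name not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in df: {missing}")

    # Assumption: object, string, and category dtypes count as categorical.
    non_categorical = [
        name
        for name in categorical_variables
        if not (
            pd.api.types.is_object_dtype(df[name])
            or pd.api.types.is_string_dtype(df[name])
            or isinstance(df[name].dtype, pd.CategoricalDtype)
        )
    ]
    if non_categorical:
        raise ValueError(f"Columns are not categorical: {non_categorical}")
```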
Examples
"""
Examples
--------
# Create a sample DataFrame to use in the examples:
>>> import numpy as np
>>> import pandas as pd
>>> data = {
... 'Category1': np.random.choice(['Apple', 'Banana', 'Cherry'], size=100),
... 'Category2': np.random.choice(['Yes', 'No'], size=100),
... 'Category3': np.random.choice(['Low', 'Medium', 'High'], size=100)
... }
>>> df = pd.DataFrame(data)
# Display unique values for 'Category1' and 'Category2'
>>> explore_cat(df, ['Category1', 'Category2'], method='unique_values', output='print')
# Explore counts and percentages for 'Category1' and 'Category2', then print the results
>>> explore_cat(df, ['Category1', 'Category2'], method='counts_percentage', output='print')
# Calculate and return the entropy of 'Category1', 'Category2', and 'Category3'
>>> result = explore_cat(df, ['Category1', 'Category2', 'Category3'], method='entropy', output='return')
>>> print(result)
# Comprehensive exploration of all specified methods for 'Category1', 'Category2', and 'Category3', displaying to console
>>> explore_cat(df, ['Category1', 'Category2', 'Category3'], method='all', output='print')
# Using 'all' method to explore 'Category1' and 'Category2', returning the results as a string
>>> result_str = explore_cat(df, ['Category1', 'Category2'], method='all', output='return')
>>> print(result_str)
"""
Notes
"""
Notes
-----
The 'entropy' method provides a quantitative measure of the unpredictability or
diversity within each specified categorical column, calculated as outlined in the
documentation for 'calculate_entropy'. High entropy values indicate a more uniform
distribution of categories, suggesting no single category overwhelmingly dominates.
"""
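To illustrate the note on entropy, the sketch below computes Shannon entropy for one column. `shannon_entropy` is a hypothetical stand-in: the real `calculate_entropy` is not shown in this PR, and its logarithm base and handling of missing values are assumptions here (base 2, missing values ignored).

```python
import numpy as np
import pandas as pd


def shannon_entropy(series: pd.Series, base: float = 2.0) -> float:
    """Hypothetical stand-in for `calculate_entropy` (Shannon entropy)."""
    # Relative frequency of each category; missing values are ignored.
    probabilities = series.value_counts(normalize=True).to_numpy()
    # Drop zero-frequency categories so log(0) is never evaluated.
    probabilities = probabilities[probabilities > 0]
    # H = -sum(p * log(p)), converted to the requested base.
    return float(-(probabilities * np.log(probabilities)).sum() / np.log(base))


uniform = pd.Series(["Low", "Medium", "High", "Low", "Medium", "High"])
skewed = pd.Series(["Yes"] * 9 + ["No"])

print(shannon_entropy(uniform))  # three equally likely categories: log2(3)
print(shannon_entropy(skewed))   # one dominant category: lower entropy
```

With k categories the value ranges from 0 (a single category) to log2(k) (all categories equally frequent), which is why higher values indicate a more uniform distribution.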