Implement new evaluate_normality() method: 'normaltest'

Title: Introducing D'Agostino-Pearson Normality Test for Enhanced Normality Assessment

Description: This update adds the D'Agostino-Pearson normality test, available as method='normaltest', to the normality testing module. The test combines sample skewness and kurtosis into a single omnibus statistic, which makes it well suited to detecting departures from normality in medium to large samples. It complements the module's existing methods by specifically flagging asymmetry (skewness) and unusually heavy or light tails (kurtosis).
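For reference, the omnibus statistic (often written K²) is the standard D'Agostino-Pearson construction, shown here as background rather than as a description of datasafari's internals. Here √b₁ is the sample skewness, b₂ the sample kurtosis, and Z(·) the transformation that turns each into an approximately standard normal z-score under H0:

$$
K^2 = Z\left(\sqrt{b_1}\right)^2 + Z\left(b_2\right)^2 \;\sim\; \chi^2_2 \quad \text{(approximately, under } H_0\text{)}
$$

Large values of K² signal a departure in skewness, kurtosis, or both; the p-value is the upper-tail probability of a χ² distribution with 2 degrees of freedom.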

Example Usage:

```python
import pandas as pd
import numpy as np

# evaluate_normality comes from the datasafari package; import it before running

# Create an example dataset
data = {
    'Group': np.random.choice(['A', 'B', 'C'], 100),
    'Data': np.random.normal(0, 1, 100)
}
df = pd.DataFrame(data)

# Evaluate normality using the D'Agostino-Pearson normality test
normality_results = evaluate_normality(df, 'Data', 'Group', method='normaltest')
```
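To illustrate what the call computes per group without depending on datasafari, here is a minimal standalone sketch using pandas `groupby` and `scipy.stats.normaltest` directly. It assumes the same 0.05 threshold used in the breakdown; the `results` dictionary only mirrors the per-group layout described there and is not the package's API.

```python
import numpy as np
import pandas as pd
from scipy.stats import normaltest

# Reproducible version of the example data
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'Group': rng.choice(['A', 'B', 'C'], 100),
    'Data': rng.normal(0, 1, 100),
})

# Illustrative standalone version (not datasafari's API):
# run D'Agostino-Pearson separately on each group's values
results = {}
for group, values in df.groupby('Group')['Data']:
    res = normaltest(values)
    results[group] = {
        'stat': float(res.statistic),
        'p': float(res.pvalue),
        # H0 (normality) is retained when p exceeds the assumed 0.05 threshold
        'normality': bool(res.pvalue > 0.05),
    }

for group, info in results.items():
    print(group, info)
```

Each group holds roughly a third of the 100 rows, comfortably above the minimum sample size the underlying SciPy tests need.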

Expected Outcome: The test runs separately on each group of the grouping variable. For every group, the console shows the test statistic, the p-value, and a verdict (normality retained if p > 0.05), which reveals whether any group departs from normality through asymmetry or tail thickness. The function returns a dictionary of per-group results or, with pipeline=True, a single boolean that is True only if every group passes.

Additional Context: The D'Agostino-Pearson test offers a good balance between sensitivity and specificity in medium to large samples, where the skewness and kurtosis estimates it relies on are stable. Because it targets asymmetry and tail thickness specifically, it is a useful addition to the normality testing toolkit.
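Small groups are a practical concern for a moment-based test. SciPy's documentation states that `skewtest` (which `normaltest` uses) requires at least 8 observations and that `kurtosistest` is intended for roughly 20 or more, and how SciPy reacts to too-small samples has differed between versions. A caller may therefore prefer an explicit guard. A minimal sketch, where `safe_normaltest` and its `min_n` parameter are hypothetical and not part of datasafari:

```python
import numpy as np
from scipy.stats import normaltest

def safe_normaltest(values, min_n=8):
    """Hypothetical guard (not a datasafari function).

    Drops NaNs, then returns the SciPy result object, or None when fewer
    than `min_n` non-missing observations remain.
    """
    arr = np.asarray(values, dtype=float)
    arr = arr[~np.isnan(arr)]  # normaltest propagates NaN by default
    if arr.size < min_n:
        return None
    return normaltest(arr)

rng = np.random.default_rng(0)
print(safe_normaltest([1.0, 2.0, 3.0]))  # too small, so None is printed
print(safe_normaltest(rng.normal(size=50)).pvalue)
```

Returning None lets the caller decide whether a skipped group should count against a consensus, instead of the whole run failing on one tiny group.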

Implementation Summary:

The 'normaltest' method in evaluate_normality() applies D'Agostino and Pearson's normality test to a numeric variable separately within each group defined by a grouping variable. The null hypothesis (H0) is that the group's values come from a normal distribution; the test statistic is built from the sample's skewness and kurtosis.
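The skewness/kurtosis combination can be checked directly with SciPy: the `normaltest` statistic equals the sum of the squared z-scores returned by `skewtest` and `kurtosistest`, and its p-value is the upper tail of a χ² distribution with 2 degrees of freedom. A small verification sketch on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0, scale=1, size=200)

# Z-scores for skewness and kurtosis (two-sided tests by default)
z_skew = stats.skewtest(sample).statistic
z_kurt = stats.kurtosistest(sample).statistic

# Omnibus K^2 statistic and its chi-squared (df=2) upper-tail p-value
k2 = z_skew**2 + z_kurt**2
p_value = stats.chi2.sf(k2, df=2)

combined = stats.normaltest(sample)
print(k2, combined.statistic)
print(p_value, combined.pvalue)
```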

Code Breakdown:

Compute Normaltest Statistics and P-Values:
- Purpose: Calculate the normaltest statistic and p-values for each group.
```python
from scipy.stats import normaltest

# One statistic and one p-value per group, in the order of `groups`
normaltest_stats = [
    normaltest(df[df[grouping_variable] == group][target_variable]).statistic
    for group in groups
]
normaltest_pvals = [
    normaltest(df[df[grouping_variable] == group][target_variable]).pvalue
    for group in groups
]
```
- Explanation:
  - The code block iterates through each group, performs D'Agostino and Pearson's normality test on the target_variable, and stores the results.
  - The .statistic attribute provides the test statistic, and the .pvalue attribute gives the p-value.
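The comprehensions above call `normaltest` twice per group, once for the statistic and once for the p-value. That is correct but repeats the work. One possible refactor runs the test once and unpacks both fields; `compute_group_normaltest` is a hypothetical helper for illustration, not the repository's code:

```python
import numpy as np
import pandas as pd
from scipy.stats import normaltest

def compute_group_normaltest(df, target_variable, grouping_variable, groups):
    """Hypothetical helper: one normaltest call per group.

    Returns two lists (statistics, p-values) aligned with `groups`.
    """
    results = [
        normaltest(df.loc[df[grouping_variable] == group, target_variable])
        for group in groups
    ]
    stats_ = [r.statistic for r in results]
    pvals = [r.pvalue for r in results]
    return stats_, pvals

# Demo data: one normal group and one right-skewed group
rng = np.random.default_rng(7)
demo = pd.DataFrame({
    'Group': np.repeat(['A', 'B'], 50),
    'Data': np.concatenate([rng.normal(size=50), rng.exponential(size=50)]),
})
groups = demo['Group'].unique()
normaltest_stats, normaltest_pvals = compute_group_normaltest(demo, 'Data', 'Group', groups)
print(dict(zip(groups, normaltest_pvals)))
```

Using `df.loc[mask, column]` also avoids chained indexing, though both forms read the same values here.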

Determine Normality:
- Purpose: Determine whether the variable in each group follows a normal distribution based on the normaltest results.
```python
# True when H0 (normality) is not rejected at the 0.05 level
normaltest_normality = [True if p > 0.05 else False for p in normaltest_pvals]
```
- Explanation:
  - The code block checks the p-values for each group.
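  - If the p-value is greater than 0.05, H0 is not rejected and the group is treated as normal (True); otherwise H0 is rejected (False). A p-value above 0.05 means the data show no significant evidence against normality, not that normality is proven.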
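The original code hardcodes the 0.05 significance level. A sketch of the same decision with the level exposed as a parameter; `decide_normality` and `alpha` are hypothetical names, not part of the current implementation:

```python
import numpy as np
from scipy.stats import normaltest

def decide_normality(pvals, alpha=0.05):
    """Hypothetical: True where H0 (normality) is not rejected at level alpha."""
    return [bool(p > alpha) for p in pvals]

# A strongly right-skewed sample should be rejected decisively
rng = np.random.default_rng(3)
skewed = rng.exponential(scale=1.0, size=500)
p_skewed = normaltest(skewed).pvalue

print(decide_normality([0.20, 0.04, 0.05]))  # [True, False, False]
print(decide_normality([p_skewed]))
```

Note the strict inequality: a p-value exactly equal to alpha counts as a rejection, matching `p > 0.05` in the original.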

Prepare Output:
- Purpose: Format each group's test results and prepare the console output.
```python
normaltest_info = {
    group: {'stat': normaltest_stats[n], 'p': normaltest_pvals[n], 'normality': normaltest_normality[n]}
    for n, group in enumerate(groups)
}
normaltest_text = [
    f"Results for '{key}' group in variable ['{target_variable}']:\n"
    f"  ➡ statistic: {value['stat']}\n"
    f"  ➡ p-value: {value['p']}\n"
    f"{'  ∴ Normality: Yes (H0 cannot be rejected)' if value['normality'] else '  ∴ Normality: No (H0 rejected)'}\n\n"
    for key, value in normaltest_info.items()
]
normaltest_title = "< NORMALITY TESTING: D'AGOSTINO-PEARSON NORMALTEST >\n\n"
normaltest_tip = "☻ Tip: The D'Agostino-Pearson normality test, or simply 'normaltest', is best applied when the sample size is larger, as it combines skewness and kurtosis to form a test statistic. This test is useful for detecting departures from normality that involve asymmetry and tail thickness, offering a good balance between sensitivity and specificity in medium to large sample sizes.\n"
```

- Explanation:
  - normaltest_info maps each group name to a dictionary holding the test statistic, p-value, and normality conclusion.
  - normaltest_text formats these results as one console block per group.
  - normaltest_title and normaltest_tip supply the console header and the closing tip.
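If a tabular view is preferred over console text, a nested dictionary shaped like normaltest_info converts directly into a pandas DataFrame. This is an optional presentation idea, not something the current implementation does; the simulated groups below stand in for real data:

```python
import numpy as np
import pandas as pd
from scipy.stats import normaltest

# Simulated stand-ins for two groups (assumption: any numeric samples work)
rng = np.random.default_rng(11)
groups = ['A', 'B']
samples = {'A': rng.normal(size=60), 'B': rng.uniform(size=60)}

# Build a dict with the same shape as normaltest_info
normaltest_info = {}
for group in groups:
    res = normaltest(samples[group])
    normaltest_info[group] = {
        'stat': res.statistic,
        'p': res.pvalue,
        'normality': res.pvalue > 0.05,
    }

# One row per group; columns follow the inner keys: 'stat', 'p', 'normality'
summary = pd.DataFrame.from_dict(normaltest_info, orient='index')
print(summary)
```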

Output Results and Return:
- Purpose: Output the results and return the appropriate values based on the pipeline parameter.
```python
# saving info
output_info['normaltest'] = normaltest_info
normality_info['normaltest_group_consensus'] = all(normaltest_normality)

# end it here if non-consensus method
if method == 'normaltest':
   print(normaltest_title, *normaltest_text, normaltest_tip)
   return output_info if not pipeline else normality_info['normaltest_group_consensus']
```
- Explanation:
  - The results are saved in output_info and normality_info dictionaries.
  - If the method is 'normaltest' (rather than 'consensus'), the function prints the console output and returns the full output_info dictionary by default, or only the boolean group consensus when pipeline=True.
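The return logic can be isolated in a small sketch to show what each pipeline setting yields. `finish_normaltest` is a hypothetical stand-in for the tail of evaluate_normality(), and the toy per-group values are illustrative inputs, not test results:

```python
def finish_normaltest(normaltest_info, pipeline=False):
    """Hypothetical sketch of the return step for method='normaltest'.

    pipeline=False: full per-group results, stored under output_info['normaltest']
    pipeline=True:  a single bool, True only if every group kept H0
    """
    output_info = {'normaltest': normaltest_info}
    consensus = all(info['normality'] for info in normaltest_info.values())
    return consensus if pipeline else output_info

# Toy input in the normaltest_info shape (illustrative values only)
info = {
    'A': {'stat': 1.2, 'p': 0.55, 'normality': True},
    'B': {'stat': 9.8, 'p': 0.007, 'normality': False},
}

print(finish_normaltest(info, pipeline=True))   # False: group 'B' failed
print(finish_normaltest(info)['normaltest'] is info)  # True
```

Returning a bare boolean in pipeline mode lets downstream code branch on normality without unpacking per-group details.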

Link to Full Code: evaluate_normality.py.

ETA444 / datasafari

Implement new evaluate_normality() method: 'normaltest' #74