Closed by ETA444 6 months ago
Implementation Summary:
The 'normaltest' method in the evaluate_normality() function uses D'Agostino and Pearson's normality test to evaluate the normality of a numeric variable within each group defined by a grouping variable. The test combines the sample's skewness and kurtosis into a single statistic to assess whether the sample departs from a normal distribution; its null hypothesis (H0) is that the sample comes from a normal distribution.
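To make the combined statistic concrete: SciPy's normaltest squares the z-scores returned by skewtest and kurtosistest and adds them, giving K², which is compared against a chi-squared distribution with 2 degrees of freedom. The sketch below assumes the snippets in this issue call scipy.stats.normaltest and reproduces that calculation on a synthetic sample.

```python
import numpy as np
from scipy import stats

# Synthetic sample: 500 draws from a standard normal distribution
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)

# z-scores for the skewness and kurtosis components
z_skew = stats.skewtest(sample).statistic
z_kurt = stats.kurtosistest(sample).statistic

# Omnibus statistic K^2 = z_skew^2 + z_kurt^2, referred to chi-squared with 2 df
k2 = z_skew**2 + z_kurt**2
p_value = stats.chi2.sf(k2, df=2)

result = stats.normaltest(sample)
print(result.statistic, k2)     # the two statistics agree
print(result.pvalue, p_value)   # and so do the p-values
```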
Code Breakdown:
Compute Normaltest Statistics and P-Values:
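from scipy.stats import normaltest

# Test statistic for each group's target_variable values
normaltest_stats = [
    normaltest(df[df[grouping_variable] == group][target_variable]).statistic
    for group in groups
]
# Matching p-value for each group
normaltest_pvals = [
    normaltest(df[df[grouping_variable] == group][target_variable]).pvalue
    for group in groups
]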
This code iterates over each group, applies normaltest to that group's target_variable values, and stores the results. The .statistic attribute provides the test statistic, and the .pvalue attribute gives the p-value.
Determine Normality:
normaltest_normality = [True if p > 0.05 else False for p in normaltest_pvals]
If the p-value is greater than 0.05, the null hypothesis of normality is not rejected (True); otherwise, it is rejected (False).
Prepare Output:
normaltest_info = {
    group: {'stat': normaltest_stats[n], 'p': normaltest_pvals[n], 'normality': normaltest_normality[n]}
    for n, group in enumerate(groups)
}
normaltest_text = [
    f"Results for '{key}' group in variable ['{target_variable}']:\n ➡ statistic: {value['stat']}\n ➡ p-value: {value['p']}\n{(f' ∴ Normality: Yes (H0 cannot be rejected)' if value['normality'] else f' ∴ Normality: No (H0 rejected)')}\n\n"
    for key, value in normaltest_info.items()
]
normaltest_title = f"< NORMALITY TESTING: D'AGOSTINO-PEARSON NORMALTEST >\n\n"
normaltest_tip = "☻ Tip: The D'Agostino-Pearson normality test, or simply 'normaltest', is best applied when the sample size is larger, as it combines skewness and kurtosis to form a test statistic. This test is useful for detecting departures from normality that involve asymmetry and tail thickness, offering a good balance between sensitivity and specificity in medium to large sample sizes.\n"
normaltest_info holds the results for each group, with group names as keys and, as values, dictionaries containing the test statistic, p-value, and normality conclusion. The normaltest_text list formats these results for each group, and normaltest_title and normaltest_tip are used as the console output header and tip.
Output Results and Return:
The final step saves the results and returns them in a form that depends on the pipeline parameter.
# saving info
output_info['normaltest'] = normaltest_info
normality_info['normaltest_group_consensus'] = all(normaltest_normality)
# end it here if non-consensus method
if method == 'normaltest':
    print(normaltest_title, *normaltest_text, normaltest_tip)
    return output_info if not pipeline else normality_info['normaltest_group_consensus']
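The per-group results are stored in output_info, and the group consensus (True only if every group's p-value exceeds 0.05) is stored in normality_info. When method is 'normaltest', the function prints the report and returns output_info, or only the consensus boolean if the pipeline flag is set.
Link to Full Code: evaluate_normality.py.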
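For comparison, the same per-group steps can be written as a single pass that calls normaltest once per group rather than once for the statistic and again for the p-value. This is only a sketch: normaltest_by_group and its alpha parameter are hypothetical and not part of evaluate_normality(). Note that pandas' groupby sorts group keys by default, whereas the original iterates over groups in its own order.

```python
import numpy as np
import pandas as pd
from scipy.stats import normaltest


def normaltest_by_group(df, target_variable, grouping_variable, alpha=0.05):
    """Hypothetical helper: one normaltest call per group.

    Returns a dict shaped like normaltest_info plus the group consensus
    (True only if no group rejects H0 at the given alpha).
    """
    info = {}
    for group, values in df.groupby(grouping_variable)[target_variable]:
        result = normaltest(values)  # one call yields both statistic and p-value
        info[group] = {
            'stat': result.statistic,
            'p': result.pvalue,
            'normality': bool(result.pvalue > alpha),
        }
    consensus = all(entry['normality'] for entry in info.values())
    return info, consensus


# Synthetic data: group 'a' is normal, group 'b' is strongly right-skewed
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'group': ['a'] * 200 + ['b'] * 200,
    'score': np.concatenate([rng.normal(size=200), rng.exponential(size=200)]),
})
info, consensus = normaltest_by_group(df, 'score', 'group')
```

Because the exponential group is heavily skewed, it fails the test, so the consensus is False.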
Title: Introducing D'Agostino-Pearson Normality Test for Enhanced Normality Assessment
Description: This update integrates the D'Agostino-Pearson normality test, also known as normaltest, into the normality testing module. The test combines skewness and kurtosis into a single omnibus statistic, which makes it well suited to detecting departures from normality in medium to large samples, particularly departures involving asymmetry and tail thickness.
Example Usage:
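A minimal sketch of the per-group output, using SciPy and pandas directly on synthetic data rather than calling evaluate_normality(), whose full signature is not shown here. The column names treatment and response are hypothetical; the printed format mirrors normaltest_text above.

```python
import numpy as np
import pandas as pd
from scipy.stats import normaltest

# Synthetic data (hypothetical columns): one normal group, one right-skewed group
rng = np.random.default_rng(7)
df = pd.DataFrame({
    'treatment': ['control'] * 150 + ['drug'] * 150,
    'response': np.concatenate([rng.normal(50, 5, 150), rng.lognormal(3, 0.8, 150)]),
})

target_variable, grouping_variable = 'response', 'treatment'
report = []
for group in df[grouping_variable].unique():  # groups in order of first appearance
    res = normaltest(df.loc[df[grouping_variable] == group, target_variable])
    verdict = 'Yes (H0 cannot be rejected)' if res.pvalue > 0.05 else 'No (H0 rejected)'
    report.append(
        f"Results for '{group}' group in variable ['{target_variable}']:\n"
        f" ➡ statistic: {res.statistic}\n"
        f" ➡ p-value: {res.pvalue}\n"
        f" ∴ Normality: {verdict}\n"
    )

print("< NORMALITY TESTING: D'AGOSTINO-PEARSON NORMALTEST >\n")
print(*report, sep='\n')
```

The lognormal 'drug' group is strongly skewed, so its verdict is "No (H0 rejected)".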
Expected Outcome: For each group, users receive the test statistic, the p-value, and a normality verdict at the 0.05 level, plus a group consensus indicating whether every group is consistent with normality. This helps identify departures from normality involving asymmetry and tail thickness before choosing downstream analyses.
Additional Context: The D'Agostino-Pearson test offers a good balance of sensitivity and specificity in medium to large samples, and its sensitivity to asymmetry and tail thickness makes it a valuable addition to the normality testing toolkit.
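Because the test relies on large-sample approximations, it can help to check group sizes before testing. In current SciPy versions, skewtest needs at least 8 observations and kurtosistest (called internally by normaltest) warns when given fewer than 20. The sketch below skips undersized groups and reports their sizes; normaltest_with_size_guard and the threshold of 20 are hypothetical choices, not part of evaluate_normality().

```python
import numpy as np
import pandas as pd
from scipy.stats import normaltest

MIN_GROUP_SIZE = 20  # hypothetical threshold, following kurtosistest's n >= 20 guidance


def normaltest_with_size_guard(df, target_variable, grouping_variable, min_size=MIN_GROUP_SIZE):
    """Hypothetical helper: run normaltest per group, skipping groups that are too small."""
    results, skipped = {}, {}
    for group, values in df.groupby(grouping_variable)[target_variable]:
        values = values.dropna()  # drop missing values so they don't turn the result into NaN
        if len(values) < min_size:
            skipped[group] = len(values)  # record the size instead of testing
            continue
        res = normaltest(values)
        results[group] = {'stat': res.statistic, 'p': res.pvalue,
                          'normality': bool(res.pvalue > 0.05)}
    return results, skipped


rng = np.random.default_rng(1)
df = pd.DataFrame({
    'site': ['north'] * 100 + ['south'] * 12,
    'value': rng.normal(size=112),
})
results, skipped = normaltest_with_size_guard(df, 'value', 'site')
print(sorted(results), skipped)  # ['north'] {'south': 12}
```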