Closed ETA444 closed 6 months ago
Implementation Summary:
The 'anderson' method in the evaluate_normality() function uses the Anderson-Darling test to evaluate the normality of a numeric variable within groups defined by a grouping variable. The Anderson-Darling test is useful for detecting deviations from normality, particularly in the tails of the distribution, making it suitable for assessing outliers or heavy-tailed distributions.
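To make the tail sensitivity concrete, here is a minimal pure-Python sketch of the Anderson-Darling statistic for the normal case, with mean and standard deviation estimated from the sample. It is not the code used by evaluate_normality(), which calls a library routine. The function names, the sample data, and the constants table are assumptions for illustration; the constants are the ones SciPy's normal-case implementation is understood to use, so check them against your installed version.

```python
import math
from statistics import NormalDist, mean, stdev

# Assumed constants for the normal case (significance levels in percent).
SIGNIFICANCE_LEVELS = (15.0, 10.0, 5.0, 2.5, 1.0)
BASE_CRITICAL_VALUES = (0.576, 0.656, 0.787, 0.918, 1.092)


def anderson_darling_normal(sample):
    """Return the A^2 statistic for normality with estimated mean and std."""
    y = sorted(sample)
    n = len(y)
    mu, sigma = mean(y), stdev(y)  # stdev divides by n - 1
    std_normal = NormalDist()
    z = [(v - mu) / sigma for v in y]
    total = 0.0
    for i in range(1, n + 1):
        log_cdf = math.log(std_normal.cdf(z[i - 1]))
        # 1 - cdf of the mirrored order statistic, via symmetry of the normal.
        log_sf = math.log(std_normal.cdf(-z[n - i]))
        total += (2 * i - 1) * (log_cdf + log_sf)
    return -n - total / n


def critical_values(n):
    """Scale the base critical values for sample size n (assumed correction)."""
    adjustment = 1.0 + 4.0 / n - 25.0 / n**2
    return [round(c / adjustment, 3) for c in BASE_CRITICAL_VALUES]


# Deterministic illustrative samples built from quantiles, not real data.
n = 50
probs = [(i - 0.5) / n for i in range(1, n + 1)]
near_normal = [NormalDist().inv_cdf(p) for p in probs]
right_skewed = [-math.log(1.0 - p) for p in probs]  # exponential quantiles

a2_normal = anderson_darling_normal(near_normal)
a2_skewed = anderson_darling_normal(right_skewed)
crit_5pct = critical_values(n)[2]  # index 2 pairs with the 5% level

print(a2_normal < crit_5pct, a2_skewed > crit_5pct)
```

The logarithms of the CDF and its complement grow quickly in magnitude as observations move into the tails, which is why the statistic responds strongly to the skewed sample while staying small for the near-normal one.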
Code Breakdown:
Compute Anderson-Darling Statistic and Critical Values:
anderson_stats = [
    anderson(df[df[grouping_variable] == group][target_variable]).statistic
    for group in groups
]
anderson_critical_values = [
    anderson(df[df[grouping_variable] == group][target_variable]).critical_values[2]
    for group in groups
]
For each group, runs the Anderson-Darling test on target_variable, and stores the results.
The .statistic attribute gives the test statistic.
The .critical_values attribute provides the critical values; the index [2] corresponds to a significance level of 0.05.
Determine Normality:
anderson_normality = [
    True if c_val > anderson_stats[n] else False
    for n, c_val in enumerate(anderson_critical_values)
]
If the critical value is greater than the test statistic, normality is not rejected (True); otherwise, it's rejected (False).
Prepare Output:
# NOTE: the 'p' key holds the 5% critical value, not a p-value.
anderson_info = {
    group: {'stat': anderson_stats[n], 'p': anderson_critical_values[n], 'normality': anderson_normality[n]}
    for n, group in enumerate(groups)
}
anderson_text = [
    f"Results for '{key}' group in variable ['{target_variable}']:\n ➡ statistic: {value['stat']}\n ➡ p-value: {value['p']}\n{(f' ∴ Normality: Yes (H0 cannot be rejected)' if value['normality'] else f' ∴ Normality: No (H0 rejected)')}\n\n"
    for key, value in anderson_info.items()
]
anderson_title = f"< NORMALITY TESTING: ANDERSON-DARLING >\n\n"
anderson_tip = "☻ Tip: The Anderson-Darling test is a versatile test that can be applied to any sample size and is especially useful for comparing against multiple distribution types, not just the normal. It places more emphasis on the tails of the distribution than the Shapiro-Wilk test, making it useful for detecting outliers or heavy-tailed distributions.\n"
anderson_info holds the results for each group, with keys as group names and values as dictionaries containing the test statistic, critical value, and normality conclusion.
The anderson_text list formats these results for each group.
anderson_title and anderson_tip are used for console output headers and tips.
Output Results and Return:
Prints the results and returns a value based on the pipeline parameter.
# saving info
output_info['anderson'] = anderson_info
normality_info['anderson_group_consensus'] = all(anderson_normality)
# end it here if non-consensus method
if method == 'anderson':
    print(anderson_title, *anderson_text, anderson_tip)
    return output_info if not pipeline else normality_info['anderson_group_consensus']
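Results are saved in the output_info and normality_info dictionaries.
When method is 'anderson', the results are printed and the return value depends on the pipeline flag: the group consensus boolean if pipeline is set, otherwise output_info.
Link to Full Code: evaluate_normality.py.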
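The decision and return steps above can be condensed into a small standalone sketch. This is a simplified illustration, not the function's actual code: summarize_anderson, its arguments, and the numbers are hypothetical, and the threshold is stored under a 'critical_value' key rather than 'p' to make clear that it is not a p-value.

```python
def summarize_anderson(stats, critical_values, pipeline=False):
    """Build per-group results; return a single boolean when pipeline=True.

    stats and critical_values are dicts keyed by group name (hypothetical
    inputs standing in for the per-group test results).
    """
    info = {}
    for group, stat in stats.items():
        crit = critical_values[group]
        info[group] = {
            "stat": stat,
            "critical_value": crit,  # a threshold, not a p-value
            "normality": stat < crit,  # H0 not rejected when below threshold
        }
    # Consensus holds only if every group passes.
    consensus = all(entry["normality"] for entry in info.values())
    return consensus if pipeline else info


# Made-up numbers chosen only to exercise both outcomes.
stats = {"A": 0.31, "B": 1.42}
crits = {"A": 0.736, "B": 0.736}

report = summarize_anderson(stats, crits)
flag = summarize_anderson(stats, crits, pipeline=True)
print(report["A"]["normality"], report["B"]["normality"], flag)
```

Returning a bare boolean in pipeline mode lets a caller branch on the result directly, while the dictionary form keeps the per-group detail for inspection.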
Title: Enhancing Normality Testing with Anderson-Darling Method
Description: This update adds the Anderson-Darling test as a method for evaluating the normality of a distribution within groups defined by a categorical variable. The test can be applied across a range of sample sizes and places more emphasis on the tails of the distribution, which helps users detect outliers and heavy-tailed distributions.
Example Usage:
Expected Outcome: Users get a per-group assessment of normality from the Anderson-Darling test, including the test statistic, the critical value used, and the resulting normality decision. Because the test is sensitive to the tails, it is particularly useful for detecting outliers or heavy-tailed distributions.
Additional Context: The Anderson-Darling test complements the existing normality testing methods in the module. Its emphasis on the distribution tails helps catch departures from normality that other tests may weight less heavily, giving users another basis for decisions in their data analysis.