dssg / aequitas

Bias Auditing & Fair ML Toolkit
http://www.datasciencepublicpolicy.org/aequitas/
MIT License

Enhance Robustness by Handling Missing Values and Group-wise Calculation of PPR #192

Open lshpaner opened 1 month ago

lshpaner commented 1 month ago

Background

The Aequitas library audits bias and fairness in machine learning models. One key metric it computes is the total number of predicted positives (k), which is crucial for computing downstream fairness metrics.
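For context, here is a minimal sketch of how k feeds a group-level metric such as PPR, using the standard definition (a group's predicted positives divided by total k). The frame and column names are illustrative, not Aequitas's internals:

import pandas as pd

# Hypothetical audit frame: 'score' holds binary predictions, 'group' the demographic group
df = pd.DataFrame({'score': [1, 0, 1, 1, 0, 1],
                   'group': ['a', 'a', 'b', 'b', 'b', 'a']})
k = (df['score'] == 1).sum()                      # total predicted positives: 4
k_per_group = df.groupby('group')['score'].sum()  # a: 2, b: 2
ppr = k_per_group / k                             # each group's share of k: a: 0.5, b: 0.5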

Issue

Currently, the computation of k on line 130 of group.py assumes there are no missing values in the predictions, which leads to inaccurate counts when the data contains them. Additionally, k is calculated across all groups together (line 164 of group.py), which can mask disparities in predicted positives across demographic groups.
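To illustrate the first point, assuming k is derived from a pandas equality comparison (a simplification, not the exact group.py implementation), missing scores vanish from the numerator without any warning while row counts still include them:

import numpy as np
import pandas as pd

scores = pd.Series([1.0, np.nan, 1.0, 0.0])
k = (scores == 1).sum()   # 2: NaN never compares equal, so the missing row is dropped silently
n = len(scores)           # 4: but denominators such as group size still count it
print(k / n)              # 0.5, understating the 2/3 rate among the observed predictions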

Suggested Improvement

It would be beneficial to handle missing values explicitly, either by excluding them with a warning or by offering an option to impute them based on user preference. Furthermore, calculating k separately for each group and then summing these values can provide a clearer view of model behavior across different groups. This approach would enhance the transparency and utility of the fairness assessment.

Below is a proposed change in the calculation method:


# Proposed method: handle missing values explicitly, then calculate k group-wise
import warnings

n_missing = df[score].isna().sum()
if n_missing:
    warnings.warn(f"Excluding {n_missing} rows with missing '{score}' values.")
k_per_group = df.dropna(subset=[score]).groupby('group')[score].apply(lambda s: (s == 1).sum())
total_k = k_per_group.sum()
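A note on the design: when there are no missing values and every row belongs to a group, total_k is identical to the current global count, so existing audits would be unaffected. When values are missing, the warning makes the exclusion explicit, and k_per_group additionally exposes the per-group predicted positives that a single global figure hides.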