Trusted-AI / AIF360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
https://aif360.res.ibm.com/
Apache License 2.0

Zero denominator causes RejectOptionClassification.fit() to fail #307

Open gabastil opened 2 years ago

gabastil commented 2 years ago

In the case of a zero in the denominator of the rate functions (e.g., true_negative_rate), the fit function throws an error.

In my case, the classified_transf_metric.true_negative_rate() call on lines 143 and 144 of aif360/algorithms/postprocessing/reject_option_classification.py returns nan, which propagates through the addition assigned to balanced_acc_arr[cnt] and makes the entire expression nan.

The above, in turn, causes downstream operations to throw errors.
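To make the failure mode concrete, here is a minimal sketch (with made-up confusion-matrix counts, not the values from my run) of how a zero denominator in the true negative rate turns the balanced accuracy into nan:

```python
import numpy as np

# Hypothetical counts for a group with no actual negatives,
# so TN + FP == 0 and the true negative rate is 0/0.
TN, FP, TP, FN = 0.0, 0.0, 7.0, 3.0

with np.errstate(invalid="ignore"):
    tnr = np.float64(TN) / np.float64(TN + FP)  # 0/0 -> nan
tpr = TP / (TP + FN)                            # 7/10 = 0.7

# nan poisons the sum, so the balanced accuracy is nan too
balanced_acc = 0.5 * (tpr + tnr)
```

Every downstream comparison or argmax over an array containing that nan then misbehaves, which is what derails fit().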

I propose returning zero and emitting a warning that this has occurred, but I'm not sure that is the best route forward for this issue.

krvarshney commented 2 years ago

I don't think that returning 0 when there is a divide by zero error is the correct solution. https://en.wikipedia.org/wiki/Division_by_zero

gabastil commented 2 years ago

Sure, it's not the mathematically correct result; that would be inf, right?

However, on a case-by-case basis, something needs to be returned that does not collapse the entire fit process. NumPy's solution is to default nan values to zero (see numpy.nan_to_num). Since this is an array of accuracies, when there are no TN counts, any predicted negatives would be "inaccurate", which is the reasoning for returning a zero.

That thought process in addition to the nan handling function in numpy led me to that proposed workaround.
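Roughly, the workaround would look like this (the array values here are illustrative, not taken from an actual run): scrub the nan entries before picking the best threshold, so the nan entries can never win the argmax.

```python
import numpy as np

# Illustrative balanced-accuracy array in which some threshold/margin
# combinations produced nan because of a zero rate denominator.
balanced_acc_arr = np.array([0.70, np.nan, 0.65, np.nan])

# Replace nan with 0.0 so the argmax over thresholds is still well defined.
cleaned = np.nan_to_num(balanced_acc_arr, nan=0.0)
best_idx = int(np.argmax(cleaned))
```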

If there's a better solution, happy to have that implemented as well. Otherwise, I can submit a PR for what I mentioned.

As long as the nan issue arising from fit() does not derail the entire pipeline, I am happy!

hoffmansc commented 2 years ago

It looks like this is what sklearn does (e.g. recall_score) and what we do in the sklearn-compatible metrics as well.
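For comparison, here is a rough sketch of that convention (the helper name safe_rate and its signature are mine, not AIF360's or sklearn's): return a configurable fallback value, with a warning, when the denominator is zero, much like recall_score's zero_division parameter.

```python
import warnings

def safe_rate(numerator, denominator, zero_division=0.0):
    """Return numerator / denominator, falling back to `zero_division`
    (and warning) when the denominator is zero, instead of producing nan."""
    if denominator == 0:
        warnings.warn(
            "zero denominator in rate; returning %r" % zero_division,
            RuntimeWarning,
        )
        return zero_division
    return numerator / denominator
```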

However, it seems like a fundamentally broken situation if your dataset has no positive or no negative samples at all. What's the point of running a debiasing algorithm on such a dataset?

gabastil commented 2 years ago

The dataset does have both positive and negative samples, but I see where you are coming from.

Regardless, even "failing gracefully" with a helpful message describing the error (i.e., a division-by-zero error at the point of division) would be useful to see.

Even better: if it is known before entering the loop that the metrics cannot be calculated properly, an assert statement with a helpful message would be more useful than letting both the classification_threshold and ROC_margin loops run to completion while the nan values they generate are guaranteed to make the function fail.
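Something like the following guard could run before the loops (a sketch only: check_rates_computable is a hypothetical helper, and I am assuming the usual BinaryLabelDataset attributes labels, favorable_label, and unfavorable_label):

```python
import numpy as np

def check_rates_computable(dataset):
    """Hypothetical pre-loop guard: fail fast with a clear message instead
    of letting the classification_threshold / ROC_margin loops fill the
    balanced-accuracy array with nan."""
    labels = np.ravel(dataset.labels)
    n_pos = int(np.sum(labels == dataset.favorable_label))
    n_neg = int(np.sum(labels == dataset.unfavorable_label))
    assert n_pos > 0 and n_neg > 0, (
        "fit() needs both positive and negative samples to compute rate "
        "denominators; got %d positive / %d negative" % (n_pos, n_neg)
    )
```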