Trusted-AI / AIF360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
https://aif360.res.ibm.com/
Apache License 2.0

Theil Index using binary prediction #273

Open rocalabern opened 3 years ago

rocalabern commented 3 years ago

According to the Theil index definition, the formula is undefined when y_score is 0 and y_target is 1.

In that case b_i = y_score - y_target + 1 gives b_i = 0, and log(0) is not defined.
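A quick way to see the problem with plain NumPy (a minimal sketch with made-up labels, not AIF360 code): with binary labels and predictions the benefit only takes the values 0, 1 and 2, and a literal b_i * log(b_i) turns the false negative into nan.

```python
import numpy as np

# Made-up binary ground truth and binary predictions for five individuals.
y_true = np.array([1, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1])

# Benefit as defined in this issue: b_i = y_pred - y_true + 1.
b = y_pred - y_true + 1
print(b)  # [1 0 1 2 1]

# A literal reading of the Theil term b_i * log(b_i) breaks on the false negative:
# log(0) is -inf, and 0 * -inf is nan in IEEE floating point.
with np.errstate(divide="ignore", invalid="ignore"):
    naive_terms = b * np.log(b)
print(naive_terms)
```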

Shouldn't the package have a class that does not use BinaryLabelDataset for predictions, so that the Theil index can be calculated from a continuous score instead of a binary one?

I assume it is using the trick that a*log(a) equals log(a**a), but I am not sure the Theil index is supposed to be calculated that way.
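The rewrite works because x ** 0 evaluates to 1 in floating point, so a b_i = 0 term becomes log(1) = 0. That matches the usual convention 0 * log(0) = 0, the limit of x log x as x goes to 0 from above. Below is a sketch of the idea; `theil_index` and `theil_index_explicit` are hypothetical names written for this comment, not copied from the linked line, and the second version masks zero benefits explicitly so the two can be compared.

```python
import numpy as np

def theil_index(b):
    """Theil index (GE with alpha = 1) via the log(x ** b) rewrite (hypothetical helper).

    (b/mu) * log(b/mu) is written as log((b/mu) ** b) / mu, so an entry with
    b_i = 0 contributes log(0.0 ** 0) = log(1) = 0 instead of 0 * log(0) = nan.
    """
    b = np.asarray(b, dtype=float)
    mu = b.mean()
    return np.mean(np.log((b / mu) ** b) / mu)

def theil_index_explicit(b):
    """Same quantity, applying the 0 * log(0) = 0 convention by masking (hypothetical helper)."""
    b = np.asarray(b, dtype=float)
    r = b / b.mean()
    terms = np.zeros_like(r)
    pos = r > 0  # zero benefits keep a term of exactly 0
    terms[pos] = r[pos] * np.log(r[pos])
    return terms.mean()

b = np.array([1, 0, 1, 2, 1])  # benefits from a binary example with one false negative
print(theil_index(b), theil_index_explicit(b))
```

Both give 2 ln 2 / 5 ≈ 0.277 for these benefits. The masked version is arguably easier to read, and it avoids overflow in (b/mu) ** b when benefits are large.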

I did not expect this implementation; it looks like a trick to keep the same interface for all metrics rather than the proper calculation.

Apologies for my lack of knowledge on the subject if this is in fact the proper calculation.

https://github.com/Trusted-AI/AIF360/blob/746e763191ef46ba3ab5c601b96ce3f6dcb772fd/aif360/metrics/classification_metric.py#L694

krvarshney commented 3 years ago

The Theil Index is the Generalized Entropy Index with alpha = 1. The Generalized Entropy Index is implemented in the standard way using the equation https://wikimedia.org/api/rest_v1/media/math/render/svg/844bcf9d016e032d7e03f0de29b7733e36f0b8a9.
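For reference, the standard definition behind that link, written with benefits b_i and mean benefit μ (α = 1 and α = 0 are obtained as limits):

```latex
\mathrm{GE}(\alpha) = \frac{1}{n\,\alpha(\alpha - 1)} \sum_{i=1}^{n} \left[ \left( \frac{b_i}{\mu} \right)^{\alpha} - 1 \right],
\qquad \mu = \frac{1}{n} \sum_{i=1}^{n} b_i, \qquad \alpha \neq 0, 1

\mathrm{GE}(1) = \frac{1}{n} \sum_{i=1}^{n} \frac{b_i}{\mu} \ln \frac{b_i}{\mu}
\qquad \text{(Theil index)}

\mathrm{GE}(0) = -\frac{1}{n} \sum_{i=1}^{n} \ln \frac{b_i}{\mu}
```

With b_i = y_score - y_target + 1 and binary labels, b_i = 0 only for false negatives; GE(1) stays finite under the convention 0 ln 0 = 0, whereas GE(0) diverges.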

rocalabern commented 3 years ago

Ok, thanks.

But the interface focuses only on binary classification. I would like to extend the Theil index to multiclass classification or regression. Specifically, I would like to use probabilities instead of binary predictions.
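As a possible starting point for such an extension, here is a minimal sketch that computes GE(α) from an arbitrary non-negative benefit vector, so benefits can come from predicted probabilities rather than a BinaryLabelDataset. `generalized_entropy_from_benefits` is a hypothetical helper, not part of the AIF360 API, and the benefit b_i = p_i - y_i + 1 is an assumption that simply carries the binary definition over to scores; how benefits should be defined for multiclass or regression is left open.

```python
import numpy as np

def generalized_entropy_from_benefits(b, alpha=2.0):
    """GE(alpha) of an arbitrary non-negative benefit vector.

    Hypothetical helper, not part of the AIF360 API: it accepts real-valued
    benefits (e.g. derived from probabilities) instead of binary predictions.
    """
    b = np.asarray(b, dtype=float)
    if np.any(b < 0):
        raise ValueError("benefits must be non-negative")
    mu = b.mean()
    if mu <= 0:
        raise ValueError("mean benefit must be positive")
    r = b / mu
    if alpha == 1:
        # Theil index, with the convention 0 * log(0) = 0.
        pos = r > 0
        return np.sum(r[pos] * np.log(r[pos])) / r.size
    if alpha == 0:
        # Mean log deviation; infinite if any benefit is zero.
        with np.errstate(divide="ignore"):
            return -np.mean(np.log(r))
    return np.mean(r ** alpha - 1) / (alpha * (alpha - 1))


# Assumed example: predicted probabilities of the positive class instead of hard labels.
y_true = np.array([1, 1, 0, 0, 1])
p_score = np.array([0.9, 0.2, 0.1, 0.6, 0.8])
benefits = p_score - y_true + 1  # lies in [0, 2]; zero only when p = 0 and y = 1
print(generalized_entropy_from_benefits(benefits, alpha=1))
```

Keeping the benefit computation outside the function means the same index could be reused once a multiclass or regression benefit is agreed on.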

From "A Unified Approach to Quantifying Algorithmic Unfairness: Measuring Individual & Group Unfairness via Inequality Indices": "In this paper, we will focus on binary classification, but our work extends to multiclass classification and regression, as well."

Feel free to close the issue if this does not fit the package or is not planned. Thanks for the reply.

krvarshney commented 3 years ago

Extending for multiclass and/or regression will be great. Please go ahead. We can discuss your solution through a pull request once you've made it.