Trusted-AI / AIF360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
https://aif360.res.ibm.com/
Apache License 2.0

Add a bias detector based on optimal transport #434

Closed jmarecek closed 1 year ago

jmarecek commented 1 year ago

There is increasing interest in estimating bias in terms of Wasserstein-2 distances (or related distances between measures, often computable using optimal-transport algorithms); cf.:

- http://proceedings.mlr.press/v97/gordaliza19a.html (ICML 2019)
- https://academic.oup.com/imaiai/article-abstract/8/4/817/5586771 (Information and Inference, 2019)
- https://openreview.net/forum?id=-welFirjMss (NeurIPS 2022)

Following discussions with Rahul Nair (rahul.nair@ie.ibm.com), we would like to contribute a new detector with the same signature as the original MDSS detector by @Adebayo-Oshingbesan: https://github.com/Trusted-AI/AIF360/blob/master/aif360/detectors/mdss_detector.py
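For intuition only (this is not the implementation in this PR): in 1-D, the W2 distance between two equal-size empirical samples reduces to matching order statistics, so a bias score between the model-score distributions of two groups can be sketched in a few lines of plain Python on synthetic data:

```python
import random
from statistics import mean

def wasserstein2_1d(u, v):
    """Empirical W2 between two equal-size 1-D samples: in 1-D the
    optimal transport plan matches sorted samples, so W2 is the
    root-mean-square difference of the order statistics."""
    assert len(u) == len(v), "equal sample sizes assumed for simplicity"
    return mean((a - b) ** 2 for a, b in zip(sorted(u), sorted(v))) ** 0.5

rng = random.Random(0)
scores_a = [rng.gauss(0.6, 0.1) for _ in range(2000)]  # synthetic scores, group A
scores_b = [rng.gauss(0.4, 0.1) for _ in range(2000)]  # synthetic scores, group B
print(round(wasserstein2_1d(scores_a, scores_b), 2))   # close to the 0.2 mean shift
```

A large score indicates the two groups receive systematically different model scores; the referenced papers develop this idea rigorously, including for unequal sample sizes and higher dimensions.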

Could we ask @hoffmansc for a code review, please?

Closes #433

hoffmansc commented 1 year ago

Hi, @jmarecek. Thanks for your work!

I have a couple of requests:

At a high level, though, there seems to be a lot of repeated code here. Is there a way we can reuse the MDSS bias scan code for this? For example, how does this differ from adding another "scoring function"? That seems much cleaner to me and would cut down considerably on the checklist above.

jmarecek commented 1 year ago

Hi @hoffmansc,

I hope all is well. Illia has pushed the unit tests, docstrings, etc. It would be great to revisit this.

I don't see how the MDSS scan could be extended to work with OT (or many other types of bias).

Illia and I would be happy to jump on a call if that would be faster.

Thanks! Jakub

rahulnair23 commented 1 year ago

Hi @jmarecek @Illia-Kryvoviaz - I've reviewed this as well and am happy to have a call to discuss some of the details. In addition to the points raised above by @hoffmansc (FYI: point 1 is still open, and points 2 and 3 have only been partially completed), please also consider:

Thank you.

Illia-Kryvoviaz commented 1 year ago

Hi @hoffmansc! We have now implemented your and @rahulnair23's suggestions. Could we ask you for a code review, please?

https://github.com/Illia-Kryvoviaz/AIF360/blob/master/aif360/detectors/ot_detector.py

Illia-Kryvoviaz commented 1 year ago

Hello @hoffmansc! We updated the code based on your feedback. Could you please review the changes and weigh in on the argument-ordering question?

As for detector vs. metric: we argue that this should be an independent detector rather than a metric, because it is too time-consuming either to serve as a metric or to run a subgroup analysis similar to MDSS's.

hoffmansc commented 1 year ago

Thanks for your updates. I'm not sure I understand your point about detector vs. metric. It's fine if the metric is slow. The question is which classification better fits the algorithm, and I think that if it doesn't identify a subgroup, as MDSS does, it shouldn't be called a detector. Metrics are simply anything that takes in data (predictions, ground truth, features, groups, etc.) and returns a numerical score (either per group or "reduced", e.g. by taking the difference between groups). Am I misinterpreting your algorithm?
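To illustrate the "metric" shape described above, a reduced OT metric in the sklearn style could look like the following. The name and signature are illustrative, not AIF360's actual API; equal group sizes are assumed to keep the sketch short:

```python
from statistics import mean

def ot_distance(y_true, y_pred, *, prot_attr, priv_group=1):
    """Hypothetical sklearn-style metric: 1-D W1 distance between the
    prediction distributions of the privileged and unprivileged groups.
    y_true is accepted for signature consistency but unused here."""
    priv = sorted(p for p, g in zip(y_pred, prot_attr) if g == priv_group)
    unpriv = sorted(p for p, g in zip(y_pred, prot_attr) if g != priv_group)
    assert len(priv) == len(unpriv), "equal group sizes assumed for simplicity"
    # 1-D W1: mean absolute difference of matched order statistics
    return mean(abs(a - b) for a, b in zip(priv, unpriv))

print(round(ot_distance(None, [0.1, 0.1, 0.9, 0.9], prot_attr=[1, 1, 0, 0]), 3))
```

Data in, single number out - no subgroup is identified, which is the crux of the classification question.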

jmarecek commented 1 year ago

As I mentioned above, there is a variety of scoring functions within the optimal-transport perspective, including the Wasserstein metrics Wp for integers p (https://en.wikipedia.org/wiki/Wasserstein_metric), Gromov-Wasserstein distances for various metrics on the underlying spaces (https://pythonot.github.io/auto_examples/gromov/plot_gromov.html), fused Gromov-Wasserstein variants, etc. So I don't really think it is a single scoring function.
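A hedged sketch of why this is a family rather than a single score: for equal-size, equal-weight 1-D samples, every Wp arises from the same sorted-matching transport plan but yields a different number for each p:

```python
from statistics import mean

def wp_1d(u, v, p=2):
    """Empirical p-Wasserstein distance for equal-size, equal-weight
    1-D samples: the optimal plan matches order statistics, so
    Wp = (mean |u_(i) - v_(i)|^p)^(1/p)."""
    assert len(u) == len(v), "equal sample sizes assumed for simplicity"
    return mean(abs(a - b) ** p
                for a, b in zip(sorted(u), sorted(v))) ** (1 / p)

x = [0.0, 0.2, 0.4]
y = [0.1, 0.3, 0.9]
for p in (1, 2, 3):
    print(p, round(wp_1d(x, y, p), 3))  # the score grows with p
```

Gromov-Wasserstein and fused variants further change the ground cost, not just the exponent, so the family is wider still.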

hoffmansc commented 1 year ago

I understand there can be multiple distance/scoring functions. I'm not saying there need to be any changes to the function, just that it would be better suited under aif360.metrics than aif360.detectors. Even though it offers options for the scoring function, the output is still a numerical value, not a subgroup.

krvarshney commented 1 year ago

Sam, your logic for it to be under metrics makes sense to me.

jmarecek commented 1 year ago

Hi Kush -- so what about this: we contribute the OT-based metrics to aif360.metrics, and we add some "SimpleDetector" to detectors, which would not try to run the subgroup scan. That way, the current functionality of the code would be easy to access. Would that work? Jakub

rahulnair23 commented 1 year ago

Thanks for the call and discussion today @hoffmansc @jmarecek . As concluded, from an end-user point of view the OT-based measures are better suited under aif360.metrics (as opposed to aif360.detectors).

hoffmansc commented 1 year ago

Actually, to be clear, it's only strictly necessary to put it under aif360.sklearn.metrics. This is the most straightforward thing to implement. It is optional to put it under aif360.metrics (i.e., the class-based Metric interface). The scikit-learn style functions are preferred going forward and the Metric class is only supported for backwards compatibility.
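For contributors unsure of the distinction, here is an illustrative contrast between the two interfaces, using a trivial stand-in metric; the names on both sides are hypothetical, and only the module paths mentioned above are real:

```python
from statistics import mean

# sklearn-style (preferred going forward): a stateless function,
# array-likes in, float out
def mean_score_difference(y_pred, prot_attr, priv_group=1):
    priv = [p for p, g in zip(y_pred, prot_attr) if g == priv_group]
    unpriv = [p for p, g in zip(y_pred, prot_attr) if g != priv_group]
    return mean(priv) - mean(unpriv)

# legacy class-based Metric style (kept for backwards compatibility):
# data is bound at construction, each metric is exposed as a method
class ScoreMetric:  # hypothetical wrapper, not AIF360's actual class
    def __init__(self, y_pred, prot_attr):
        self.y_pred, self.prot_attr = y_pred, prot_attr
    def mean_score_difference(self):
        return mean_score_difference(self.y_pred, self.prot_attr)

preds, groups = [0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]
assert mean_score_difference(preds, groups) == \
    ScoreMetric(preds, groups).mean_score_difference()
```

Both compute the same number; the function form composes more easily with scikit-learn-style pipelines, which is why it is the preferred home for new metrics.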

Illia-Kryvoviaz commented 1 year ago

Hello @hoffmansc! We've put ot_bias_scan under aif360.sklearn.metrics. Additionally, we've renamed ot_detector to ot_metric and moved the code from the aif360.detectors folder to the aif360.metrics folder, where it now seems more logical to have it. Could you please review the changes? Thank you!

Illia-Kryvoviaz commented 1 year ago

Hello @hoffmansc! All changes were implemented as you suggested.

hoffmansc commented 1 year ago

@Illia-Kryvoviaz It looks like one of the tests is failing. This line should be pos_label=fav https://github.com/Trusted-AI/AIF360/blob/bb2ec68cd833e41fa73d06e20686cef0bc3eedef/tests/test_ot_metric.py#L136

Illia-Kryvoviaz commented 1 year ago

Yes, you are right, @hoffmansc. It looks right now.