Control features for Postprocessing #660

Open riedgar-ms opened 3 years ago

riedgar-ms commented 3 years ago

Is your feature request related to a problem? Please describe.

While we have control_features for the reductions approach, they are not implemented for ThresholdOptimizer.

Describe the solution you'd like

Add them as an option. Basically, a separate optimiser would need to be constructed for each control feature value, and it would have to be available at scoring time too.

Describe alternatives you've considered, if relevant

N/A

Additional context

N/A

shngt commented 3 years ago

Hi, I'm interested in contributing to the project. Is this a good issue for someone new, or is it too involved?

romanlutz commented 3 years ago

Hi @shngt ! Welcome to the Fairlearn community! I think the change currently suggested in this issue is to create a separate ThresholdOptimizer instance per control feature value, e.g. if we have three values (high, medium, and low from an income feature), then we'd pass only the subset of X and y corresponding to that control feature value to each instance. That sounds simple enough to do. However, the potentially tricky part is the structure (more below). I think this is a fine issue to start on once we sort out what that should look like. [I was planning to open a few other issues that would be great starting points, so I'll do that now to give you some more choices. I'd be happy to walk you through any of them if you have questions.]
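For concreteness, here's a rough sketch of that idea, not an actual Fairlearn API proposal: fit one ThresholdOptimizer per control feature value on the corresponding subset of rows. The function name fit_per_control_value, the choice of base estimator, and the assumption of pandas inputs are all illustrative only.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from fairlearn.postprocessing import ThresholdOptimizer

def fit_per_control_value(X, y, sensitive_features, control_features):
    """Illustrative only: fit one ThresholdOptimizer per control feature value."""
    optimizers = {}
    for value in pd.unique(control_features):
        mask = (control_features == value)
        optimizer = ThresholdOptimizer(
            estimator=LogisticRegression(),
            constraints="demographic_parity",
        )
        # Use only the rows whose control feature equals this value.
        optimizer.fit(X[mask], y[mask],
                      sensitive_features=sensitive_features[mask])
        optimizers[value] = optimizer
    return optimizers
```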

Re: structure: since @MiroDudik's change a little while ago, we already have a fairly nice separation of logic between ThresholdOptimizer and InterpolatedThresholder. At first glance this should be as simple as replacing this line with a for-loop that does the same thing for each control feature value, and then repeating that process for the prediction methods. Alternatively, if no control features are specified, one would just use the entire dataset. The passing of control_features is currently a bit obscure since it's not explicitly spelled out in the API. I'm opening a "discussion" (https://github.com/fairlearn/fairlearn/discussions/664) for that, but it is certainly possible to pass control_features even now. For ThresholdOptimizer this currently results in an exception (by design, since that's what this issue is for).
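To illustrate the "repeat for the prediction methods" part, here's a hedged sketch of routing predictions through per-value optimizers like the ones fitted above; again, the names and the pandas-based indexing are assumptions, not the actual implementation.

```python
import pandas as pd

def predict_per_control_value(optimizers, X, sensitive_features, control_features):
    """Illustrative only: send each row to the optimizer for its control feature value."""
    y_pred = pd.Series(index=X.index, dtype=float)
    for value, optimizer in optimizers.items():
        mask = (control_features == value)
        if mask.any():
            y_pred[mask] = optimizer.predict(
                X[mask], sensitive_features=sensitive_features[mask])
    return y_pred
```

If no control features are specified, this would degenerate to a single optimizer fitted and queried on the entire dataset, matching the current behavior.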

@MiroDudik and @riedgar-ms may have thoughts on this, and perhaps @adrinjalali since we've talked about ThresholdOptimizer code before. Obviously everyone's feedback would be highly appreciated.

adrinjalali commented 3 years ago

We should probably also have some validation, and maybe configurable warnings, for the cases where, due to control_features, an estimator would receive only a handful of samples and therefore would not be a suitable estimator.
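As a rough illustration of that kind of check (the threshold, function name, and warning behavior are all hypothetical, not existing Fairlearn functionality):

```python
import warnings
import pandas as pd

def check_control_feature_counts(control_features, min_samples=30, raise_on_violation=False):
    """Illustrative only: flag control feature values with very few samples."""
    counts = pd.Series(control_features).value_counts()
    too_small = counts[counts < min_samples]
    for value, n in too_small.items():
        message = (
            f"Control feature value {value!r} has only {n} samples; "
            f"the per-value optimizer may be unreliable (threshold={min_samples})."
        )
        if raise_on_violation:
            raise ValueError(message)
        warnings.warn(message)
```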