Open riedgar-ms opened 3 years ago
Hi, I'm interested in contributing to the project. Is this a good issue for someone new, or is it too involved?
Hi @shngt! Welcome to the Fairlearn community! I think the currently suggested change in this issue is to create a separate `ThresholdOptimizer` instance per control feature value. For example, if the feature `income` has three values (`high`, `medium`, and `low`), then we'd pass each instance only the subset of `X` and `y` corresponding to its control feature value. That sounds simple enough to do. However, the potentially tricky part is the structure (more below). I think this is a fine issue to start on once we sort out what that should look like. [I was planning to open a few other issues that would be great starting points, so I'll do that now to give you some more choices. I'd be happy to walk you through any of them if you have questions.]
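To make the per-value split concrete, here is a minimal sketch in plain Python. The helper name `split_by_control_feature` and the toy data are made up for illustration and are not part of Fairlearn. In a real implementation the sensitive features would presumably have to be subset in the same way.

```python
from collections import defaultdict


def split_by_control_feature(X, y, control_values):
    """Group the rows of X and y by their control feature value.

    Hypothetical helper for illustration only; not a Fairlearn function.
    """
    if not (len(X) == len(y) == len(control_values)):
        raise ValueError("X, y, and control_values must have the same length")
    subsets = defaultdict(lambda: ([], []))
    for row, label, value in zip(X, y, control_values):
        X_sub, y_sub = subsets[value]
        X_sub.append(row)
        y_sub.append(label)
    return dict(subsets)


# Toy data: a single score column, with `income` as the control feature.
X = [[0.2], [0.9], [0.4], [0.7], [0.1], [0.8]]
y = [0, 1, 0, 1, 0, 1]
income = ["high", "low", "medium", "high", "low", "medium"]

subsets = split_by_control_feature(X, y, income)
for value, (X_sub, y_sub) in subsets.items():
    # Each (X_sub, y_sub) pair would go to its own ThresholdOptimizer instance.
    print(value, X_sub, y_sub)
```

Each value of `income` ends up with its own `(X, y)` pair, which is what each per-value instance would be fitted on.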
Re: structure --> Since @MiroDudik's change a little while ago, we already have a fairly nice separation of logic in `ThresholdOptimizer` and `InterpolatedThresholder`. At first glance this should be as simple as replacing this line with a for-loop to do the same with each control feature, and then repeating that process for the prediction methods. Alternatively, if no control features are specified, one would just use the entire dataset. The passing of `control_features` is currently a bit obscure since it's not explicitly spelled out in the API. I'm opening a "discussion" (https://github.com/fairlearn/fairlearn/discussions/664) for that, but it is certainly possible to pass `control_features` even now. For `ThresholdOptimizer` this currently results in an exception (by design, since that's what this issue is for).
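The for-loop structure described above could look roughly like the sketch below. This is not Fairlearn's implementation: `PerControlFeatureEstimator` and `MeanThreshold` are hypothetical names, and `MeanThreshold` is only a stand-in so the example runs without any dependencies. A real version would wrap the actual post-processing logic and also forward the sensitive features to each instance.

```python
class MeanThreshold:
    """Stand-in estimator (hypothetical): thresholds column 0 at its training mean."""

    def fit(self, X, y):
        self.threshold_ = sum(row[0] for row in X) / len(X)
        return self

    def predict(self, X):
        return [int(row[0] >= self.threshold_) for row in X]


class PerControlFeatureEstimator:
    """Hypothetical wrapper that fits one estimator per control feature value."""

    def __init__(self, make_estimator):
        self.make_estimator = make_estimator  # zero-argument factory

    @staticmethod
    def _keys(X, control_features):
        # Without control features, the whole dataset forms a single group.
        if control_features is None:
            return [None] * len(X)
        return list(control_features)

    def fit(self, X, y, control_features=None):
        keys = self._keys(X, control_features)
        self.estimators_ = {}
        for key in set(keys):
            idx = [i for i, k in enumerate(keys) if k == key]
            estimator = self.make_estimator()
            estimator.fit([X[i] for i in idx], [y[i] for i in idx])
            self.estimators_[key] = estimator
        return self

    def predict(self, X, control_features=None):
        keys = self._keys(X, control_features)
        predictions = []
        for row, key in zip(X, keys):
            if key not in self.estimators_:
                raise ValueError(f"unseen control feature value: {key!r}")
            # Route each row to the estimator fitted for its control value.
            predictions.append(self.estimators_[key].predict([row])[0])
        return predictions


X = [[0.1], [0.3], [0.6], [0.8]]
y = [0, 1, 0, 1]
income = ["low", "low", "high", "high"]

per_group = PerControlFeatureEstimator(MeanThreshold).fit(X, y, control_features=income)
single = PerControlFeatureEstimator(MeanThreshold).fit(X, y)

X_new = [[0.25], [0.65]]
per_group_pred = per_group.predict(X_new, control_features=["low", "high"])
single_pred = single.predict(X_new)
print(per_group_pred, single_pred)
```

Keeping the fitted estimators in a dictionary keyed by control feature value makes them available at prediction time, and the `None` key covers the case where no control features are passed.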
@MiroDudik and @riedgar-ms may have thoughts on this, and perhaps @adrinjalali since we've talked about the `ThresholdOptimizer` code before. Obviously everyone's feedback would be highly appreciated.
We should also probably have a bunch of validation, and maybe configurable warnings, for the cases where, due to `control_features`, each estimator would receive only a handful of samples and therefore would not be a suitable estimator.
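One possible shape for such a check, as a sketch only: the function name `check_group_sizes` and the `min_samples` default of 10 are assumptions chosen for illustration, not an agreed design or an existing Fairlearn API.

```python
import warnings
from collections import Counter


def check_group_sizes(control_features, min_samples=10):
    """Warn about control feature values with fewer than `min_samples` rows.

    Hypothetical helper; the threshold is an arbitrary illustrative default.
    Returns a dict mapping each undersized value to its sample count.
    """
    counts = Counter(control_features)
    small = {value: n for value, n in counts.items() if n < min_samples}
    if small:
        warnings.warn(
            f"Control feature values with fewer than {min_samples} samples: {small}. "
            "Estimators fitted on these subsets may be unreliable.",
            UserWarning,
        )
    return small


income = ["high"] * 12 + ["low"] * 3

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    small_groups = check_group_sizes(income, min_samples=10)

print(small_groups)
print([str(w.message) for w in caught])
```

Because the check uses the standard `warnings` module, users could silence or escalate it with the usual warning filters, which is one way to make it configurable.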
Is your feature request related to a problem? Please describe.
While we have `control_features` for reductions, they are not implemented for `ThresholdOptimizer`.
Describe the solution you'd like
Add them as an option. Basically, a separate optimiser would need to be constructed for each control feature, and it would have to be available at scoring time too.
Describe alternatives you've considered, if relevant
N/A
Additional context
N/A