fairlearn / fairlearn-proposals

Proposal Documents for Fairlearn

Reductions - Unification and Simplification #4

Closed: riedgar-ms closed this 4 years ago

riedgar-ms commented 4 years ago

Currently ExponentiatedGradient and GridSearch implement their own 'result' types. However, there is a lot of commonality, which should be factored out into an OracleCall object that stores a trained model and the associated lambda_vec. The goal is to be able to take an OracleCall from the ensemble and expand a grid around it for GridSearch.
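
A minimal sketch of what such an OracleCall could look like; the field names and types are assumptions, not the final design:

```python
from dataclasses import dataclass
from typing import Any

import pandas as pd


@dataclass
class OracleCall:
    """Sketch: one oracle call, shared by ExponentiatedGradient and GridSearch."""
    model: Any             # the estimator trained by this oracle call
    lambda_vec: pd.Series  # the Lagrange multipliers that produced it
```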

Also, it does not make sense to have an ExponentiatedGradientResult property on the ExponentiatedGradient object. There is only one result for that algorithm, so the necessary bits and pieces should be directly on the ExponentiatedGradient class.

MiroDudik commented 4 years ago

In particular, the fitted properties should be stored in the estimator object and follow the naming convention of https://scikit-learn.org/stable/developers/develop.html

adrinjalali commented 4 years ago

Also note that whenever the result is a complicated object, we usually return a dict or a Bunch object in sklearn.

romanlutz commented 4 years ago

expgrad frames everything in terms of the final result: it keeps the single result object around through self._expgrad_result.

grid search frames everything in terms of results per "grid point", since it's not sequential: all of the grid search result objects go into self._all_results, which is a list.

Design considerations:

As Miro pointed out a couple of days ago, result objects as such may not be suitable. Maybe it's better to just have everything as a property on the resulting predictor? In that case we'd have many instances of each property, such as the lambda vectors, so we'd probably need a list or dictionary per property. For interoperability it would probably be easier to have everything grouped together and "plug" it into the other method that way, as opposed to having to find and pass each individual property. The same thought, somewhat more concretely:

```python
expgrad = ExponentiatedGradient(...)
expgrad.fit(...)
grid_search = GridSearch(expgrad.result, ...)  # one grouped result object
grid_search.fit(...)
```

vs.

```python
expgrad = ExponentiatedGradient(...)
expgrad.fit(...)
grid_search = GridSearch(expgrad.property1, expgrad.property2,
                         expgrad.property3, expgrad.property4, ...)  # individual properties
grid_search.fit(...)
```

The proposal: group everything a reduction produces together, and require both methods to expose the same set of properties.

Please shout if there are things I'm forgetting, or if there's anything objectionable. If you have feedback (or just better ideas!), post it below ;-)

@adrinjalali @MiroDudik @riedgar-ms @vingu

adrinjalali commented 4 years ago

Looking at the ExponentiatedGradientResult, I really think Bunch is what you can use instead. That's what we use in several places in sklearn, and it satisfies pretty much all your needs for the results.
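
For reference, sklearn.utils.Bunch is a dict subclass that also supports attribute access; the field names below are illustrative, not the actual ExponentiatedGradientResult contents:

```python
from sklearn.utils import Bunch

# Bunch behaves like a dict but also allows attribute access,
# which is how scikit-learn returns composite results.
result = Bunch(best_classifier=None, weights=[0.7, 0.3], n_oracle_calls=12)
print(result.weights)      # attribute access
print(result["weights"])   # equivalent dict access
```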

I also just realized that Reduction doesn't really have anything in it. Why is it its own class? It seems like instead of inheriting from Reduction, those classes could subclass BaseEstimator and ClassifierMixin or RegressorMixin, unless I'm missing something.
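
A minimal sketch of what dropping the Reduction base class could look like, assuming scikit-learn's standard base classes; the method bodies are elided:

```python
from sklearn.base import BaseEstimator, ClassifierMixin


class ExponentiatedGradient(BaseEstimator, ClassifierMixin):
    """Sketch: inherit scikit-learn's base classes directly
    instead of an empty Reduction base class."""

    def fit(self, X, y, **kwargs):
        # ... run the exponentiated-gradient reduction ...
        return self

    def predict(self, X):
        # ... predict with the randomized classifier found during fit ...
        ...
```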

romanlutz commented 4 years ago

Finally getting to this! Will do it step by step and see what makes sense for the "result objects". There's an upcoming PR for using the mixins.

romanlutz commented 4 years ago

The mixins PR is complete, so that's out of the way: https://github.com/fairlearn/fairlearn/commit/b790db066236b2ab8e10fa101b8a7e8aed973967

I just published another PR to remove ExponentiatedGradientResult: https://github.com/fairlearn/fairlearn/pull/338. Removing the result object is something we've wanted to do for a while. It is a remnant from Fairlearn < v0.3, when ExponentiatedGradient was not a class but simply a function. When we transitioned to actual classes we simply added the result object as a member, with the goal of revisiting this decision at a later point, which we've now reached.

The overall goal here is to make the relevant result from fitting ExponentiatedGradient (EG) easily extractable and pluggable into GridSearch (GS). The idea is to run the more sophisticated EG for a small number of iterations to get close to the optimal solution, and then explore the solutions around that intermediate solution using GS. For that purpose I checked the code in detail and found that the key piece required to make this transition from EG to GS happen is the lambda vector (lambda_vec). Right now it's not stored in the EGResult object, but it would be easy to add since we already compute it for EG itself.

For the EG-GS-hybrid mitigation technique we could either create a new class, or simply extend either one of EG or GS to have an option to do this. Example:

```python
GridSearch(..., iterations_with_exponentiated_gradient=0)  # default: plain grid search
GridSearch(..., iterations_with_exponentiated_gradient=3)  # run 3 iterations of EG before switching to GS
```

There are many other EG parameters that we'd need to be able to specify somehow, so the separate class seems like the better solution, as sketched below.
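
A hypothetical sketch of the separate-class option; the name EGGridSearch and all of its constructor parameters are invented for illustration and are not part of fairlearn's API:

```python
# Hypothetical sketch; EGGridSearch and its parameters are illustrative only.
from fairlearn.reductions import ExponentiatedGradient, GridSearch


class EGGridSearch:
    def __init__(self, estimator, constraints, eg_iterations=3, **eg_kwargs):
        # EG-specific parameters pass straight through, so GridSearch's
        # own signature doesn't have to absorb all of them.
        self._eg = ExponentiatedGradient(estimator, constraints, **eg_kwargs)
        self._gs = GridSearch(estimator, constraints)
        self._eg_iterations = eg_iterations

    def fit(self, X, y, **kwargs):
        # run a few EG iterations to get near the optimum, then
        # expand a grid around the resulting lambda vector for GS
        self._eg.fit(X, y, **kwargs)
        ...
```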

For this proposal I want to take this hybrid technique into account. However, the work on the hybrid itself will potentially be picked up by other contributors. In the following I'll focus on the remaining result object discussions.

Preferably, the same properties on EG and GS should be exposed the same way. This includes the lambda vectors, the fitted predictors, and the oracle call execution times.

Given these considerations, I suggest eliminating the result objects for GS as well. Instead, the "results" from grid points can be stored directly on the object, along the lines of the sketch below.
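
A minimal sketch of how fit could populate per-grid-point lists directly on the GridSearch object; the _grid and _call_oracle helpers are illustrative, and the attribute names reuse the ones discussed in this thread:

```python
import time

from sklearn.base import BaseEstimator


class GridSearch(BaseEstimator):
    def fit(self, X, y, **kwargs):
        # one entry per grid point, kept in parallel lists on the object
        self.predictors = []
        self.lambda_vecs = []
        self.oracle_calls_execution_time = []
        for lambda_vec in self._grid():  # illustrative helper yielding the grid points
            start = time.time()
            predictor = self._call_oracle(X, y, lambda_vec)  # illustrative oracle call
            self.oracle_calls_execution_time.append(time.time() - start)
            self.predictors.append(predictor)
            self.lambda_vecs.append(lambda_vec)
        return self
```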

The hybrid method can also fit into this scheme, although we will have to think about how exactly. Assuming we create a separate class for it, I imagine that it wraps both an EG and a GS object, and that it would expose their members either indirectly (automatically by wrapping them, e.g. self.exponentiated_gradient.lambda_vecs) or directly (self.lambda_vecs). In the direct version it would have to be made clear that lambda_vecs contains all lambda vectors (from EG and GS). That may lead to confusion, so perhaps the indirect/separate option is better. Again, this hybrid part is not an immediate decision to make since I'm only unifying/simplifying EG and GS right now.

Re @adrinjalali's comment: if someone actually wants a result object for a single EG iteration or GS grid point, we could add a method get_result(index) that returns a Bunch with the appropriate contents, along the lines of the sketch below. I'm not sure people actually care about these, but if someone asks, or if you think it's useful, I'm more than happy to add it.
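
A sketch of what that could look like, reusing the illustrative attribute names from above; get_result is hypothetical, not an existing method:

```python
from sklearn.utils import Bunch


def get_result(self, index):
    """Hypothetical: bundle one EG iteration / GS grid point into a Bunch."""
    return Bunch(
        predictor=self.predictors[index],
        lambda_vec=self.lambda_vecs[index],
        oracle_call_execution_time=self.oracle_calls_execution_time[index],
    )
```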

Please let me know what you think about this. If it's acceptable I can proceed in several small PRs.

MiroDudik commented 4 years ago

I like the proposed objects, with a few small caveats about which lambda vectors get stored, what predictors should contain, and how the oracle execution times are tracked; see the reply below.

romanlutz commented 4 years ago

Update: this PR takes care of storing the lambda vectors. In the PR I stored the EG lambda vectors as self._lambda_vecs (subject to being renamed and exposed through a future change once everything is aligned properly), and the lambda vectors we get from the linear program (LP) as self._lambda_vecs_LP.

@MiroDudik does this mean I saved the wrong lambdas in the PR above? I checked the code that sets self.lagrangian.lambdas. It's this line in the best_h method:

```python
self.lambdas[h_idx] = lambda_vec.copy()
```

and the lambda_vec here is the one I'm storing. There's only one other call to the best_h method, and that's self.best_h(mul * lambda_hat), where mul is a multiplier between 1 and 10. Is that what you meant, i.e. that the second call may change the lambdas so they don't exactly match, or was it something else?


You're absolutely right about predictors; it should indeed be equal to self.lagrangian.classifiers.

We will keep tracking the oracle execution times in the future. Maybe I expressed that badly, but what I meant is that EG already has this through oracle_calls_execution_time (a list), and GS has it as members of the result objects. We will move these all into a new member of the GS object that's also named oracle_calls_execution_time, for uniformity.

romanlutz commented 4 years ago

Everything has been addressed. For the hybrid method we have a separate issue, since it affects far fewer parts of the code base: https://github.com/fairlearn/fairlearn/issues/345