fairlearn / fairlearn-proposals

Proposal Documents for Fairlearn

Reductions - Unification and Simplification #4

Closed: riedgar-ms closed this 4 years ago

riedgar-ms commented 4 years ago

Currently ExponentiatedGradient and GridSearch implement their own 'result' types. However, there is a lot of commonality, which should be factored out into an OracleCall object that stores a trained model and the associated lambda_vec. The goal is to be able to take an OracleCall from the ensemble and expand a grid around it for GridSearch.
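
A minimal sketch of what such an OracleCall could look like; the field names and types are assumptions, not the final design:

```python
from dataclasses import dataclass
from typing import Any

import pandas as pd


@dataclass
class OracleCall:
    """Sketch: one oracle call, shared by ExponentiatedGradient and GridSearch."""
    model: Any             # the estimator trained by this oracle call
    lambda_vec: pd.Series  # the Lagrange multipliers that produced it
```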

Also, it does not make sense to have an ExponentiatedGradientResult property on the ExponentiatedGradient object. There is only one result for that algorithm, so the necessary bits and pieces should be directly on the ExponentiatedGradient class.

MiroDudik commented 4 years ago

In particular, the fitted properties should be stored in the estimator object and follow the naming convention of https://scikit-learn.org/stable/developers/develop.html

adrinjalali commented 4 years ago

Also note that whenever the result is a complicated object, we usually return a dict or a Bunch object in sklearn.

romanlutz commented 4 years ago

expgrad frames everything in terms of the final result: it keeps the single result object around through self._expgrad_result.

grid search frames everything in terms of results per "grid point", since it's not sequential: all of the grid search result objects go into self._all_results, which is a list.

Design considerations:

As Miro pointed out a couple of days ago, result objects as such may not be suitable. Maybe it's better to just have everything as a property on the resulting predictor? In that case we'd have many instances of each property, such as the lambda vectors, so we'd probably need a list or dictionary per property. For interoperability it would probably be easier to have everything grouped together and "plug" it into the other method that way, as opposed to having to find and pass each individual property. The same thought, somewhat more concretely:

```python
expgrad = ExponentiatedGradient(...)
expgrad.fit(...)
grid_search = GridSearch(expgrad.result, ...)  # one grouped result object
grid_search.fit(...)
```

vs.

```python
expgrad = ExponentiatedGradient(...)
expgrad.fit(...)
grid_search = GridSearch(expgrad.property1, expgrad.property2,
                         expgrad.property3, expgrad.property4, ...)  # individual properties
grid_search.fit(...)
```

The proposal: group everything a reduction produces together, and require both methods to expose the same set of properties.

Please shout if there are things I'm forgetting, or if there's anything objectionable. If you have feedback (or just better ideas!), post it below ;-)

@adrinjalali @MiroDudik @riedgar-ms @vingu

adrinjalali commented 4 years ago

Looking at the ExponentiatedGradientResult, I really think Bunch is what you can use instead. That's what we use in several places in sklearn, and it satisfies pretty much all your needs for the results.
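
For reference, sklearn.utils.Bunch is a dict subclass that also supports attribute access; the field names below are illustrative, not the actual ExponentiatedGradientResult contents:

```python
from sklearn.utils import Bunch

# Bunch behaves like a dict but also allows attribute access,
# which is how scikit-learn returns composite results.
result = Bunch(best_classifier=None, weights=[0.7, 0.3], n_oracle_calls=12)
print(result.weights)      # attribute access
print(result["weights"])   # equivalent dict access
```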

I also just realized that Reduction doesn't really have anything in it. Why is it its own class? It seems like instead of inheriting from Reduction, those classes could subclass BaseEstimator and ClassifierMixin or RegressorMixin, unless I'm missing something.
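
A minimal sketch of what dropping the Reduction base class could look like, assuming scikit-learn's standard base classes; the method bodies are elided:

```python
from sklearn.base import BaseEstimator, ClassifierMixin


class ExponentiatedGradient(BaseEstimator, ClassifierMixin):
    """Sketch: inherit scikit-learn's base classes directly
    instead of an empty Reduction base class."""

    def fit(self, X, y, **kwargs):
        # ... run the exponentiated-gradient reduction ...
        return self

    def predict(self, X):
        # ... predict with the randomized classifier found during fit ...
        ...
```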

romanlutz commented 4 years ago

Finally getting to this! Will do it step by step and see what makes sense for the "result objects". There's an upcoming PR for using the mixins.

romanlutz commented 4 years ago

The mixins PR is complete, so that's out of the way: https://github.com/fairlearn/fairlearn/commit/b790db066236b2ab8e10fa101b8a7e8aed973967

I just published another PR to remove ExponentiatedGradientResult: https://github.com/fairlearn/fairlearn/pull/338. Removing the result object is something we've wanted to do for a while. It is a remnant from Fairlearn < v0.3, when ExponentiatedGradient was not a class but simply a function. When we transitioned to actual classes we simply added the result object as a member, with the goal of revisiting this decision at a later point, which we've now reached.

The overall goal here is to make the relevant result from fitting ExponentiatedGradient (EG) easily extractable and pluggable into GridSearch (GS). The idea is to run the more sophisticated EG for a small number of iterations to get close to the optimal solution, and then explore the solutions around that intermediate solution using GS. For that purpose I checked the code in detail and found that the key piece required to make this transition from EG to GS happen is the lambda vector (lambda_vec). Right now it's not stored in the EGResult object, but it would be easy to add since we already compute it for EG itself.

For the EG-GS-hybrid mitigation technique we could either create a new class, or simply extend either one of EG or GS to have an option to do this. Example:

```python
GridSearch(..., iterations_with_exponentiated_gradient=0)  # default: plain grid search
GridSearch(..., iterations_with_exponentiated_gradient=3)  # run 3 iterations of EG before switching to GS
```

There are many other EG parameters that we'd need to be able to specify somehow, so the separate class seems like the better solution, as sketched below.
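
A hypothetical sketch of the separate-class option; the name EGGridSearch and all of its constructor parameters are invented for illustration and are not part of fairlearn's API:

```python
# Hypothetical sketch; EGGridSearch and its parameters are illustrative only.
from fairlearn.reductions import ExponentiatedGradient, GridSearch


class EGGridSearch:
    def __init__(self, estimator, constraints, eg_iterations=3, **eg_kwargs):
        # EG-specific parameters pass straight through, so GridSearch's
        # own signature doesn't have to absorb all of them.
        self._eg = ExponentiatedGradient(estimator, constraints, **eg_kwargs)
        self._gs = GridSearch(estimator, constraints)
        self._eg_iterations = eg_iterations

    def fit(self, X, y, **kwargs):
        # run a few EG iterations to get near the optimum, then
        # expand a grid around the resulting lambda vector for GS
        self._eg.fit(X, y, **kwargs)
        ...
```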

For this proposal I want to take this hybrid technique into account. However, the work on the hybrid itself will potentially be picked up by other contributors. In the following I'll focus on the remaining result object discussions.

Preferably, the same properties on EG and GS should be exposed the same way. This includes the lambda vectors, the fitted predictors, and the oracle call execution times.

Given these considerations, I suggest eliminating the result objects for GS as well. Instead, the "results" from grid points can be stored directly on the object, along the lines of the sketch below.
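
A minimal sketch of how fit could populate per-grid-point lists directly on the GridSearch object; the _grid and _call_oracle helpers are illustrative, and the attribute names reuse the ones discussed in this thread:

```python
import time

from sklearn.base import BaseEstimator


class GridSearch(BaseEstimator):
    def fit(self, X, y, **kwargs):
        # one entry per grid point, kept in parallel lists on the object
        self.predictors = []
        self.lambda_vecs = []
        self.oracle_calls_execution_time = []
        for lambda_vec in self._grid():  # illustrative helper yielding the grid points
            start = time.time()
            predictor = self._call_oracle(X, y, lambda_vec)  # illustrative oracle call
            self.oracle_calls_execution_time.append(time.time() - start)
            self.predictors.append(predictor)
            self.lambda_vecs.append(lambda_vec)
        return self
```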

The hybrid method can also fit into this scheme, although we will have to think about how exactly. Assuming we create a separate class for it, I imagine that it wraps both an EG and a GS object, and that it would expose their members either indirectly (automatically by wrapping them, e.g. self.exponentiated_gradient.lambda_vecs) or directly (self.lambda_vecs). In the direct version it would have to be made clear that lambda_vecs contains all lambda vectors (from EG and GS). That may lead to confusion, so perhaps the indirect/separate option is better. Again, this hybrid part is not an immediate decision to make since I'm only unifying/simplifying EG and GS right now.

Re @adrinjalali's comment: if someone actually wants a result object for a single EG iteration or GS grid point, we could add a method get_result(index) that returns a Bunch with the appropriate contents, along the lines of the sketch below. I'm not sure people actually care about these, but if someone asks, or if you think it's useful, I'm more than happy to add it.
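
A sketch of what that could look like, reusing the illustrative attribute names from above; get_result is hypothetical, not an existing method:

```python
from sklearn.utils import Bunch


def get_result(self, index):
    """Hypothetical: bundle one EG iteration / GS grid point into a Bunch."""
    return Bunch(
        predictor=self.predictors[index],
        lambda_vec=self.lambda_vecs[index],
        oracle_call_execution_time=self.oracle_calls_execution_time[index],
    )
```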

Please let me know what you think about this. If it's acceptable I can proceed in several small PRs.

MiroDudik commented 4 years ago

I like the proposed objects, with a few small caveats about which lambda vectors get stored, what predictors should contain, and how the oracle execution times are tracked; see the reply below.

romanlutz commented 4 years ago

Update: this PR takes care of storing the lambda vectors. In the PR I stored the EG lambda vectors as self._lambda_vecs (subject to being renamed and exposed through a future change once everything is aligned properly), and the lambda vectors we get from the linear program (LP) as self._lambda_vecs_LP.

@MiroDudik does this mean I saved the wrong lambdas in the PR above? I checked the code that sets self.lagrangian.lambdas. It's this line in the best_h method:

```python
self.lambdas[h_idx] = lambda_vec.copy()
```

and the lambda_vec here is the one I'm storing. There's only one other call to the best_h method, and that's self.best_h(mul * lambda_hat), where mul is a multiplier between 1 and 10. Is that what you meant, i.e. that the second call may change the lambdas so they don't exactly match, or was it something else?


You're absolutely right about predictors; it should indeed be equal to self.lagrangian.classifiers.

We will keep tracking the oracle execution times in the future. Maybe I expressed that badly, but what I meant is that EG already has this through oracle_calls_execution_time (a list), and GS has it as members of the result objects. We will move these all into a new member of the GS object that's also named oracle_calls_execution_time, for uniformity.

romanlutz commented 4 years ago

Everything has been addressed. For the hybrid method we have a separate issue, since it affects far fewer parts of the code base: https://github.com/fairlearn/fairlearn/issues/345