GPflow / GPflowOpt

Bayesian Optimization using GPflow

Constrained optimization #10

Open icouckuy opened 7 years ago

icouckuy commented 7 years ago

This issue keeps track of the support for constraints in GPFlowOpt.

Support for expensive constraints will initially be added using the Probability of Feasibility (PoF), with the acquisition function defined as gamma(x) = EI(x) * PoF(x).
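As a minimal illustration of this acquisition (a sketch, not the GPflowOpt implementation), assuming independent GP posteriors for the objective and a single constraint of the form c(x) <= 0, with mu/sigma denoting posterior mean and standard deviation at x:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu_f, sigma_f, f_best):
    # EI for minimization: E[max(f_best - f, 0)] under f ~ N(mu_f, sigma_f^2)
    z = (f_best - mu_f) / sigma_f
    return sigma_f * (z * norm.cdf(z) + norm.pdf(z))

def probability_of_feasibility(mu_c, sigma_c):
    # PoF for a constraint c(x) <= 0: P(c <= 0) under c ~ N(mu_c, sigma_c^2)
    return norm.cdf(-mu_c / sigma_c)

def gamma(mu_f, sigma_f, f_best, mu_c, sigma_c):
    # the combined acquisition: gamma(x) = EI(x) * PoF(x)
    return expected_improvement(mu_f, sigma_f, f_best) * probability_of_feasibility(mu_c, sigma_c)
```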

Paying attention to:

Other constraints, which can be cheaply evaluated, are passed through to the optimizer, although I'm not sure about the support (and performance) of constraints in scipy. I believe the TF optimizers have no direct support for constraints; of course, adding a penalty term to the objective function (or loss) is always possible.
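For reference, scipy's SLSQP (and COBYLA) methods do accept constraints directly; a minimal, illustrative example of the interface that cheap constraints would be passed through to:

```python
import numpy as np
from scipy.optimize import minimize

# minimize a smooth objective subject to the cheap constraint x0 + x1 <= 1
objective = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
constraints = [{'type': 'ineq', 'fun': lambda x: 1.0 - x[0] - x[1]}]  # fun(x) >= 0 means feasible

result = minimize(objective, x0=np.zeros(2), method='SLSQP', constraints=constraints)
print(result.x)  # ends up on the boundary x0 + x1 = 1
```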

@javdrher should we make a distinction between equality and inequality constraints? Equality constraints might not work well with PoF, if at all. At first sight, expected violation is an option, but I'll have to check the literature again for the standard approaches. It is not needed for version 0.1.0.

javdrher commented 7 years ago

The first point is an issue indeed. That is currently wrong.

I disagree that we should automatically detect the absence of feasible points and switch to PoF only; I'd leave that out of the library. It's possible to first use a BayesianOptimizer with PoF, then run one with gamma. It would be interesting, though, to support some sort of stopping criterion; that would be helpful in this scenario.

I specifically included only scipy optimization instead of the TF optimizers. I don't see a specific reason to work with penalty functions, although I am open to being convinced otherwise. White-box equality constraints can be avoided by transforming the domain first. PoF works quite badly with equality constraints, so better options are welcome there.
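A tiny sketch of that domain-transform idea for a white-box equality constraint (a hypothetical example, not library code): if x0 + x1 = 1 is known in closed form, the constraint can be eliminated by reparametrization, so the optimizer never sees it:

```python
def objective(x0, x1):
    # stand-in for the real objective
    return (x0 - 0.2) ** 2 + (x1 - 0.8) ** 2

def reduced_objective(x0):
    x1 = 1.0 - x0  # the equality constraint x0 + x1 = 1 determines x1
    return objective(x0, x1)
```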

yangvz commented 7 years ago

It might be a dumb question, but how are tensorflow's gradient tensors passed to scipy for optimization purposes?

gpfins commented 7 years ago

See https://github.com/GPflow/GPflow/blob/master/GPflow/model.py: the graph to calculate the function and its gradients is built in the _compile function, and an objective function is generated which returns both the function value and the gradients.

In _optimize_np, that objective function is passed to result = minimize(...); with jac=True, the Jacobian is assumed to be returned together with the objective value.
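A generic illustration of that pattern (plain scipy, outside GPflow): a single callable returns both value and gradient, and jac=True tells scipy to unpack the pair:

```python
import numpy as np
from scipy.optimize import minimize

def objective(x):
    f = np.sum(x ** 2)  # function value
    g = 2.0 * x         # gradient computed in the same pass, as TF does
    return f, g

result = minimize(objective, x0=np.ones(3), jac=True, method='L-BFGS-B')
```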

yangvz commented 7 years ago

@nknudde Thank you, I will take a more careful look into GPflow.

yangvz commented 7 years ago

Ah, I see. It's because evaluate() is decorated with AutoFlow, so it's automatically compiled.

javdrher commented 7 years ago

Not sure whether you were referring to the SciPy optimization of the model likelihood (as described by @nknudde) or to the optimization of the acquisition function, which applies a similar principle. The actual construction of the gradient of the acquisition function occurs in _build_acquisition_wrapper (called by evaluate and evaluate_with_gradients). However, I added a mechanism in the actual optimizers to discard the gradients in the case of gradient-free optimization (such as the MCOptimizer).

Thank you for pointing this out; it's something I should mention in the documentation.
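The discard mechanism boils down to something like the following (a sketch of the idea, not the actual GPflowOpt code):

```python
import numpy as np

def evaluate_with_gradients(x):
    # stand-in for an acquisition returning (value, gradient)
    return np.sum(x ** 2), 2.0 * x

def gradient_free_objective(x):
    f, _ = evaluate_with_gradients(x)  # gradient is computed but discarded
    return f
```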

icouckuy commented 7 years ago

I disagree that we should automatically detect the absence of feasible points and switch to PoF only; I'd leave that out of the library. It's possible to first use a BayesianOptimizer with PoF, then run one with gamma. It would be interesting, though, to support some sort of stopping criterion; that would be helpful in this scenario.

Makes sense. I think it should be well documented that in BayesianOptimizer the acquisition function (and the underlying GPflow model) is updated with new data, so users know they can reuse their own reference to the acquisition function in a subsequent sampling stage (possibly with a new hybrid acquisition function).

I specifically included only scipy optimization instead of the TF optimizers. I don't see a specific reason to work with penalty functions.

Agreed, let's just rely on the scipy optimizers.

mccajm commented 7 years ago

Would you mind providing a small example of inequality constraints as they currently stand, please?

My domain is invariant to position, e.g. [1, 2, 3] and [2, 3, 1] are the same solution. Is there a better way to model this than a > b and b > c in BO?

yangvz commented 7 years ago

The problem is that, unlike GP hyperparameters, real-world input points are highly irregular in their settings. Examples are:

I think it would be better to separate the package into a lower level and a higher level, so that advanced users can bypass Domain (in the higher level) and specify constraints and optimization methods explicitly (in the lower level).

I've created a gist to illustrate my point: https://gist.github.com/yangnw/241ecf33f16a5212b3a433e631ab2117 It is certainly a preliminary design based on mixins, but it makes the point.

In particular, the high-level functionalities can still be used even if the low-level functionalities are extended (e.g., with KB). This is somewhat similar to TensorFlow's design.

javdrher commented 7 years ago

Hi @mccajm, thank you for your interest in GPflowOpt. With regard to constraints, there are two distinct use cases:

Regarding your position-invariant domain, your solution would definitely be valid. Although no simple way exists yet, you can probably already achieve this by implementing your own optimizer (e.g., based on the SciPyOptimizer), which you then use to optimize the acquisition function in BayesianOptimizer. An alternative solution: your domain seems symmetric, so you can probably also come up with a transform to a domain of lower dimensionality which covers all your function values (see, e.g., figure 1 in https://arxiv.org/pdf/1310.6740.pdf, which is related to your problem, but in a black-box setting); you could then optimize on this reduced domain. A small sketch of the canonicalization idea follows below.
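One concrete trick in that spirit (a sketch under the assumption that equivalent solutions really are interchangeable, not GPflowOpt functionality): canonicalize each candidate point by sorting, so every permutation maps to a single representative:

```python
import numpy as np

def underlying_objective(x):
    # stand-in for the expensive black-box function
    return float(np.sum((np.asarray(x) - np.array([1.0, 2.0, 3.0])) ** 2))

def symmetric_objective(x):
    # np.sort maps [2, 3, 1] and [1, 2, 3] to the same representative,
    # so equivalent solutions are always evaluated at one canonical point
    return underlying_objective(np.sort(x))
```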

Some thoughts on the implementation of white-box constraints in GPflowOpt:

icouckuy commented 6 years ago

For expensive constraints we can take a look at section 2.6 of Gelbart's dissertation: https://dash.harvard.edu/bitstream/handle/1/17467236/GELBART-DISSERTATION-2015.pdf?sequence=1

It summarizes a number of constrained acquisition functions proposed over the years. At the moment we have only implemented the probability of feasibility.

javdrher commented 6 years ago

Recall that we do not intend to support a wide range of methods; I'd only include others if there is a specific reason for it.

icouckuy commented 6 years ago

@yangnw The current plan is to store the (cheap) constraints in the Domain object. This can include coefficient matrices (for white-box linear constraints) or function handles (for cheap black-box constraints).

The Domain object is passed to an Optimizer, so the constraints are available there. I think this is still low-level enough. Of course, for really specific constraints (e.g., symmetry) a custom Optimizer class must be implemented.
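A hypothetical sketch of that design (names are illustrative, not the actual GPflowOpt API): the Domain carries scipy-style constraint handles, and an optimizer built on SciPy simply forwards them:

```python
import numpy as np
from scipy.optimize import minimize

class Domain:
    def __init__(self, lower, upper, constraints=()):
        self.lower = np.asarray(lower)
        self.upper = np.asarray(upper)
        self.constraints = list(constraints)  # scipy-style dicts or function handles

class ConstrainedSciPyOptimizer:
    def __init__(self, domain):
        self.domain = domain

    def optimize(self, objective, x0):
        # bounds come from the box domain; cheap constraints are forwarded as-is
        bounds = list(zip(self.domain.lower, self.domain.upper))
        return minimize(objective, x0, method='SLSQP',
                        bounds=bounds, constraints=self.domain.constraints)
```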