GPflow / GPflowOpt

Bayesian Optimization using GPflow

Batch Optimization #71

Open nrontsis opened 7 years ago

nrontsis commented 7 years ago

Hi,

I am really interested in the project and, since I am working on Batch Bayesian Optimisation, I would be keen on extending GPflowOpt for it.

Would you be interested in discussion and pull requests for this topic?

Thanks!

Nikitas

gpfins commented 7 years ago

We have been talking about batch optimization internally and think it would certainly be interesting to have (although not for the first version) and of course you are welcome to participate. Did you have any specific acquisition function in mind?

javdrher commented 7 years ago

Hi @nrontsis thank you for your interest in GPflowOpt!

Batch optimization is definitely on our list. In fact, I have some local code and some preliminary tests, but it is not suited for a PR at all, and I wasn't entirely sure all the choices I made were the right ones. So I would be very interested to get a discussion going and work towards extending the framework; if you are willing to contribute, that's great. Depending on the outcome of the discussion, the code could be used as a starting point if we decide that is the right way to go. If it is OK with you, I'll initiate the discussion, but feel free to steer it in the right direction.

A fundamental philosophy for GPflowOpt is that we want to provide a framework that allows easy implementation of BO strategies, rather than implementing a broad range of algorithms. When it comes to batch optimization, there seems to be a fundamental split between algorithms that select a batch of N points sequentially (e.g. LP) and those that optimize the entire batch at once (e.g., q-KG, batch PES, etc.). Both approaches have quite different requirements in terms of implementation, so I have been in doubt about what would be the best road for GPflowOpt. The latter seems closer to optimal, but I don't know how feasible it is, as the optimization of the acquisition function can become high-dimensional. I would be very interested to hear your thoughts on this matter.

nrontsis commented 7 years ago

Thanks @nknudde @javdrher for the quick replies!

I believe that both approaches can be easily incorporated in the current framework.

Sequential optimisation is generally easier and would be the vanilla choice for the practitioner. Moreover, the ideas of e.g. qEI-CL or LP can be applied to any acquisition function, allowing for a general solution. As a result, we could start by implementing sequential optimisation first and then move to the "true" batch case, which has the challenges @javdrher mentioned (e.g. it is hard to optimise).

But either way, the first step would be to extend the framework to consider the acquisition function f(X) to be multipoint, i.e. accept multiple points at the same time. I think this is easy and would require minimal, clean changes.

What do you think? :)

javdrher commented 7 years ago

I agree that supporting both is possible and should fit within the design.

Regarding your last remark about acquisition objects becoming multi-point: I believe this is actually a requirement for the joint batch optimization rather than for the sequential optimization? I think I already covered that extension a while ago locally; are the changes in batch_support what you were thinking of? It replicates the domain for the optimizer, such that the candidate dimension becomes batch_size * dimension; the candidate is then split again so it is easier to use in Acquisition. If so, we could proceed by deciding on an acquisition to add to the framework itself and start with batch optimization itself.
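To make that concrete, the shape bookkeeping looks roughly like this (illustrative only, not the actual batch_support code; the TF1-style placeholder is just for the example):

import tensorflow as tf

batch_size, dim = 3, 2

# the optimizer works on the replicated domain: one candidate row has batch_size * dim entries
Xcand = tf.placeholder(tf.float64, shape=[None, batch_size * dim])

# before reaching build_acquisition, each row is split back into batch_size blocks
# of shape [n_candidates, dim], one block per point in the batch
X_blocks = tf.split(Xcand, num_or_size_splits=batch_size, axis=1)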

For sequential optimization, I don't think this belongs in the optimizer, for a couple of reasons. Right now an optimizer is simply something that optimizes some function over a domain and nothing more. Making it accept only acquisition objects would change that completely and make optimizers acquisition-specific. It would also make them less widely applicable (e.g., in LP you can currently use an optimizer to obtain the Lipschitz estimate by optimizing the gradient norm over the domain). Additionally, what about the other optimizers (MC, candidates, BayesianOptimizer itself)? Finally, together with some of the changes we are planning on the domain to support discrete parameters, I believe this would mix up several requirements in a single class.

The way I see it (though I might be missing something), in sequential mode the acquisition is optimized batch_size times over the same domain, but between each step something changes in the acquisition (a penalizer, updating the model with pseudo points, etc.). So the optimizer and its domain stay the same, but the acquisition is transformed, and only the acquisition knows exactly how to transform itself for a specific algorithm. We could add this logic partially in BayesianOptimizer by issuing a batch_update in an extra batch loop, but this morning I thought of something different:

with acquisition.batch(steps=batch_size) as batch:
    result = batch.optimize(self.optimizer)

The batch object yielded by the context can be generic: it simply alternates between optimizing and calling a batch_update method on the acquisition function (which defaults to raising NotImplementedError and should be implemented in order to support sequential batch optimization). For generic approaches like LP, the LPAcquisition wraps another acquisition and implements batch_update itself. The return value of batch.optimize is the batch of candidates for evaluation. Finally, when the context ends, the batch object should revert the acquisition to its original state so it can be updated the normal way with the obtained observations, ready for another round. This snippet would then go in BayesianOptimizer. Let me know what you think of this; I'd be happy to help out with its implementation if needed.
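Concretely, I picture the batch object along these lines (just a sketch: SequentialBatch, batch_update and the free-state snapshot are illustrative choices rather than settled API, and I'm assuming the optimizer returns a scipy-style result with an x attribute):

from contextlib import contextmanager
import numpy as np

class SequentialBatch(object):
    # generic helper yielded by acquisition.batch(...): alternates optimize / batch_update
    def __init__(self, acquisition, steps):
        self.acquisition = acquisition
        self.steps = steps

    def optimize(self, optimizer):
        points = []
        for _ in range(self.steps):
            result = optimizer.optimize(self.acquisition.inverse_acquisition)
            points.append(result.x)
            # acquisition-specific transformation: penalizer, pseudo points, ...
            self.acquisition.batch_update(np.vstack(points))
        return np.vstack(points)

@contextmanager
def batch(acquisition, steps):
    # would live on Acquisition, so it can be called as acquisition.batch(steps=batch_size)
    state = acquisition.get_free_state()   # snapshot, one possible way to revert later
    try:
        yield SequentialBatch(acquisition, steps)
    finally:
        acquisition.set_state(state)       # revert to the original acquisition state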

Last but not least, while browsing through the papers on sequential batch algorithms, there is often some GP-specific machinery to cope with the fact that no observations are available yet for the pending points. This would imply properly checking that the model is actually a GPR, but it also means that the scaling which currently happens between the acquisition and the model will surface there. We have been talking about doing the scaling in Acquisition instead, so that code in build_acquisition always operates on the same scale as the models, but we have been in doubt about it. This might be an extra argument to move the scaling. Don't worry about this for now.

nrontsis commented 7 years ago

@javdrher Thanks, I think I agree with all of your points. Sorry I didn't notice batch_support before. I will proceed with implementing the "sequential" batch case in a context-manager fashion.

Will come back for more discussion.

nrontsis commented 7 years ago

I believe that this line should change to:

acq = self.build_acquisition(*tf.split(Xcand, num_or_size_splits=self.batch_size, axis=1))

But, more generally, why do we require the input X to be at least 2d? My understanding is that 1d would work even in the batch case, in the current implementation of batch_support.

javdrher commented 7 years ago

I pushed that branch this morning; it was a prototype on my laptop. You are right about the missing axis=1 parameter. I just found a few more required changes in my stash; I'll update the code.

The reason we currently take 2D input is to allow evaluation of many candidates at the same time in parallel (e.g., evaluate 500 MC points on EI at once, then further optimize the best candidate so far). This makes sense when the dimensionality grows and a large starting grid has to be evaluated: because the acquisition is coded in TensorFlow, this computation is multi-core and can happen on GPU. Although we could use different semantics for sequential batch support, I'd like to avoid using the same construction in a different manner for different use cases. An extra consideration here is that build_acquisition is a TensorFlow method; depending on what kind of update is required on the model, this can be achieved more easily outside of TF mode.
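As an illustration of that 2D convention (a sketch only; I'm assuming the usual Acquisition.evaluate call here):

import numpy as np

def best_mc_candidate(acquisition, lower, upper, n_mc=500):
    # one candidate per row: shape [n_mc, dim]
    Xcand = np.random.uniform(lower, upper, size=(n_mc, len(lower)))
    # a single vectorized pass through the TensorFlow graph scores all rows at once
    scores = acquisition.evaluate(Xcand)
    # the best MC point then serves as starting point for the gradient-based optimizer
    return Xcand[np.argmax(scores.ravel())]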

nrontsis commented 7 years ago

Hey, here are some thoughts after working a bit with the code:

I don't see a benefit of using a context manager for optimising a batch. It would require additional code, and I cannot see any use for __enter__ and __exit__. I think that using a function get_suggestion would be cleaner. In the base Acquisition class it would be something like:

def get_suggestion(self, optimizer):
    assert self.batch_size == 1
    return optimizer.optimize(self.inverse_acquisition)

while batch acquisition functions would override this, for example along the lines of the sketch below.
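For instance, an LP-style wrapper could override it roughly like this (a sketch: set_batch and reset_batch are hypothetical names for the per-step transformation and the cleanup, and I assume the optimizer returns a scipy-style result with an x attribute):

import numpy as np
from gpflowopt.acquisition import Acquisition

class SequentialBatchAcquisition(Acquisition):
    def get_suggestion(self, optimizer):
        points = []
        for _ in range(self.batch_size):
            result = optimizer.optimize(self.inverse_acquisition)
            points.append(result.x)
            self.set_batch(np.vstack(points))   # e.g. place a local penalizer around each selected point
        self.reset_batch()                       # restore the acquisition for the next BO iteration
        return np.vstack(points)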

Moreover, I think it would be good to make the input tensor X rank 3: one axis for the dimensionality of f, one for the batch, and one for parallel evaluations. There are a couple of reasons for this:

What do you think? Should I go on implementing it as I suggest? Thanks!

javdrher commented 7 years ago

I discussed this today with @icouckuy, and I'd like to make sure we are all on the same page. I'm sorry if this seems a bit slow, but as the changes affect the framework quite a bit, we want to proceed carefully and have the requirements clear for everyone. So here is a rewind, with some remarks on your suggestions. Two general remarks:

Taking a small step away from the code, let's decompose both problems and map them to the architecture, following the same principles we applied to come up with the design of the other parts of the framework.

I think this plan clearly reflects that both approaches are distinct: they can even be developed independently, as there are no feature dependencies. Let us know what you think; we highly appreciate your input in the discussion, as in only a couple of posts the contours of how this should look have become much sharper. If this seems like a good plan, I can write the tests for the batch_support branch and fix its shortcomings, and you could work in parallel on the sequential case. Then we should proceed to discuss which methods to provide as part of GPflowOpt itself.

nrontsis commented 7 years ago

Okay, that sounds like a solid plan, and I definitely understand you wanting to keep the framework clean.

So, if I understand you correctly:

Although it makes sense to treat the batch and sequential cases as separate, I believe we should think about how they will merge on the same branch [1]. This could be implemented as follows: when batch_size > 1, the constructor of BayesianOptimizer checks which type of batch scenario applies. Something like:

if inspect.getfullargspec(self.acquisition.build_acquisition).varargs is not None:
    # Set flag/initialise for true batch optimization
elif hasattr(self.acquisition, 'set_batch'):
    # Set flag/initialise for sequential batch optimization
else:
    # Throw an exception

For example, this would allow us to set self.domain to domain or to batch_domain, and to determine if set_batch should be called in BayesianOptimizer._optimize.

If all that sounds good, I could work on the sequential case on a new branch forked from master, with the following plan:

Should I proceed with implementing the above and come back with a first PR for further discussion? As you suggested, you could work on batch_support independently.

[1]: The truth is that I'm biased on this: My work is on "true" batch optimisation, but I need to compare against the "vanilla" choice (i.e. "sequential" batch). So I prefer a framework that does both at the same time.

javdrher commented 7 years ago

Let's continue the design discussion in the PR, thanks a lot. I agree on [1]; right now we are in the process of releasing GPflowOpt 0.1, so we cannot merge into master yet. It does make sense to merge our two branches first and come up with a single batch branch, which is then prepared for merging into master.