EmuKit / emukit

A Python-based toolbox of various methods in decision making, uncertainty quantification and statistical emulation: multi-fidelity, experimental design, Bayesian optimisation, Bayesian quadrature, etc.
https://emukit.github.io/emukit/
Apache License 2.0

Implementation of preferential Bayesian optimization. #149

Open javiergonzalezh opened 5 years ago

javiergonzalezh commented 5 years ago

As detailed in this paper: https://arxiv.org/abs/1704.03651

danielwinkler commented 4 years ago

Any updates on this feature?

apaleyes commented 4 years ago

@danielwinkler, thanks to @esiivola we have an example of preferential Bayesian optimization implemented here: https://github.com/amzn/emukit/tree/master/emukit/examples/preferential_batch_bayesian_optimization . It's not the original method that Javier linked above, but it can serve as an example of how preferential BO can be implemented with Emukit.

danielwinkler commented 4 years ago

@apaleyes wow that is amazing, thanks a lot for your fast response!

esiivola commented 4 years ago

To add to this, the acquisition functions used in the example are not exactly as in https://arxiv.org/abs/1704.03651. However, the original paper has limited scalability w.r.t. the number of attributes in x. The implemented approach solves that issue and also makes it possible to compare more than two items at a time.
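
To give a rough feel for the difference, here is a toy sketch (plain NumPy, not the code from the example, and with a made-up latent utility `f`) of the two kinds of preference likelihoods involved: the pairwise logistic likelihood used in the pairwise PBO setting, and a softmax-style likelihood that extends to batches of more than two items:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pairwise_preference_prob(f, x1, x2):
    """P(x1 preferred over x2) under a latent utility f with a logistic link."""
    return sigmoid(f(x1) - f(x2))

def batch_preference_probs(f, batch):
    """P(each item wins the comparison within a batch), softmax over utilities."""
    utilities = np.array([f(x) for x in batch])
    e = np.exp(utilities - utilities.max())  # numerically stabilised softmax
    return e / e.sum()

# purely illustrative latent utility on [0, 1]^2
f = lambda x: -np.sum((np.asarray(x) - 0.3) ** 2)

print(pairwise_preference_prob(f, [0.3, 0.3], [0.9, 0.9]))              # close to 1
print(batch_preference_probs(f, [[0.3, 0.3], [0.5, 0.5], [0.9, 0.9]]))  # sums to 1
```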

danielwinkler commented 4 years ago

@esiivola thanks a lot for the implementation and the additional information

A related question: all the papers I read address the limitation of PBO in high-dimensional spaces. On the other hand, the IBO of Brochu already did preferential BO for high-dimensional parameter settings with very nice results.

I wonder if this is due to the possibly unimodal nature of the parameter space in his example applications?

Do you have any explanation / intuition in this respect?

esiivola commented 4 years ago

The IBO approach of Brochu only presents one acquisition strategy, which is very exploitative. Some consider this a limitation as well (which Gonzalez et al. address in their paper).

The discussion in the papers you have read might be related to the unimodal nature of their application domain. I have to admit I'm not an expert in animation design, so I cannot really give a good answer on this. Gonzalez et al., however, compare their approach to IBO and clearly perform better. This might be because they are able to capture the uncertainty of the latent space better (by extending it). You can see this by comparing the results of IBO and CEI, which are both exploitative acquisition strategies. However, this comes at the cost of not being very scalable.

esiivola commented 4 years ago

So if you only need to compare two items (and not a batch of items) AND your input is not very high-dimensional, the approach of Gonzalez et al. might be the best option for you (but the source code is not openly available). If not, then it might be a good idea to try our approach! If you are only comparing two items and are using q-EI, then the implemented example becomes IBO.
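
To make that last sentence a bit more concrete: IBO essentially runs standard expected improvement on the latent utility GP fitted to the comparisons, and q-EI with a batch of one reduces to the same thing. A minimal single-point EI sketch (plain NumPy/SciPy, assuming you already have the posterior mean and standard deviation of the latent utility at some candidate points):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_so_far, jitter=1e-9):
    """Single-point EI for maximising the latent utility.
    mu, sigma: posterior mean and std of the latent GP at the candidate points.
    best_so_far: largest posterior mean among points already compared."""
    sigma = np.maximum(sigma, jitter)
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)

# toy usage with made-up posterior values
mu = np.array([0.1, 0.4, 0.2])
sigma = np.array([0.3, 0.1, 0.5])
print(expected_improvement(mu, sigma, best_so_far=0.35))
```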

esiivola commented 4 years ago

This is the only open source implementation of the paper by Gonzalez et al. that I know of: https://github.com/fcole90/minimal_pbo

However, they do not implement any acquisition function; it might be easy to extend their code, though.

danielwinkler commented 4 years ago

@esiivola Thanks a lot for your support

I only need to compare two items, BUT the input is high-dimensional.

From reading the papers, this should not work; however, my hope is that the problem is similar in nature to Brochu's application domain. You can think of it as tuning one or two channels of a sound mixer (volume, pan, low freq, mid freq, high freq, ...) for a given sound and environment to the preference of a user / expert.
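
In case it helps, the knobs map quite naturally onto an EmuKit parameter space; a sketch with invented ranges, just to illustrate the dimensionality (not actual mixer settings):

```python
from emukit.core import ParameterSpace, ContinuousParameter

# one channel of the mixer; ranges are made up for illustration only
space = ParameterSpace([
    ContinuousParameter('volume',     0.0,  1.0),
    ContinuousParameter('pan',       -1.0,  1.0),
    ContinuousParameter('low_freq',  -12.0, 12.0),   # gain in dB
    ContinuousParameter('mid_freq',  -12.0, 12.0),
    ContinuousParameter('high_freq', -12.0, 12.0),
])
print(space.dimensionality)  # one dimension per knob, so a second channel doubles this
```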

I will have a look at minimal pbo that you linked, and definitely also at your implementation

I really appreciate all the effort you put into helping me!

esiivola commented 4 years ago

Knowing your application: if you can also ask the user for the optimal knob position given that the other parameters are fixed, you might also be interested in this upcoming ICML 2020 paper: https://arxiv.org/pdf/2002.03113.pdf

danielwinkler commented 4 years ago

Thanks,

I already skimmed through the paper this morning. I still have to read it in detail, but so far it is not clear to me how to express the problem as a projective preferential query.

What you are suggesting is that the expert user would be able to tell the optimal value for a knob, given that the acquisition strategy prescribes the values of the remaining dimensions?

esiivola commented 4 years ago

Hi!

Yes, exactly! If so, the whole optimization process would be much faster than with single pairwise preferential queries. They actually input the "optimal knob position" as multiple preferential feedbacks, allowing the model to learn much faster.
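
Roughly, a single projective query looks something like the sketch below (a toy version with a synthetic utility standing in for the human expert; in the actual method the direction of the slice would be chosen by the acquisition strategy):

```python
import numpy as np

def projective_query(user_utility, x_fixed, knob_index, knob_range, n_grid=101):
    """Ask for the best setting of one knob while the other knobs stay fixed.
    Here a synthetic utility function stands in for the human expert."""
    grid = np.linspace(knob_range[0], knob_range[1], n_grid)
    candidates = np.tile(x_fixed, (n_grid, 1))
    candidates[:, knob_index] = grid
    best = np.argmax([user_utility(c) for c in candidates])
    # the answer implies the chosen point is preferred over every other point on the slice
    return candidates[best], candidates

# toy example: 5 knobs, the "expert" likes everything near 0.3
utility = lambda x: -np.sum((x - 0.3) ** 2)
x_fixed = np.full(5, 0.5)
best_point, slice_points = projective_query(utility, x_fixed, knob_index=2, knob_range=(0.0, 1.0))
print(best_point)
```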

danielwinkler commented 4 years ago

Great, I will look deeper into that direction, as the results of PPBO look really impressive!

All the best, Daniel

DominickZhang commented 4 years ago

Hi,

I am also struggling with the implementation of the acquisition function. Following the work of Hernandez2014, the Gaussian process prior for f can be approximated by the inner product of a vector of random features \phi(x) and \theta, where the prior of \theta is a standard multivariate normal distribution. In the experiments of PBO, the length of this vector seems to be unspecified.

More importantly, \theta is drawn from a posterior, where based on my derivations P(\theta | D_t) is proportional to P(\theta) * P(D_t | \theta), which is in turn proportional to P(\theta) * P(y | X, \theta). The final form of the posterior is C * P(\theta) * sigmoid(y_1 * \phi(x_1)^T \theta) * ... * sigmoid(y_n * \phi(x_n)^T \theta), where C is a normalization factor. Sampling from such a posterior over \theta seems computationally hard, especially when \theta is high-dimensional (e.g. 1000 in Hernandez2014). I am not sure where my thoughts went wrong. I would really appreciate any help.
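
For concreteness, one generic way I could imagine drawing \theta from a posterior of this form (standard normal prior times a product of sigmoid factors, with the normalization constant C unknown) is elliptical slice sampling (Murray et al., 2010), which only needs the unnormalized log-likelihood and draws from the prior, and stays tractable even for ~1000-dimensional \theta. A rough sketch with made-up data and features; I am not sure whether this is what the PBO paper actually does:

```python
import numpy as np

def log_sigmoid(z):
    # numerically stable log of the logistic sigmoid
    return -np.logaddexp(0.0, -z)

def log_likelihood(theta, Phi, y):
    """log prod_i sigmoid(y_i * phi(x_i)^T theta); Phi is (n, D), y in {-1, +1}."""
    return np.sum(log_sigmoid(y * (Phi @ theta)))

def elliptical_slice_step(theta, log_lik):
    """One elliptical slice sampling update for a N(0, I) prior (Murray et al., 2010)."""
    nu = np.random.randn(*theta.shape)                   # auxiliary draw from the prior
    log_y = log_lik(theta) + np.log(np.random.rand())    # slice height
    angle = np.random.uniform(0.0, 2.0 * np.pi)
    angle_min, angle_max = angle - 2.0 * np.pi, angle
    while True:
        proposal = theta * np.cos(angle) + nu * np.sin(angle)
        if log_lik(proposal) > log_y:
            return proposal
        # shrink the angle bracket towards the current state and retry
        if angle < 0.0:
            angle_min = angle
        else:
            angle_max = angle
        angle = np.random.uniform(angle_min, angle_max)

# toy preference data: D random features, n comparisons with labels in {-1, +1}
rng = np.random.default_rng(0)
n, D = 50, 1000
Phi = rng.normal(size=(n, D)) / np.sqrt(D)
y = rng.choice([-1.0, 1.0], size=n)

theta = np.zeros(D)
samples = []
for _ in range(200):
    theta = elliptical_slice_step(theta, lambda t: log_likelihood(t, Phi, y))
    samples.append(theta.copy())
```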

Sorry for the bad formatting. I really hope GitHub adds support for writing mathematical formulas so discussions like this are easier. :)

Best, Jinnian