Ryan-Rhys / Heteroscedastic-BO

Heteroscedastic Bayesian Optimisation in Numpy

Where to look for composition-based optimization and how to implement with custom dataset #28

Open sgbaird opened 2 years ago

sgbaird commented 2 years ago

Hi Ryan, you mentioned in https://github.com/facebook/Ax/issues/751#issuecomment-990472001 that this method includes materials composition as input parameters (solar cell use-case). Two questions:

Nice work btw. Thanks for sharing!

Ryan-Rhys commented 2 years ago

Hi Sterling,

Unfortunately, our physics lab experiments fell through, so we didn't end up running the heteroscedastic BO algorithm over material composition space!

I'm currently working on porting the MLHGP surrogate and acquisition functions to BoTorch. I believe Max Balandat has a working version of the MLHGP surrogate in BoTorch that he's planning to release in the next couple of weeks. I'm hoping the implementation will be a lot more user-friendly in BoTorch!
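For readers unfamiliar with the MLHGP (most likely heteroscedastic GP) surrogate mentioned here, below is a simplified NumPy sketch of the idea from Kersting et al. (2007). It is not the repo's or BoTorch's implementation. Assumptions: a fixed RBF kernel with unit lengthscale and signal variance, no hyperparameter optimisation, and hypothetical helper names (`rbf_kernel`, `gp_predict`, `mlhgp_noise`). The loop alternates between three steps: fitting a GP with the current per-point noise, estimating empirical noise at each training input from predictive samples, and smoothing the log of those estimates with a second GP whose observation noise is `gp2_noise`. The arguments `num_iters`, `sample_size` and `gp2_noise` echo the parameters passed to `heteroscedastic_propose_location` in the script in this thread, but their exact meaning in the repo may differ.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def gp_predict(X, y, X_star, noise_var, lengthscale=1.0, signal_var=1.0):
    """GP posterior mean and latent variance at X_star.
    noise_var may be a scalar (homoscedastic) or a per-point array (heteroscedastic)."""
    noise_diag = np.broadcast_to(noise_var, (len(X),))
    K = rbf_kernel(X, X, lengthscale, signal_var) + np.diag(noise_diag)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf_kernel(X, X_star, lengthscale, signal_var)
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.maximum(signal_var - np.sum(v**2, axis=0), 1e-12)
    return mean, var


def mlhgp_noise(X, y, num_iters=5, sample_size=100, noise_init=1.0, gp2_noise=1.0, seed=0):
    """Hypothetical, simplified most-likely heteroscedastic GP loop.
    Returns the estimated noise variance at each training input."""
    rng = np.random.default_rng(seed)
    noise = np.full(len(X), noise_init)  # start from a homoscedastic GP
    for _ in range(num_iters):
        # (1) GP on the data with the current noise estimates
        mean, var = gp_predict(X, y, X, noise)
        # (2) sample noisy targets from the predictive distribution and form
        #     the empirical noise estimate mean(0.5 * (y - t)^2) at each input
        std = np.sqrt(var + noise)
        t = mean[:, None] + std[:, None] * rng.standard_normal((len(X), sample_size))
        z = np.log(np.mean(0.5 * (y[:, None] - t) ** 2, axis=1))
        # (3) a second GP on the centred log-noise smooths the estimates
        z_mean = z.mean()
        log_noise, _ = gp_predict(X, z - z_mean, X, gp2_noise)
        noise = np.exp(log_noise + z_mean)
    return noise
```

The heteroscedastic posterior at new inputs can then be sketched as `gp_predict(X, y, X_star, noise)`. Working in log space keeps every noise estimate positive, and a full implementation would also fit the kernel hyperparameters of both GPs rather than fixing them.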

In principle it would be possible to modify one of the toy experiment scripts:

https://github.com/Ryan-Rhys/Heteroscedastic-BO/blob/master/BayesOpt/bayesopt_experiments/toy_sin_noise.py

something like this

import numpy as np

from acquisition_funcs.acquisition_functions import heteroscedastic_expected_improvement, heteroscedastic_propose_location

bounds = np.array([0, 10]).reshape(-1, 1)  # bounds of the Bayesian Optimisation problem.

#  Initial noisy data points sampled uniformly at random from the input space.

init_num_samples = 25

# X_init and Y_init can be replaced with the initial points of the custom dataset

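plot_sample = np.linspace(0, 10, 50).reshape(-1, 1)  # samples for plotting purposes (must be defined before it is used below)
X_init = np.random.uniform(0, 10, init_num_samples).reshape(-1, 1)  # sample init_num_samples points at random from the bounds to initialise with
# linear_sin_noise, noise_coeff, coefficient and fplot come from the original toy_sin_noise.py script
Y_init = linear_sin_noise(X_init, noise_coeff, plot_sample, coefficient, fplot=fplot).reshape(-1, 1)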

# initial GP hyper settings following advice from Iain Murray's slides: https://homepages.inf.ed.ac.uk/imurray2/teaching/08gp_slides.pdf

l_init = 1.0
sigma_f_init = 1.0
noise = 1.0
l_noise_init = 1.0
sigma_f_noise_init = 1.0
gp2_noise = 1.0
num_iters = 10
sample_size = 100
aleatoric_weight = 1

# New suggested query location

het_X_next = heteroscedastic_propose_location(heteroscedastic_expected_improvement, X_init,
                                              Y_init, noise, l_init, sigma_f_init, l_noise_init,
                                              sigma_f_noise_init, gp2_noise, num_iters, sample_size, bounds,
                                              plot_sample, n_restarts=3, min_val=300, aleatoric_weight=aleatoric_weight)

to load in a custom dataset and get a single query point, although the scripts are far from clean from a usability standpoint and don't support features such as parallel queries. As such, it may be worth waiting for the BoTorch version of the MLHGP, or looking into the chained GP approach implemented in GPflow:

https://gpflow.readthedocs.io/en/develop/notebooks/advanced/heteroskedastic.html

although I guess the latter would have to be interfaced with a GPflow-friendly BO framework! I'd like to look into how to implement this model in BoTorch as well!
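For the custom-dataset route (replacing `X_init` and `Y_init` in the modified script), here is a minimal sketch that reads measurements into arrays with the same layout the toy script builds: inputs of shape (n, d) and objective values of shape (n, 1). It assumes a hypothetical comma-separated file with a header row, d input columns (for example component fractions) and the measured objective in the next column. The helper name `load_custom_dataset` and the column names are illustrative, not part of the repo.

```python
import io

import numpy as np


def load_custom_dataset(source, n_inputs):
    """Hypothetical helper: read a CSV with a header row, n_inputs input columns
    and one objective column. Returns X_init with shape (n, n_inputs) and
    Y_init with shape (n, 1), matching the 2-D arrays used in toy_sin_noise.py."""
    data = np.loadtxt(source, delimiter=",", skiprows=1, ndmin=2)
    X_init = data[:, :n_inputs]
    Y_init = data[:, n_inputs:n_inputs + 1]  # slicing keeps the column dimension
    return X_init, Y_init


# Usage with an in-memory stand-in for a real file (dummy values, not measurements)
csv_text = "frac_a,frac_b,frac_c,objective\n0.2,0.3,0.5,12.1\n0.6,0.1,0.3,9.8\n"
X_init, Y_init = load_custom_dataset(io.StringIO(csv_text), n_inputs=3)
```

`np.loadtxt` also accepts a file path, so a real dataset would be loaded with `load_custom_dataset("your_file.csv", n_inputs=d)`.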

Best, Ryan

sgbaird commented 2 years ago

@Ryan-Rhys thanks for this! And that's too bad the experimental implementation fell through (I take it that's what you were referring to with the high-throughput reactor in Singapore). If I adapt the modified toy_sin_noise.py example you showed to my own data, is there anything in the method that explicitly recognizes it as a composition-based problem?

Ryan-Rhys commented 2 years ago

@sgbaird

I guess the key for the composition-based problem is setting the bounds to be a simplex, i.e., modifying the following line:

bounds = np.array([0, 10]).reshape(-1, 1) # bounds of the Bayesian Optimisation problem.

The general term for the design space bounds required for material component inputs seems to be a "mixture design":

https://reliawiki.org/index.php/Mixture_Design
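One way to make the simplex constraint concrete, as a hedged NumPy sketch rather than anything in the repo: composition inputs x satisfy x_i >= 0 and sum(x) = 1, which a box alone cannot express. Two common workarounds are shown with hypothetical helper names. The first draws initial compositions uniformly from the simplex using a flat Dirichlet distribution. The second optimises only the first d - 1 fractions in the unit box, recovers the last fraction as 1 minus their sum, and flags points that fall outside the simplex.

```python
import numpy as np


def sample_compositions(n, n_components, seed=None):
    """Hypothetical helper: draw n compositions uniformly from the simplex.
    A flat Dirichlet(1, ..., 1) gives non-negative fractions summing to 1."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.ones(n_components), size=n)


def to_composition(z):
    """Hypothetical helper: map free coordinates z of shape (n, d - 1), taken
    from the unit box, to full compositions of shape (n, d) by setting the last
    fraction to 1 - sum(z). Rows with sum(z) > 1 are flagged as infeasible."""
    z = np.atleast_2d(np.asarray(z, dtype=float))
    last = 1.0 - z.sum(axis=1, keepdims=True)
    feasible = last[:, 0] >= 0.0
    return np.hstack([z, last]), feasible


# Initial design for a three-component mixture
X_init = sample_compositions(25, 3, seed=0)
# A candidate proposed in the unit square, mapped back to a full composition
composition, feasible = to_composition([[0.2, 0.3]])
```

The reparameterisation keeps box-bound optimisers usable, at the cost of wasted volume: for three components, half of the unit square maps outside the simplex, so infeasible candidates need to be rejected or penalised. How either piece would plug into `heteroscedastic_propose_location`, which takes `bounds`, depends on that function's internals and is not shown here.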