Add surrogate models that can extrapolate

dlinzner-bcs commented 1 year ago

Several times I have run into the problem that no in-spec points were found in the screening window, so we needed to extrapolate using a linear model. From there on I had to rely on workarounds. I think adding such models (e.g. BayesianRidge) would be really helpful.

As far as I understand, that feature is still missing? Another way to tackle this might be a "Bring Your Own Model" feature, which I am also not aware of.

jduerholt commented 1 year ago

Hi Dominik,

Linear models are currently supported (https://github.com/experimental-design/bofire/blob/main/bofire/data_models/surrogates/linear.py); the linear surrogate is just a GP with a linear kernel.
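To illustrate why a GP with a linear kernel can extrapolate where an RBF kernel cannot, here is a minimal pure-Python sketch. It is not BoFire or botorch code: the kernel forms, the fixed hyperparameters, and the helper names (`solve`, `gp_mean`) are assumptions chosen for illustration only.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def linear_kernel(a, b):
    # The constant +1 plays the role of a bias term (an assumption of this sketch).
    return a * b + 1.0


def rbf_kernel(a, b, lengthscale=1.0):
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def gp_mean(kernel, xs, ys, x_new, noise=1e-6):
    """Zero-mean GP posterior mean at x_new for 1D training data."""
    K = [
        [kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
        for i, a in enumerate(xs)
    ]
    alpha = solve(K, ys)
    return sum(kernel(x_new, x) * a for x, a in zip(xs, alpha))


# Training data from y = 1 + 2x, observed only on [0, 3].
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x for x in xs]

# Predict far outside the training window.
pred_linear = gp_mean(linear_kernel, xs, ys, 10.0)
pred_rbf = gp_mean(rbf_kernel, xs, ys, 10.0)
print(pred_linear, pred_rbf)
```

In this toy setup the linear-kernel prediction continues the trend of the data, while the RBF prediction falls back towards the zero prior mean once the query point is several lengthscales away from the data.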

Note that I have also found several times that the botorch priors tend to overfit (which was already noted by @DavidWalz or @bertiqwerty in MBO: https://github.com/basf/mbo/blob/d4061858c947af8deaa5ee8e4118615cb9328d02/mbo/torch_tools.py#L19). You can also use these MBO priors for the GP in bofire by building the surrogate with them: https://github.com/experimental-design/bofire/blob/main/bofire/data_models/priors/api.py

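from bofire.data_models.kernels.api import RBFKernel, ScaleKernel
from bofire.data_models.priors.api import MBO_LENGTHCALE_PRIOR, MBO_NOISE_PRIOR, MBO_OUTPUTSCALE_PRIOR
from bofire.data_models.surrogates.api import SingleTaskGPSurrogate

surrogate_data = SingleTaskGPSurrogate(
    inputs=domain.inputs,
    outputs=outputs,
    kernel=ScaleKernel(base_kernel=RBFKernel(ard=True, lengthscale_prior=MBO_LENGTHCALE_PRIOR()), outputscale_prior=MBO_OUTPUTSCALE_PRIOR()),
    noise_prior=MBO_NOISE_PRIOR(),
)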

Often these priors generalize better. To test this automatically, you can run the hyperoptimize method on the GP surrogate data model:

from bofire.benchmarks.api import hyperoptimize

opt_surrogate_data, purity_metrics = hyperoptimize(
    surrogate_data=SingleTaskGPSurrogate(inputs=domain.inputs, outputs=outputs),
    training_data=experiments,
    folds=5,
)

This tests certain combinations of priors and kernels and returns the best surrogate data model found, together with a data frame of the performance of the tested hyperparameters.
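The general idea behind such a search, k-fold cross-validation over a set of candidate models, can be sketched as follows. This is only a conceptual illustration in plain Python and does not reflect the internals of hyperoptimize; the two candidate models and all function names here are hypothetical.

```python
from statistics import mean


def fit_constant(xs, ys):
    """Candidate 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares straight line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cross_val_mae(fit, xs, ys, folds=5):
    """Mean absolute error over all held-out points of a k-fold split."""
    n = len(xs)
    errors = []
    for k in range(folds):
        test_idx = set(range(k, n, folds))  # every folds-th point is held out
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in test_idx]
        model = fit([x for x, _ in train], [y for _, y in train])
        errors += [abs(model(xs[i]) - ys[i]) for i in test_idx]
    return mean(errors)


xs = [float(i) for i in range(10)]
ys = [3.0 * x - 1.0 for x in xs]

candidates = {"constant": fit_constant, "line": fit_line}
scores = {name: cross_val_mae(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the lowest cross-validated error
print(best, scores)
```

The returned pair in the thread's snippet plays the same role as `best` and `scores` here: one winning configuration plus a table of how each tested configuration performed.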

The bring-your-own-model option is also available: you can code up your own models in botorch and pass them to bofire. You can find an example in this notebook, starting at cell 20: https://github.com/experimental-design/bofire/blob/main/tutorials/models_serial.ipynb.

If you have more questions, ideas, etc., just let me know.

Best,

Johannes

dlinzner-bcs commented 1 year ago

Thank you Johannes! Could you please point me to an example of how to use a surrogate in combination with a strategy for optimization? I tried the following and got errors:

from bofire.data_models.surrogates.api import LinearSurrogate, BotorchSurrogates
from bofire.data_models.strategies.api import (
    QparegoStrategy,
)

qparego_data_model = QparegoStrategy(
    domain=domain,
    surrogate_specs=BotorchSurrogates(
        surrogates=[
            LinearSurrogate(
                inputs=domain.inputs, outputs=Outputs(features=[domain.outputs[0]])
            ),
            LinearSurrogate(
                inputs=domain.inputs, outputs=Outputs(features=[domain.outputs[1]])
            ),
        ]
    ),
)

jduerholt commented 1 year ago

This is a bug; this PR should fix it: https://github.com/experimental-design/bofire/pull/290. Could you please review it?

As a workaround, you can also just use a SingleTaskGPSurrogate with a linear kernel, which is equivalent:

strategy_data = QnehviStrategyDataModel(
    domain=benchmark.domain,
    surrogate_specs=BotorchSurrogates(
        surrogates=[
            SingleTaskGPSurrogate(
                inputs=benchmark.domain.inputs,
                outputs=Outputs(features=[benchmark.domain.outputs[0]]),
                kernel=ScaleKernel(base_kernel=RBFKernel(ard=False))
            ),
            SingleTaskGPSurrogate(
                inputs=benchmark.domain.inputs,
                outputs=Outputs(features=[benchmark.domain.outputs[1]]),
                kernel=LinearKernel()
            )
        ]
    )
)

dlinzner-bcs commented 1 year ago

Thank you @jduerholt! The linear model now works for me. Is it also possible to implement a QuadraticKernel() using our current setup? I want to use a quadratic surrogate, i.e. assume y = W[a, b, ab, a^2, b^2].T. I initially thought of using a linear model with the respective constraints, but these would be nonlinear. Many thanks again!
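One standard way to obtain such a quadratic model inside a GP is a degree-2 polynomial kernel, k(x, z) = (x·z + 1)^2. The sketch below is plain Python and not BoFire code (the function names are hypothetical). It checks numerically that, for two inputs a and b, this kernel equals an inner product over the features [1, a, b, ab, a^2, b^2] up to constant scalings. Note that this feature map also contains a constant term, which the formula above does not list.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel with offset 1: (x . z + 1)^2."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return (dot + 1.0) ** 2


def quadratic_features(x):
    """Explicit feature map of poly2_kernel for a 2D input (a, b)."""
    a, b = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * a, r2 * b, r2 * a * b, a * a, b * b]


x, z = (0.5, -1.5), (2.0, 0.25)

k_value = poly2_kernel(x, z)
feature_dot = sum(p * q for p, q in zip(quadratic_features(x), quadratic_features(z)))

# Both quantities agree up to floating-point rounding.
print(k_value, feature_dot)
```

Because the kernel and the explicit feature map agree, a GP with this kernel behaves like a Bayesian linear model in the quadratic features, without requiring nonlinear constraints on a linear model.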

dlinzner-bcs commented 1 year ago

This looks like what I want. Do you think it makes sense to implement it? (I can probably do it)

jduerholt commented 1 year ago

It definitely makes sense to implement it. It has been on my list for a long time already and should be quite easy to do. Just have a look at how, for example, the linear kernel is implemented and do it in the same way: https://github.com/experimental-design/bofire/blob/main/bofire/kernels/mapper.py

Add surrogate models that can extrapolate #286