facebook / Ax

Adaptive Experimentation Platform
https://ax.dev
MIT License

Implement a custom Surrogate model for use in generation strategies #748

Closed · sgbaird closed this issue 2 years ago

sgbaird commented 2 years ago

TL;DR

It seems like a model needs to be capable of sampling from a posterior distribution, rather than simply returning a scalar value for uncertainty, in order to be used as a Surrogate model within a GenerationStep in Ax (or BoTorch, for that matter). This would preclude the use of data science models such as sklearn's GradientBoostingRegressor, despite its ability to return prediction intervals. Makes sense; this is Bayesian Optimization, after all.

Ax and BoTorch Docs show how to implement custom Surrogate and BoTorch Model

One of the most relevant places I've seen for this in the documentation is 5. Utilizing BoTorchModel in generation strategies. In the example, the second GenerationStep takes a model_kwargs dict that contains surrogate: Surrogate(SingleTaskGP). A condensed version is as follows:

from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models
from ax.models.torch.botorch_modular.surrogate import Surrogate
from botorch.acquisition import qNoisyExpectedImprovement
from botorch.models import SingleTaskGP

gs = GenerationStrategy(
    steps=[
        GenerationStep(  # Initialization step
            model=Models.SOBOL,
            num_trials=5,
            min_trials_observed=5,
        ),
        GenerationStep(  # BayesOpt step
            model=Models.BOTORCH_MODULAR,
            num_trials=-1,
            model_kwargs={  # Kwargs to pass to `BoTorchModel.__init__`
                "surrogate": Surrogate(SingleTaskGP),
                "botorch_acqf_class": qNoisyExpectedImprovement,
            },
        ),
    ]
)

To use it with existing data (e.g., data from a lab notebook or an Excel spreadsheet) per the workflow from https://github.com/facebook/Ax/issues/743#issuecomment-987778240, the GenerationStep of the generation strategy (gs) could be modified to become:

gs = GenerationStrategy(
    steps=[
        GenerationStep(
            model=Models.BOTORCH_MODULAR,
            num_trials=-1, 
            max_parallelism=3,
            model_kwargs={  # Kwargs to pass to `BoTorchModel.__init__`
                "surrogate": Surrogate(SingleTaskGP),
                "botorch_acqf_class": qNoisyExpectedImprovement,
            },
        ),
    ]
)

A Surrogate can be made from a BoTorch Model, and an example of implementing a custom BoTorch Model is given in the BoTorch docs. This example links to https://botorch.org/docs/models which in turn states:

BoTorch models are PyTorch modules that implement the light-weight Model interface. A BoTorch Model requires only a single posterior() method that takes in a Tensor X of design points, and returns a Posterior object describing the (joint) probability distribution of the model output(s) over the design points in X.
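To make that contract concrete, here is a purely conceptual sketch in plain Python (this is not real BoTorch code; ToyModel and ToyPosterior are invented names) of what posterior() is expected to provide: a joint distribution over the design points that can be sampled from, not just a point estimate plus a scalar error bar.

```python
import random


class ToyPosterior:
    """Independent Gaussians at each design point. A real BoTorch
    Posterior would carry the full joint covariance across points."""

    def __init__(self, means, sigmas):
        self.means = means
        self.sigmas = sigmas

    def rsample(self, num_samples):
        # Draw joint samples over all design points at once -- this is
        # the capability a scalar uncertainty estimate does not give you.
        return [
            [random.gauss(m, s) for m, s in zip(self.means, self.sigmas)]
            for _ in range(num_samples)
        ]


class ToyModel:
    def posterior(self, X):
        # A GP would compute a posterior mean and covariance from X and
        # its training data; here we fake both for illustration.
        means = [sum(x) for x in X]
        sigmas = [0.1 for _ in X]
        return ToyPosterior(means, sigmas)
```

Calling `ToyModel().posterior([[1.0, 2.0], [3.0, 4.0]]).rsample(5)` yields five joint draws over the two design points, which is the shape of object acquisition functions like qNEI consume.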

An ML/data scientist might use GradientBoostingRegressor with uncertainty

To put this into the context of someone with an ML/data science background, but not necessarily a background in BO, I'll take the "custom surrogate model" to be sklearn's Gradient Boosting Regression:

from sklearn.ensemble import GradientBoostingRegressor

which natively supports generating prediction intervals via its quantile loss. Since this is a prediction interval rather than a standard error of the mean (SEM), for the sake of simplicity I'll rework the upper and lower bounds into a single scalar and treat it as a measure of uncertainty:

sigma = 0.5*(u-y) + 0.5*(y-l)

where u, l, and y represent the upper bound, lower bound, and prediction, respectively.
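As a sketch of how those quantities might be obtained (assuming scikit-learn is available; the data here is synthetic and purely illustrative), one can fit three quantile models and collapse the resulting interval to a single scalar. Note that the expression algebraically simplifies to half the interval width.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic 1-D regression problem, purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=200)

# One model per quantile: lower bound, median prediction, upper bound.
models = {
    "lower": GradientBoostingRegressor(loss="quantile", alpha=0.05),
    "median": GradientBoostingRegressor(loss="quantile", alpha=0.50),
    "upper": GradientBoostingRegressor(loss="quantile", alpha=0.95),
}
for m in models.values():
    m.fit(X, y)

X_test = np.array([[2.5], [5.0]])
l = models["lower"].predict(X_test)
y_pred = models["median"].predict(X_test)
u = models["upper"].predict(X_test)

# Collapse the interval to one scalar per point; this simplifies to
# 0.5 * (u - l), i.e. half the interval width.
sigma = 0.5 * (u - y_pred) + 0.5 * (y_pred - l)
```

The point remains that sigma is only a scalar per prediction point: it carries no joint structure across points, which is what the Posterior interface requires.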

No Posterior for GradientBoostingRegressor, can't use in generation strategy

So how would I convert this model into a Surrogate model? After some digging, the answer seems to be that I can't, as long as GradientBoostingRegressor only returns scalar estimates of uncertainty at prediction points rather than sampling from a posterior distribution or providing access to a covariance function. Perhaps there are ways around this (variational inference?). As for my actual model, which is based on attention networks (i.e., the transformer architecture popularized by NLP and recently adapted to materials science via CrabNet), there seems to be some interest in Bayesian attention, but it's a rabbit hole I probably can't justify going down right now.

@Balandat, does my conclusion seem about right? Or is there something I'm missing?

eytan commented 2 years ago

If you ultimately want to use a PyTorch model / transformers, I wonder if it's best to start with that. I am not super familiar with what the state of the art is for Bayesian transformers, but this package looks like a simple wrapper class that lets you do Bayes by Backprop w/ transformers, and it looks pretty generic. I wonder if you can just use it w/ CrabNet. https://github.com/yliess86/BayeFormers

lena-kashtelyan commented 2 years ago

@sgbaird, reached out to you on LinkedIn to set up a discussion!

sgbaird commented 2 years ago

@eytan, that's a great point, and thank you for sharing BayeFormers! I had found a few codebases, but nothing that looks as promising as this one. I agree that with some work, I may be able to incorporate BayeFormers into CrabNet, at which point I think it would be compatible with the BoTorch Model API.

@lena-kashtelyan that sounds great! I'll get back to you soon via email.

lena-kashtelyan commented 2 years ago

@sgbaird, sounds like we can close this one for now, as custom surrogate likely will not be needed as we discussed in meeting. Let's feel free to reopen this if it comes back on the radar!

Runyu-Zhang commented 3 months ago

Hi @sgbaird, thank you for your helpful discussion on implementing BOTORCH_MODULAR in AxClient. I am wondering if you have ever tried to set up a ModelListGP for multi-objective BO in the same way? I know Ax/tutorials/modular_botax.ipynb specifically mentions "except for BoTorch ModelListGP".

sgbaird commented 3 months ago

@Runyu-Zhang unfortunately, no, but I'm curious to hear if you do!