facebook / Ax

Adaptive Experimentation Platform
https://ax.dev

Question: Multi-objective BO with discrete search spaces, batch evaluations, off-line observations #537

Closed · iandoxsee closed this 3 years ago

iandoxsee commented 3 years ago

I'm looking to do multi-objective BO for physical (real-life) experiments with the following requirements:

1) Can handle 2-4 objectives, with the ability to minimize or maximize each objective by choice and to constrain objectives so that preferred regions of the Pareto front are preferentially explored.

2) Discrete search spaces composed of mixed discrete numerical and categorical values (e.g. [1.5, 3.0, 4.5] or [component_a, component_b, ...]). The total number of variables is 5-15, so fairly low-dimensional. (Some of my categorical variables can also be turned into numerical descriptors of the component, if this makes things easier to integrate with Ax/BoTorch.)

3) An acquisition function that can suggest batches of n points (1 ≤ n ≤ 10) from the search space, each corresponding to a physical experiment to be performed. These are real-life, physical experiments and so are very expensive.

4) Ability to run in sequential mode, where initial data is provided, the models are fit, new points are suggested, real-life experiments are performed (requiring hours to days), response data is fed back into the model, and so on. (A minimal sketch of such a loop is shown below.)
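
To make requirement 4 concrete, here is a rough sketch of what such a loop could look like with Ax's Service API; all parameter names, metric names, and values below are hypothetical:

```python
from ax.service.ax_client import AxClient, ObjectiveProperties

# Hypothetical multi-objective experiment over a discrete/mixed search space.
ax_client = AxClient()
ax_client.create_experiment(
    name="synthesis_moo",
    parameters=[
        {"name": "conc", "type": "choice", "values": [1.5, 3.0, 4.5], "is_ordered": True},
        {"name": "component", "type": "choice", "values": ["component_a", "component_b"]},
    ],
    objectives={
        "yield": ObjectiveProperties(minimize=False),
        "impurity": ObjectiveProperties(minimize=True),
    },
)

# Feed in existing (offline) observations before asking for new points.
params, trial_index = ax_client.attach_trial({"conc": 1.5, "component": "component_a"})
ax_client.complete_trial(trial_index, raw_data={"yield": (88.0, 1.0), "impurity": (0.9, 0.1)})

# Sequential loop: ask for a batch of n points (generated one at a time,
# conditioned on pending points), run the physical experiments (hours to
# days), then report results back as (mean, SEM) pairs.
for _ in range(5):  # number of rounds
    batch = [ax_client.get_next_trial() for _ in range(3)]  # n = 3 here
    for params, trial_index in batch:
        # ... perform the physical experiment for `params` here ...
        ax_client.complete_trial(
            trial_index, raw_data={"yield": (92.0, 1.0), "impurity": (0.5, 0.1)}
        )
```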

I've seen conflicting information in the issues here about whether Ax and/or BoTorch can support these requirements, particularly on discrete search spaces.

Am I barking up the right tree by considering Ax/BoTorch for this application?

Thanks!

Balandat commented 3 years ago

> Am I barking up the right tree by considering Ax/BoTorch for this application?

woof!

Pretty much all of these check out (see below). The main challenge, as you mentioned, is that the search space is discrete (or at least mixed). The ordered discrete parameters are usually reasonably easy to handle, but if there are a lot of categorical parameters then things can get difficult. However, we have some things in the works using kernels that can work well with categorical parameters, which should substantially improve our capabilities in that regard (cc @dme65).

  1. Our qEHVI algorithm will be able to handle this, including the objective constraints (we refer to these as objective thresholds; a short sketch of specifying them follows below the list). We also have a new variant that works well with noisy observations, which we plan to push to the library in the near future (cc @sdaulton).
  2. If the categorical variables are ordinal, this will make things easier (see above). Can you tell us a few more details about your search space? Also, how many evaluations do you think you will be able to run?
  3. Yes, this (together with 4.) is basically our core competency.
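
For reference, a minimal sketch of specifying objective thresholds through the Service API's ObjectiveProperties (metric names and threshold values are hypothetical):

```python
from ax.service.ax_client import ObjectiveProperties

# Thresholds bound the region of the Pareto front the optimization focuses on:
# only points with yield >= 90 and impurity <= 0.5 count toward hypervolume.
objectives = {
    "yield": ObjectiveProperties(minimize=False, threshold=90.0),
    "impurity": ObjectiveProperties(minimize=True, threshold=0.5),
}
```
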
iandoxsee commented 3 years ago

Excellent! Thank you for the thorough response.

  1. I’ve been looking at the qEHVI algorithm and I’m glad to hear it can handle this! The noisy variant would be particularly useful, since physical experiments will inherently bring in error.
  2. The categorical variables I'll be using are primarily distinct chemicals, so each is either just a name (to be used with Ax's ChoiceParameter?) or can be decomposed into a few hundred numerical descriptors, which are ordinal. That said, sometimes the categorical variable will be something non-decomposable, like component_added_last: [comp_a, comp_b, etc.]. In terms of number of evaluations: ideally 25-50, but for particularly intractable problems we can dedicate 100-150 evaluations. (Both encodings are sketched below.)
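
To illustrate, the same chemical variable could enter the search space either as an unordered choice or via numeric descriptors; all names and descriptor values below are made up:

```python
# Option 1: unordered categorical, passed to Ax as a choice parameter
# (unordered choices are effectively one-hot encoded for the model).
component_added_last = {
    "name": "component_added_last",
    "type": "choice",
    "values": ["comp_a", "comp_b", "comp_c"],
    "is_ordered": False,
}

# Option 2: replace the categorical with a few numerical descriptors per
# chemical (hypothetical values), each becoming its own search-space parameter.
descriptors = {
    "comp_a": {"mol_weight": 151.2, "logp": 1.19},
    "comp_b": {"mol_weight": 194.2, "logp": -0.07},
}
```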

If you’re willing, I’d like to follow up separately (email, LinkedIn?) to discuss in more detail. I can provide a summary update here to close out the issue later, if desired.

(Accidentally closed the issue by pushing the wrong button; reopened now.)

Balandat commented 3 years ago

Interesting. How many categorical variables are there, and how many values can each of them take on? I'm trying to gauge whether it's reasonable to do this with a categorical kernel or whether we need to do something smarter here.

Decomposing the chemicals into numerical descriptors would be interesting, but it would result in a pretty high-dimensional domain that regular kernels may have a hard time modeling. Is there some notion of the most relevant numerical descriptors? E.g., is there some natural dimensionality reduction that can be done in that space? More generally, what people have done in these kinds of settings is use a bunch of auxiliary data on the chemicals to train some kind of continuous embedding that maps the categorical values into a lower-dimensional continuous space. If such an embedding were possible in this setting, this could be a viable way to go.
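
As a sketch of that dimensionality-reduction idea, using scikit-learn's PCA as a simple stand-in for a learned embedding (the descriptor matrix here is random placeholder data):

```python
import numpy as np
from sklearn.decomposition import PCA

# Rows are chemicals, columns are (a few hundred) numerical descriptors.
descriptor_matrix = np.random.rand(20, 300)  # placeholder data

# Map each chemical into a low-dimensional continuous space.
pca = PCA(n_components=5)
embedding = pca.fit_transform(descriptor_matrix)  # shape: (20, 5)

# Each chemical's 5 coordinates could then serve as continuous parameters
# in the search space instead of a raw categorical value.
```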

You are working with known chemicals and not trying to synthesize new ones, right? (That would be along the lines of e.g. https://pubs.rsc.org/en/content/articlelanding/2020/sc/c9sc04026a#!divAbstract)

iandoxsee commented 3 years ago

In nearly all cases: 5 or fewer chemical variables (~10 values each), 5-10 numerical variables, and a couple of purely categorical variables.

For chemical descriptors, it typically isn't obvious which ones are most relevant, unfortunately. In cheminformatics, PCA is sometimes performed, so that could be a viable approach, as you mentioned. Still, for simplicity's sake, my preference would be either to one-hot encode the chemical variables or to use numerical descriptors directly.

Yes, the goal is to synthesize known chemicals and to find optimal trade-offs between distinct objectives in that synthesis. In essence, I am looking for a multi-objective extension of this recent paper: https://www.nature.com/articles/s41586-021-03213-y

Balandat commented 3 years ago

To follow up on this: we have developed some models in BoTorch that will improve performance on mixed search spaces (see e.g. https://github.com/pytorch/botorch/pull/772). I am actively working on hooking them up in Ax, and they should be ready to use soon (planning to add a basic tutorial for this as well).
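
For context, BoTorch's MixedSingleTaskGP is one such model for mixed search spaces; below is a minimal sketch of constructing and fitting it on random placeholder data (whether this is exactly the model being hooked up in Ax is an assumption):

```python
import torch
from botorch.models import MixedSingleTaskGP
from botorch.fit import fit_gpytorch_mll  # fit_gpytorch_model in older versions
from gpytorch.mlls import ExactMarginalLogLikelihood

# Placeholder data: 3 continuous columns plus 1 label-encoded categorical
# column (values 0/1/2) in the last position.
train_X = torch.rand(10, 4, dtype=torch.double)
train_X[:, -1] = torch.randint(0, 3, (10,)).to(torch.double)
train_Y = torch.rand(10, 1, dtype=torch.double)

# cat_dims tells the model which input columns get a categorical kernel.
model = MixedSingleTaskGP(train_X, train_Y, cat_dims=[3])
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)
```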