experimental-design / bofire

Experimental design and (multi-objective) bayesian optimization.
https://experimental-design.github.io/bofire/
BSD 3-Clause "New" or "Revised" License

Batch constraint for DoE #310

Closed: KappatC closed this 8 months ago

KappatC commented 10 months ago

From partner interactions we learnt that some parameters can only be varied batch-wise, e.g. formulation parameters of the material and process parameters such as temperature. Currently we only have a heuristic way to implement this, and it only finds suboptimal designs. Thus, we are interested in adding a new type of constraint that enforces a batch structure in the design. The way to achieve this is by adding equality constraints across rows.

bertiqwerty commented 10 months ago

Agreed. Do you plan to prepare a pull-request for this?

KappatC commented 10 months ago

Yes. As soon as we have implemented it, we will open a pull request as well.

jduerholt commented 10 months ago

Very good point. I have been thinking about this for a long time; as a consequence, I discussed it for the BO part with the botorch guys (https://github.com/pytorch/botorch/issues/1737) and implemented a missing feature for this in botorch (https://github.com/pytorch/botorch/pull/1757).

The way it is handled by SLSQP in botorch should be directly applicable to the ipopt optimizer used in DoE, namely by imposing a set of additional equality constraints along the batch dimension. Note that the polytope sampler from botorch, which is also used in bofire, is capable of handling this when sampling from a constrained polytope, so we can easily make this available for our RandomStrategy and also for the botorch-based ones.
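To illustrate what "equality constraints along the batch dimension" could look like: for a batch of q candidates, a feature in column j can be tied together by q - 1 chained constraints X[k, j] - X[k + 1, j] = 0. The sketch below encodes each one as an (indices, coefficients, rhs) triple with (row, column) index pairs. The layout is only modelled on the inter-point idea from the linked botorch discussion; the helper names `inter_point_equalities` and `residuals` are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List, Tuple

# One constraint: sum_i coefficients[i] * X[row_i][col_i] == rhs
EqualityTriple = Tuple[List[Tuple[int, int]], List[float], float]


def inter_point_equalities(q: int, feature_idx: int) -> List[EqualityTriple]:
    """Tie column `feature_idx` to the same value in all q rows of a batch.

    Hypothetical helper: chains neighbouring rows, X[k][j] - X[k + 1][j] == 0.
    """
    return [
        ([(k, feature_idx), (k + 1, feature_idx)], [1.0, -1.0], 0.0)
        for k in range(q - 1)
    ]


def residuals(X: List[List[float]], constraints: List[EqualityTriple]) -> List[float]:
    """Evaluate each constraint on a candidate matrix X (rows = experiments)."""
    return [
        sum(c * X[r][j] for (r, j), c in zip(indices, coeffs)) - rhs
        for indices, coeffs, rhs in constraints
    ]


constraints = inter_point_equalities(q=3, feature_idx=1)
X_ok = [[0.1, 5.0], [0.7, 5.0], [0.4, 5.0]]   # column 1 constant in the batch
X_bad = [[0.1, 5.0], [0.7, 6.0], [0.4, 5.0]]  # column 1 changes in row 1
print(residuals(X_ok, constraints))   # all zeros: constraint satisfied
print(residuals(X_bad, constraints))  # non-zero entries: constraint violated
```

With q = 1 the helper returns an empty list, so a single candidate is never restricted.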

The only open question for me is how to make this available API-wise. I would opt for implementing a new type of constraint like this:

class BatchConstraint(Constraint):
    type: str

class BatchEqualityConstraint(BatchConstraint):
    type: Literal["BatchConstraint"] = "BatchConstraint"
    feature: str

This would mean that in one batch generated by a strategy, the feature named in the BatchEqualityConstraint always has to have the same value. Note that we can only validate this type of constraint at runtime of the strategy and not already in the data model of the strategy, because if the requested number of experiments is just one, the constraint is trivially fulfilled by every strategy.
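A minimal sketch of such a runtime check, assuming candidates arrive as a list of dicts keyed by feature name. The function name `batch_equality_fulfilled` and the tolerance are assumptions for illustration, not bofire API.

```python
from typing import Dict, Sequence


def batch_equality_fulfilled(
    candidates: Sequence[Dict[str, float]], feature: str, tol: float = 1e-9
) -> bool:
    """Return True if `feature` has the same value in every candidate."""
    values = [row[feature] for row in candidates]
    # With zero or one candidate there is nothing to compare, so the
    # constraint holds trivially.
    return all(abs(v - values[0]) <= tol for v in values[1:])


batch = [{"a": 0.3, "b": 1.0}, {"a": 0.3, "b": 2.0}, {"a": 0.3, "b": 3.0}]
print(batch_equality_fulfilled(batch, "a"))      # "a" is constant in the batch
print(batch_equality_fulfilled(batch, "b"))      # "b" varies in the batch
print(batch_equality_fulfilled(batch[:1], "b"))  # single candidate: trivially fulfilled
```

The last call shows why the check only makes sense once the batch is known: with a single candidate, any feature passes.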

@bertiqwerty : what do you think?

dlinzner-bcs commented 10 months ago

@jduerholt That is great news! We also want to have this type of constraint for the DoE part. Here, however, multiple batches are intended to be optimized jointly. So we need some field to communicate the number of equality constraints that need to be generated. I would opt for a field multiplicity: Optional[int] that communicates this. Also, in our scenario multiple variables are batched at once, so we need to provide features: List[str] here. Does this work for you?

jduerholt commented 10 months ago

@dlinzner-bcs Hmm, I do not understand what you mean by "multiple batches are intended to be optimized jointly". In my view, you would ask for example for 10 candidates, and you want the feature with key "a" to have the same value within this batch of 10 candidates. Or am I overlooking something?

Concerning multiple variables/features: then provide several of these constraints to the domain, one new constraint for every feature that should have the same value. From my point of view, this is much cleaner. What do you think?

dlinzner-bcs commented 10 months ago

@jduerholt what I mean is the following. I have some parameters that are easy to vary, while others are hard. For example I make 3 batches of cookie dough with different recipes. From each of those batches I create 3 cookies that I bake at 5 different temperatures ;). I want to create a DoE for this. I learned yesterday that this constraint has a name: "split plot design". It comes from agriculture, where you create experiments for different plots of land...

Then we cannot work with feature-specific constraints, as multiple features at once are kept at the same value for some rows. So there is variation between batches, but less than in an unconstrained design.

Osburg commented 10 months ago

@jduerholt I started implementing such a new constraint class some days ago. If you and the rest agree on realizing the batch constraints as a new constraint class, feel free to assign the issue to me if you like :) @dlinzner-bcs the cookie example you described seems to me as if we could decompose it into two batch constraints of the type that @jduerholt described: one for the cookie dough recipes and one for baking at a certain temperature, where the batch size of the first batch constraint is 5 times the batch size of the second one, right? So I think @jduerholt's suggestion should support the type of problem you described.

dlinzner-bcs commented 10 months ago

OK, sorry if I did not make myself clear enough. What I want to get out is something like this:

[image]

Does this work with what you have in mind?

Osburg commented 10 months ago

@dlinzner-bcs yes, I think so. This would even be possible with only one batch constraint, because I had in mind that the user can specify which variables should be affected by the batch constraint (does this make sense? @jduerholt did you also intend to do it this way?). So in your example I would define one batch constraint affecting "sugar", "milk" and "chocolate", but not "temperature".

jduerholt commented 10 months ago

I am not 100% sure: does your example consist of three batches (green, yellow, red), or is it one big batch?

dlinzner-bcs commented 10 months ago

Three batches.

jduerholt commented 10 months ago

And do you want to get these batches with one call to ask or with three separate ones? Three separate ones, right?

dlinzner-bcs commented 10 months ago

With one .ask() call. I view it as a single constrained DoE. So the batches should still try to maximize the D-criterion for example.

dlinzner-bcs commented 10 months ago

I want to design a DoE for cookie dough, so I still need to find the best way of varying the recipe. The world of bakery depends on this!

jduerholt commented 10 months ago

> With one .ask() call. I view it as a single constrained DoE. So the batches should still try to maximize the D-criterion for example.

Then, I think you are correct, we have to define the constraint as you said using features and multiplicity. @Osburg what do you think?

Osburg commented 10 months ago

I assumed that in a scenario where we have a batch constraint, it should apply to each set of batch_size subsequent experiments (is this actually the case?). In this situation all equality constraints are already defined by the batch size, the features and the total number of experiments (by partitioning all experiments into sets of batch_size experiments and imposing equality constraints on these batches separately). I think this is equivalent to setting the multiplicity to the maximum reasonable value by default. Would it happen in reality that we want to optimize a design where, e.g., the first 20 experiments are done in batches and the remaining experiments are not? If this is the case, I would agree that we need something like multiplicity.
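One possible reading of this partitioning, sketched under assumptions: the design is flattened row by row into a single vector x, the experiments are split into consecutive blocks of batch_size, and every batched feature gets one row a with a . x = 0 per neighbouring pair inside a block. The function name `batch_constraint_rows` and the feature ordering are hypothetical; this is not the bofire implementation.

```python
from typing import List


def batch_constraint_rows(
    n_experiments: int, n_features: int, batch_size: int, feature_indices: List[int]
) -> List[List[float]]:
    """Coefficient rows a with a . x == 0 for the row-major flattened design x."""
    rows = []
    for start in range(0, n_experiments, batch_size):
        stop = min(start + batch_size, n_experiments)  # last batch may be shorter
        for i in range(start, stop - 1):  # neighbouring pairs inside one batch
            for j in feature_indices:
                a = [0.0] * (n_experiments * n_features)
                a[i * n_features + j] = 1.0
                a[(i + 1) * n_features + j] = -1.0
                rows.append(a)
    return rows


# Columns: sugar, milk, chocolate, temperature; only the recipe is batched.
rows = batch_constraint_rows(
    n_experiments=6, n_features=4, batch_size=2, feature_indices=[0, 1, 2]
)
design = [
    [0.2, 0.5, 0.3, 160.0], [0.2, 0.5, 0.3, 200.0],  # batch 1
    [0.4, 0.4, 0.2, 160.0], [0.4, 0.4, 0.2, 180.0],  # batch 2
    [0.1, 0.6, 0.3, 170.0], [0.1, 0.6, 0.3, 210.0],  # batch 3
]
x = [value for row in design for value in row]  # row-major flattening
violations = [sum(a_k * x_k for a_k, x_k in zip(a, x)) for a in rows]
print(len(rows), all(v == 0.0 for v in violations))
```

Here the recipe columns are constant inside each block of two experiments while temperature is free, so all constraint rows evaluate to zero. The constraints never link different blocks, so the recipe can still vary between batches.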

jduerholt commented 10 months ago

Hmm, I have the feeling that it could be much easier to discuss this in a call. Should I set one up for next week?

@Osburg @dlinzner-bcs ?