experimental-design / bofire

Experimental design and (multi-objective) bayesian optimization.
https://experimental-design.github.io/bofire/
BSD 3-Clause "New" or "Revised" License
230 stars 23 forks source link

DOE: NChooseK constraints #145

Closed jduerholt closed 1 year ago

jduerholt commented 1 year ago

@DavidWalz @Osburg

As iptopt is also supporting non-linear constraints, I thought about if it could be worth to implement a continuous relaxation for the NChooseK constraints. This would also make it possible to fully support NChooseK constraints in DoE. I am currently adding this continuous relaxation also to the BayesOpt methods in Bofire.

You can find a description here: https://github.com/pytorch/botorch/issues/1749#issuecomment-1474763398

Just have a look at the linked tutorial notebook there. I tested it in BayesOpt and it works ;) What do you think?

Osburg commented 1 year ago

While working on the old mopti-based doe I did some experiments with NChooseK constraints. In mopti, the violation of NChooseK constraints is defined as the sum of the absolute values of the three smallest deviations from 0 if the number of of nonzero components is larger than k and 0 otherwise. I think this should be a continuous function, but not smooth. IPOPT had problems dealing with this constraint function (in fact, the current solution for NChooseK constraints in doe was chosen, because IPOPT was not able to find acceptable designs using this constraint function). I think it is a good idea to give it another try with a nicer constraint function like the one you suggested.

I could have a look at this / you can assign the issue to me if you want to (except you want to take care of this @jduerholt , @DavidWalz)

jduerholt commented 1 year ago

Would be happy if you could have a look at it, I have to finalize it next week for the BayesOpt part. I can at least say that the scipy SLSQP is able to handle it this version, as this optimizer is used in botorch for acqf optimization.

I have already implemented these constraints here:

https://github.com/experimental-design/bofire/blob/3c9fb824123c941b9ab8a0190b99cee98d6a44be/bofire/utils/torch_tools.py#L88

I takes the domain and returns a list of torch callables which evaluate the constraints. You can in principle use these torch based callables directly as is in DoE and compute the Jacobian via torch autograd. Then you do not need to implement the gradients etc. by hand. Test cases can be found in this module: https://github.com/experimental-design/bofire/blob/main/tests/bofire/utils/test_torch_tools.py

You can also have a look in botorch, how they convert the non-linear constraints to scipy including automatic generation of the Jacobian: https://github.com/pytorch/botorch/blob/f596e71cc3d6404f1af3dfda1a92983d7303ef07/botorch/generation/gen.py#L222

From my understanding the python ipopt interface is relatively similar to the scipy.optimize interface.

Osburg commented 1 year ago

I implemented the constraint / constraint jacobian inside the NChooseKConstraint class using the narrow Gaussian approximation as we discussed in this PR. Unfortunately, IPOPT typically ends up with an unsuccessful optimization (except for some really small toy problems) and exits with this error message

EXIT: Converged to a point of local infeasibility. Problem may be infeasible.

At first I suspected my new lines of code to have mistakes in it, but the problem even remains for the toy problem from the sample notebook from this discussion, where SLSQP does the job. In case you want to reproduce this finding, this is code you can use for it

import numpy as np
import torch
from botorch.test_functions import Hartmann
from cyipopt import minimize_ipopt
from scipy.optimize import minimize

hartmann = Hartmann(dim=6)

def f(x):
    return hartmann(torch.from_numpy(x)).numpy()

def narrow_gaussian(x, ell):
    return torch.exp(-0.5 * (x / ell) ** 2)

def ineq_constraint(x, ell=1e-3):
    # Approximation of || x ||_0 <= 3. The constraint is >= 0 to conform with SLSQP
    return narrow_gaussian(x, ell).sum(dim=-1) - 3

x0 = 0.5 * np.ones(6)
bounds = [(0,1) for i in range(6)]
nlc = {"type": "ineq", "fun": lambda x: ineq_constraint(torch.from_numpy(x)).numpy()}

res = minimize_ipopt(f, x0, constraints=nlc, bounds=bounds, options={"disp":5})
res = minimize(f, x0, constraints=nlc, bounds=bounds, options={"disp":5})

I assume that the reason for these problems could be the quickly vanishing gradients of the narrow Gaussians when moving away from its center. Once you are not close to the center anymore it becomes practically zero very fast and should therefore become uninformative. What do you think about this? Does it make sense to you? However, this does not explain why SLSQP does not have the same problems with this kind of constraint... In mopti the NChooseK constraint was evaluated in a different way where the gradient does not drop with the distance to the coordinate axes (see here). For the old doe I did some tests with this way of evaluating NChooseK constraints and could achieve better results than with the narrow Gaussian relaxation. Anyways, it did not do the job for doe because IPOPT could not manage to fulfill the constraints in more complicated cases (many decision variables or additional linear / nonlinear constraints). This is why we came up with the current solution of imposing a different constraint that (by "manually" fixing which variables should be zero for each experiment) that is actually even more restrictive, but also easier to handle for IPOPT (effectively, some variables to be optimized are just taken out of the game...).

So my conclusion would be right now that we should stick to the current solution in doe since IPOPT can produce acceptable results with it but can't with the narrow Gaussians. But I could anyways open a PR to add the new methods for NChooseKConstraints (that could be useful elsewhere, just not for optimizing with IPOPT) + some other stuff I did recently (e.g. objective function works with autodiff for the log-determinant and the new (?) feature in formulaic for symbolic differentiation of model matrices). Would you accept such a PR?

I wish you happy easter holidays!

jduerholt commented 1 year ago

Totally overlooked this over easter. Sorry! One heretical question: why does one needs ipopt for DoE, should SLSQP not also do the job?

Osburg commented 1 year ago

Unfortunately it is necessary to use IPOPT, because for larger scale problems (e.g. 20 inputs, fully-quadratic model) the "normal" scipy.optimize.minimize gets reeeeallly slow. Since the interface of minimize_ipopt and scipy's minimize is the same, you can easily convince yourself by replacing minimize_ipopt with the scipy method.

jduerholt commented 1 year ago

Thanks for the info, then it is like it is and we stick with your original implementation for the nchooseks and keep it just as option. Thanks for your effort, it was just an idea as it was working with slsqp ...

Another option could be to have in find_local_max_ipopt another argument (use_slsqp: bool = False) to specify which optimizer to use. The default would be ipopt, but one could switch to slsqp and then use there the other way of dealing with the nchooseks. This could be helpful for smaller problems (6 inputs etc.). What do you think?

Osburg commented 1 year ago

We could add this as an option instead of of nchoosek_as_bounds (since nchoosek not as bounds seems to be useless for ipopt). However, I just did some tests with IPOPT + nchoosek as bounds and SLSQP + nchoosek with narrow gaussians.

I used 6 inputs with bounds (0,1), a fully-quadratic model, n_experiment=10 and the following constraints

  1. [NChooseKConstraint(features=[f"x{i+1}" for i in range(6)], min_count=0, max_count=3, none_also_valid=True)]
  2. [NChooseKConstraint(features=[f"x{i+1}" for i in range(6)], min_count=0, max_count=3, none_also_valid=True), LinearEqualityConstraint(features=[f"x{i+1}" for i in range(6)], coefficients=[1,1,1,1,1,1], rhs=1)]
  3. [NChooseKConstraint(features=[f"x{i+1}" for i in range(5)], min_count=0, max_count=1, none_also_valid=True), LinearEqualityConstraint(features=[f"x{i+1}" for i in range(6)], coefficients=[1,1,1,1,1,1], rhs=1)]

In the first two cases the NChooseK constraint is only semi-relevant since the optimal design without the nchoosek constraint fulfills the nchoosek constraints. The last case has a "real" nchoosek constraint. I have run each of these cases three times with these average objective values:

  1. SLSQP: 275.1, IPOPT: 275.1
  2. SLSQP: 292.8, IPOPT: 292.7
  3. SLSQP: 342.1, IPOPT: 310.6, SLSQP+nchoosek_as_bounds: 328.5

So to me this indicates that the current solution seems to be a pretty good surrogate for the actual nchoosek constraint (at least in this simple example), but is easier to handle (for both IPOPT and SLSQP). On the other hand, the current solution can only handle nchoosek constraints with min_count=0 (this case was the only one existing in mopti). So if you see an advantage in having the option of SLSQP I will add it. Personally, I have the feeling that it will be a very rare event that SLSQP + narrow gaussians will perform better than IPOPT + nchoosek as bounds.

jduerholt commented 1 year ago

Okay, then we remove it. It was worth a try ;)

dlinzner-bcs commented 1 year ago

This is solved.

sgbaird commented 1 year ago

Personally, I have the feeling that it will be a very rare event that SLSQP + narrow gaussians will perform better than IPOPT + nchoosek as bounds.

Thank you for the thorough evaluation of these two methods. This is the kind of testing that I was hoping to get to. @Osburg and @jduerholt I appreciate you both exploring this.

On the other hand, the current solution can only handle nchoosek constraints with min_count=0 (this case was the only one existing in mopti)

In a formulation setting, I take it min_count=0 means the algorithm is allowed to make the nonsensical choice of "add no ingredients", but that if this case happens, it can be rejected manually by the user during the optimization.

sgbaird commented 1 year ago

@Osburg, I also have a follow-up question. Could you give a brief description of what the "original implementation" or "current solution" is?

From https://github.com/experimental-design/bofire/issues/145#issuecomment-1480263796

IPOPT had problems dealing with this constraint function (in fact, the current solution for NChooseK constraints in doe was chosen, because IPOPT was not able to find acceptable designs using this constraint function).

From https://github.com/experimental-design/bofire/issues/145#issuecomment-1504997635

Thanks for the info, then it is like it is and we stick with your original implementation for the nchooseks and keep it just as option. Thanks for your effort, it was just an idea as it was working with slsqp

jduerholt commented 1 year ago

Let me already comment from high level, @Osburg can then further add on it ;)

It is important to know that we are talking in this issue just about the DoE submodule of Bofire, which can be used to generate for example D-optimal plans under constraints and different model functions. Despite D optimality also other optimality measures like K-optimality are possible (https://github.com/experimental-design/bofire/blob/main/bofire/strategies/enum.py#L4)

In this contect we have tested the "original implementation" against the narrow gaussian method. Note that we use in the BO strategies based on botorch always the narrow gaussian approach.

So what is the "orginal implementation"? @Osburg please correct if I am wrong ;)

The original implementation which is also the current one is described here: https://basf.github.io/doe/nchoosek_constraint/

image

Just as additional note: the DoE part of BoFire was originally developed at BASF and later merged into BoFire where it is further developed. Unfortunately, we have not yet migrated the documentation ;)

Osburg commented 1 year ago

@jduerholt yes, this is how it still works :) Originally, I was asked to add support for NChooseK constraints in combination with a mixture constraint. Because of the symmetry of this constraint all variables are interchangeable. The assumption of the "current solution" is that we have such a symmetry. If this assumption is met we have a handwavy argument why the "current solution" works: because of the symmetry of the problem all variables should be 0 equally often in a D-optimal design. So we can take all possible combinations of k-n out of n variables, pick one combination out of this list for each experiment and fix the corresponding variables to 0. This defines a constraint that is actually more restrictive than the original NChooseK constraint, but in this symmetric situation a D-optimal design should fulfill both the NChooseK constraint and also the more restrictive surrogate we just constructed. This is also we have observed when comparing the "current solution"+IPOPT vs. SLSQP + narrow Gaussians. However, as described in the documentation (https://basf.github.io/doe/nchoosek_constraint/) it is easy to find examples (by breaking the symmetry assumptions) where the stronger constraint removes substantial parts of the feasible set (i.e. a D-optimal design using the stronger constraints is much worse / not a minimum when using the original constraints). But so far I did not get any complaints that the "current solution" creates too restrictive constraints, probably because the people using the doe subpackage only use it for problems that are close enough to a symmetric situation as described above (?) :D

simonsung06 commented 1 year ago

@jduerholt @Osburg

Firstly, thanks for your work in implementing and verifying the NChooseK solution for DoE in BoFire.

Sorry for reopening the issue. I have use cases where having min_count to be greater than 0 is needed. As noted above, I could generate samples and then filter out the ones that do not meet what I want but I would prefer to have the method be able to account for min_count > 0.

Is there a limitation in being able to "easily" implement this or would it require re-evaluating how NChooseK works in its current form, e.g. having to re-visit the possibility for having the narrow Gaussian approach be a alternative method in these cases?

Cheers

dlinzner-bcs commented 1 year ago

@simonsung06 There is an implementation of what you need, see for example here https://github.com/experimental-design/bofire/blob/b04904d281872be521cf6a4144c52f5143359c88/tests/bofire/strategies/test_doe.py#L204 The only problem is that this can take quite long, as the full categorical problem is build and solved via relaxation, - or if that fails to provide a feasible solution - via branch and bound. How does your use case look like?

Osburg commented 1 year ago

@simonsung06 just as an addition to what @dlinzner-bcs already wrote: In case of min_count = 1 it could be an idea to just add a linear constraint (in case of nonnegative decision variables)

$x_1 + x_2 + ... + x_n >= c$

where $c > 0$. In a more general case where negative values for some of the decision variables are allowed you could replace it by an L1-Norm constraint. This should do the job without increasing the computational effort a lot. I tested it on this small example

from bofire.strategies.doe.design import find_local_max_ipopt
from bofire.data_models.domain.api import Domain
from bofire.data_models.constraints.api import NChooseKConstraint, LinearInequalityConstraint
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput
import numpy as np

domain = Domain(
    inputs = [ContinuousInput(key=f"x{i+1}", bounds=(0,1)) for i in range(3)],
    outputs = [ContinuousOutput(key="y")],
    constraints = [
        NChooseKConstraint(features=["x1","x2","x3"], min_count=0, max_count=2, none_also_valid=True),
    ]
)

res = find_local_max_ipopt(
    domain=domain,
    model_type="linear",
    n_experiments=7,
    ipopt_options={"maxiter":500},
)
np.round(res,2)

The result without the additional linear constraint is

x1 x2 x3
exp0 -0.0 -0.0 0.0
exp1 1.0 0.0 1.0
exp2 0.0 1.0 1.0
exp3 -0.0 -0.0 0.0
exp4 1.0 0.0 1.0
exp5 0.0 -0.0 -0.0
exp6 1.0 1.0 0.0

By adding the linear constraint we can avoid the experiment (0,0,0). So in this very simple case it seems to work.

domain = Domain(
    inputs = [ContinuousInput(key=f"x{i+1}", bounds=(0,1)) for i in range(3)],
    outputs = [ContinuousOutput(key="y")],
    constraints = [
        LinearInequalityConstraint(features=[f"x{i+1}" for i in range(3)], coefficients=[-1,-1,-1], rhs=-1),
        NChooseKConstraint(features=["x1","x2","x3"], min_count=0, max_count=2, none_also_valid=True),
    ]
)

res = find_local_max_ipopt(
    domain=domain,
    model_type="linear",
    n_experiments=7,
    ipopt_options={"maxiter":500},
)
np.round(res,2)
x1 x2 x3
exp0 1.0 -0.0 0.0
exp1 0.0 1.0 -0.0
exp2 -0.0 0.0 1.0
exp3 1.0 -0.0 0.0
exp4 0.0 1.0 1.0
exp5 1.0 0.0 1.0
exp6 1.0 1.0 0.0
simonsung06 commented 1 year ago

Thank you @dlinzner-bcs and @Osburg for the advice on this. It is nice to see that min_count can be above 0.

For my situation, it was for a mixture style DoE with 5 components that needed a LinearEqualityConstraint and a NChooseKConstraint for 2 of the components with min_count=1, max_count=2, and none_also_valid=False. Furthermore, I also provided past experiments using the set_candidates function. The key change in my set up that made it give a valid a design that passes the NChooseKConstraint was to have the lower bounds of the NChooseKConstraint features be greater than 0. I like that the unselected variables are still 0 whereas selected variables still use the specified bounds!

Another query that I have: I generated the design using the OptimalityCriterionEnum.D_OPTIMALITY objective but I would have liked to compare the result with OptimalityCriterionEnum.SPACE_FILLING but the latter did not converge to a solution when I tried it, and was repeatedly producing the message: current length of branching queue (+ new branches): {pre_size} + {len(next_branches)} currently explored branches: {num_explored}, current best value: {current_branch.value}. Is there a way to specify a maximum number of iterations; or can you suggest something else?

Osburg commented 1 year ago

Using the method I proposed for min_count=1 I do not run into your problem for a problem of the same structure as yours (so maybe it is worth for you to try it out on your problem, too).

I created the following domain

domain = Domain(
    inputs = [ContinuousInput(key=f"x{i+1}", bounds=(0,1)) for i in range(5)],
    outputs = [ContinuousOutput(key="y")],
    constraints = [
        LinearEqualityConstraint(features=[f"x{i+1}" for i in range(5)], coefficients=[1,1,1,1,1], rhs=1),
        NChooseKConstraint(features=["x1","x2"], min_count=0, max_count=2, none_also_valid=False),
        LinearInequalityConstraint(features=["x1","x2"], coefficients=[-1,-1], rhs=-0.1),
    ]
)

This should roughly correspond to your problem. To ensure min_count=1 I added the additional linear equality constraint (where I assumed that we want at least a share of 10% for the active component, but it is open to you to choose a value you consider large enough). I assumed a fully-quadratic model and added two fixed experiments to imitate your problem as good as possible.

res = find_local_max_ipopt(
    domain=domain,
    model_type="fully-quadratic",
    fixed_experiments=pd.DataFrame(
        np.array([[0.5,0.5,0,0,0], [0.5,0,0.5,0,0]]), columns=domain.inputs.get_keys()),
    objective=OptimalityCriterionEnum.D_OPTIMALITY,
    ipopt_options={"maxiter":300, "disp":5, "timing_statistics":"yes"},
)
np.round(res,2)

The resulting design looks like this

x1 x2 x3 x4 x5
exp0 0.50 0.50 0.00 0.00 0.00
exp1 0.50 0.00 0.50 0.00 0.00
exp2 0.10 -0.00 -0.00 0.44 0.46
exp3 -0.00 0.10 0.44 0.46 -0.00
exp4 1.00 -0.00 0.00 -0.00 -0.00
exp5 0.10 -0.00 -0.00 -0.00 0.90
exp6 0.10 -0.00 0.45 -0.00 0.45
exp7 0.50 0.50 -0.00 -0.00 -0.00
exp8 -0.00 0.10 0.90 -0.00 -0.00
exp9 0.53 -0.00 -0.00 -0.00 0.47
exp10 0.10 -0.00 -0.00 0.90 -0.00
exp11 -0.00 1.00 -0.00 0.00 -0.00
exp12 -0.00 0.10 0.44 -0.00 0.46
exp13 0.55 -0.00 -0.00 0.45 -0.00
exp14 -0.00 0.55 0.45 -0.00 -0.00
exp15 -0.00 0.53 -0.00 -0.00 0.47
exp16 -0.00 0.50 -0.00 0.50 -0.00
exp17 0.55 -0.00 -0.00 0.45 -0.00
exp18 -0.00 0.55 0.45 -0.00 -0.00
exp19 -0.00 0.50 -0.00 0.50 -0.00
exp20 0.10 -0.00 0.46 0.44 -0.00
exp21 -0.00 0.10 -0.00 0.45 0.45
exp22 0.51 -0.00 0.49 -0.00 -0.00
exp23 -0.00 0.10 -0.00 -0.00 0.90

Using the space filling objective instead of d optimality IPOPT still converges and produces the following design

x1 x2 x3 x4 x5
exp0 0.50 0.50 0.00 0.00 0.00
exp1 0.50 0.00 0.50 0.00 0.00
exp2 0.40 0.01 0.01 0.57 0.01
exp3 0.32 0.33 0.03 0.30 0.01
exp4 0.11 0.02 0.61 0.01 0.25
exp5 0.31 0.32 0.35 0.01 0.01
exp6 0.01 0.31 0.01 0.01 0.66
exp7 0.08 0.04 0.01 0.86 0.01
exp8 0.91 0.02 0.02 0.04 0.01
exp9 0.62 0.01 0.02 0.02 0.33
exp10 0.10 0.01 0.32 0.56 0.01
exp11 0.01 0.25 0.01 0.54 0.19
exp12 0.01 0.65 0.32 0.01 0.01
exp13 0.08 0.03 0.63 0.25 0.01
exp14 0.00 0.34 0.33 0.01 0.32
exp15 0.31 0.31 0.01 0.02 0.35
exp16 0.00 0.34 0.33 0.32 0.01
exp17 0.07 0.05 0.02 0.33 0.52
exp18 0.10 0.01 0.31 0.01 0.57
exp19 0.01 0.64 0.01 0.01 0.32
exp20 0.32 0.01 0.00 0.01 0.66
exp21 0.01 0.64 0.01 0.32 0.01
exp22 0.27 0.03 0.25 0.22 0.23
exp23 0.02 0.95 0.01 0.01 0.01

Does this approach go in the right direction or did I misunderstand you somewhere? Hope this is somewhat helpful...

Cheers, Aaron :)

P.S.: One question: You impose an NChooseKConstraint on 2 variables with max_count=2. If it is a valid option for you to use a linear inequality for ensuring that one component is nonzero, you could also just drop the NChooseKConstraint (since it will never be active), right? In fact, due to a bug it is not possible to use an NChooseKConstraint w/ max_count = #(variables in the constraint). I just opened a PR (#304) to fix this. If you do not want to wait for it to be merged it should be valid to just omit the NChooseKConstraint.

dlinzner-bcs commented 1 year ago

@simonsung06

Another query that I have: I generated the design using the OptimalityCriterionEnum.D_OPTIMALITY objective but I would have liked to compare the result with OptimalityCriterionEnum.SPACE_FILLING but the latter did not converge to a solution when I tried it, and was repeatedly producing the message: current length of branching queue (+ new branches): {pre_size} + {len(next_branches)} currently explored branches: {num_explored}, current best value: {current_branch.value}. Is there a way to specify a maximum number of iterations; or can you suggest something else?

Currently, there is no maximum number sadly. There are two things that you can try (aside from Aarons solution above):

  1. set the strategy to "relaxed" and turn the validator off via

    candidates = strategy.ask(
        candidate_count=n_experiments, raise_validation_error=False
    )

    , which will consider the problem as a relaxed one. One risk is that you might not end up with a valid design at the end.

  2. set the strategy to "iterative". Then only one experiment at a time will be optimized given all previous experiments (and preset candidates). This shrinks the combinatorical space of the BnB algorithm significantly and converge to a solution much more quickly.
simonsung06 commented 1 year ago

Hi @dlinzner-bcs @Osburg

Thanks for the suggestions. They all proved to be useful for my problem and also to help me understand more in general. I liked how it looks without the NChooseKConstraint and just using the LinearInequalityConstraint instead. Still worked and generated designs quickly too. The iterative strategy also worked but I preferred the designed using the previously mentioned methods, but it is great to know this option exists :)