Closed jduerholt closed 1 year ago
While working on the old mopti-based doe I did some experiments with NChooseK constraints. In mopti, the violation of NChooseK constraints is defined as the sum of the absolute values of the three smallest deviations from 0 if the number of of nonzero components is larger than k and 0 otherwise. I think this should be a continuous function, but not smooth. IPOPT had problems dealing with this constraint function (in fact, the current solution for NChooseK constraints in doe was chosen, because IPOPT was not able to find acceptable designs using this constraint function). I think it is a good idea to give it another try with a nicer constraint function like the one you suggested.
I could have a look at this / you can assign the issue to me if you want to (except you want to take care of this @jduerholt , @DavidWalz)
Would be happy if you could have a look at it, I have to finalize it next week for the BayesOpt part. I can at least say that the scipy SLSQP is able to handle it this version, as this optimizer is used in botorch for acqf optimization.
I have already implemented these constraints here:
I takes the domain and returns a list of torch callables which evaluate the constraints. You can in principle use these torch based callables directly as is in DoE and compute the Jacobian via torch autograd. Then you do not need to implement the gradients etc. by hand. Test cases can be found in this module: https://github.com/experimental-design/bofire/blob/main/tests/bofire/utils/test_torch_tools.py
You can also have a look in botorch, how they convert the non-linear constraints to scipy including automatic generation of the Jacobian: https://github.com/pytorch/botorch/blob/f596e71cc3d6404f1af3dfda1a92983d7303ef07/botorch/generation/gen.py#L222
From my understanding the python ipopt interface is relatively similar to the scipy.optimize interface.
I implemented the constraint / constraint jacobian inside the NChooseKConstraint
class using the narrow Gaussian approximation as we discussed in this PR. Unfortunately, IPOPT typically ends up with an unsuccessful optimization (except for some really small toy problems) and exits with this error message
EXIT: Converged to a point of local infeasibility. Problem may be infeasible.
At first I suspected my new lines of code to have mistakes in it, but the problem even remains for the toy problem from the sample notebook from this discussion, where SLSQP does the job. In case you want to reproduce this finding, this is code you can use for it
import numpy as np
import torch
from botorch.test_functions import Hartmann
from cyipopt import minimize_ipopt
from scipy.optimize import minimize
hartmann = Hartmann(dim=6)
def f(x):
return hartmann(torch.from_numpy(x)).numpy()
def narrow_gaussian(x, ell):
return torch.exp(-0.5 * (x / ell) ** 2)
def ineq_constraint(x, ell=1e-3):
# Approximation of || x ||_0 <= 3. The constraint is >= 0 to conform with SLSQP
return narrow_gaussian(x, ell).sum(dim=-1) - 3
x0 = 0.5 * np.ones(6)
bounds = [(0,1) for i in range(6)]
nlc = {"type": "ineq", "fun": lambda x: ineq_constraint(torch.from_numpy(x)).numpy()}
res = minimize_ipopt(f, x0, constraints=nlc, bounds=bounds, options={"disp":5})
res = minimize(f, x0, constraints=nlc, bounds=bounds, options={"disp":5})
I assume that the reason for these problems could be the quickly vanishing gradients of the narrow Gaussians when moving away from its center. Once you are not close to the center anymore it becomes practically zero very fast and should therefore become uninformative. What do you think about this? Does it make sense to you? However, this does not explain why SLSQP does not have the same problems with this kind of constraint... In mopti the NChooseK constraint was evaluated in a different way where the gradient does not drop with the distance to the coordinate axes (see here). For the old doe I did some tests with this way of evaluating NChooseK constraints and could achieve better results than with the narrow Gaussian relaxation. Anyways, it did not do the job for doe because IPOPT could not manage to fulfill the constraints in more complicated cases (many decision variables or additional linear / nonlinear constraints). This is why we came up with the current solution of imposing a different constraint that (by "manually" fixing which variables should be zero for each experiment) that is actually even more restrictive, but also easier to handle for IPOPT (effectively, some variables to be optimized are just taken out of the game...).
So my conclusion would be right now that we should stick to the current solution in doe since IPOPT can produce acceptable results with it but can't with the narrow Gaussians. But I could anyways open a PR to add the new methods for NChooseKConstraints (that could be useful elsewhere, just not for optimizing with IPOPT) + some other stuff I did recently (e.g. objective function works with autodiff for the log-determinant and the new (?) feature in formulaic for symbolic differentiation of model matrices). Would you accept such a PR?
I wish you happy easter holidays!
Totally overlooked this over easter. Sorry! One heretical question: why does one needs ipopt for DoE, should SLSQP not also do the job?
Unfortunately it is necessary to use IPOPT, because for larger scale problems (e.g. 20 inputs, fully-quadratic model) the "normal" scipy.optimize.minimize
gets reeeeallly slow. Since the interface of minimize_ipopt
and scipy's minimize
is the same, you can easily convince yourself by replacing minimize_ipopt
with the scipy method.
Thanks for the info, then it is like it is and we stick with your original implementation for the nchooseks and keep it just as option. Thanks for your effort, it was just an idea as it was working with slsqp ...
Another option could be to have in find_local_max_ipopt
another argument (use_slsqp: bool = False
) to specify which optimizer to use. The default would be ipopt, but one could switch to slsqp and then use there the other way of dealing with the nchooseks. This could be helpful for smaller problems (6 inputs etc.). What do you think?
We could add this as an option instead of of nchoosek_as_bounds
(since nchoosek not as bounds seems to be useless for ipopt). However, I just did some tests with IPOPT + nchoosek as bounds and SLSQP + nchoosek with narrow gaussians.
I used 6 inputs with bounds (0,1), a fully-quadratic model, n_experiment=10 and the following constraints
[NChooseKConstraint(features=[f"x{i+1}" for i in range(6)], min_count=0, max_count=3, none_also_valid=True)]
[NChooseKConstraint(features=[f"x{i+1}" for i in range(6)], min_count=0, max_count=3, none_also_valid=True), LinearEqualityConstraint(features=[f"x{i+1}" for i in range(6)], coefficients=[1,1,1,1,1,1], rhs=1)]
[NChooseKConstraint(features=[f"x{i+1}" for i in range(5)], min_count=0, max_count=1, none_also_valid=True), LinearEqualityConstraint(features=[f"x{i+1}" for i in range(6)], coefficients=[1,1,1,1,1,1], rhs=1)]
In the first two cases the NChooseK constraint is only semi-relevant since the optimal design without the nchoosek constraint fulfills the nchoosek constraints. The last case has a "real" nchoosek constraint. I have run each of these cases three times with these average objective values:
So to me this indicates that the current solution seems to be a pretty good surrogate for the actual nchoosek constraint (at least in this simple example), but is easier to handle (for both IPOPT and SLSQP). On the other hand, the current solution can only handle nchoosek constraints with min_count=0 (this case was the only one existing in mopti
). So if you see an advantage in having the option of SLSQP I will add it. Personally, I have the feeling that it will be a very rare event that SLSQP + narrow gaussians will perform better than IPOPT + nchoosek as bounds.
Okay, then we remove it. It was worth a try ;)
This is solved.
Personally, I have the feeling that it will be a very rare event that SLSQP + narrow gaussians will perform better than IPOPT + nchoosek as bounds.
Thank you for the thorough evaluation of these two methods. This is the kind of testing that I was hoping to get to. @Osburg and @jduerholt I appreciate you both exploring this.
On the other hand, the current solution can only handle nchoosek constraints with min_count=0 (this case was the only one existing in mopti)
In a formulation setting, I take it min_count=0
means the algorithm is allowed to make the nonsensical choice of "add no ingredients", but that if this case happens, it can be rejected manually by the user during the optimization.
@Osburg, I also have a follow-up question. Could you give a brief description of what the "original implementation" or "current solution" is?
From https://github.com/experimental-design/bofire/issues/145#issuecomment-1480263796
IPOPT had problems dealing with this constraint function (in fact, the current solution for NChooseK constraints in doe was chosen, because IPOPT was not able to find acceptable designs using this constraint function).
From https://github.com/experimental-design/bofire/issues/145#issuecomment-1504997635
Thanks for the info, then it is like it is and we stick with your original implementation for the nchooseks and keep it just as option. Thanks for your effort, it was just an idea as it was working with slsqp
Let me already comment from high level, @Osburg can then further add on it ;)
It is important to know that we are talking in this issue just about the DoE submodule of Bofire, which can be used to generate for example D-optimal plans under constraints and different model functions. Despite D optimality also other optimality measures like K-optimality are possible (https://github.com/experimental-design/bofire/blob/main/bofire/strategies/enum.py#L4)
In this contect we have tested the "original implementation" against the narrow gaussian method. Note that we use in the BO strategies based on botorch always the narrow gaussian approach.
So what is the "orginal implementation"? @Osburg please correct if I am wrong ;)
The original implementation which is also the current one is described here: https://basf.github.io/doe/nchoosek_constraint/
Just as additional note: the DoE part of BoFire was originally developed at BASF and later merged into BoFire where it is further developed. Unfortunately, we have not yet migrated the documentation ;)
@jduerholt yes, this is how it still works :) Originally, I was asked to add support for NChooseK constraints in combination with a mixture constraint. Because of the symmetry of this constraint all variables are interchangeable. The assumption of the "current solution" is that we have such a symmetry. If this assumption is met we have a handwavy argument why the "current solution" works: because of the symmetry of the problem all variables should be 0 equally often in a D-optimal design. So we can take all possible combinations of k-n
out of n
variables, pick one combination out of this list for each experiment and fix the corresponding variables to 0.
This defines a constraint that is actually more restrictive than the original NChooseK constraint, but in this symmetric situation a D-optimal design should fulfill both the NChooseK constraint and also the more restrictive surrogate we just constructed. This is also we have observed when comparing the "current solution"+IPOPT vs. SLSQP + narrow Gaussians.
However, as described in the documentation (https://basf.github.io/doe/nchoosek_constraint/) it is easy to find examples (by breaking the symmetry assumptions) where the stronger constraint removes substantial parts of the feasible set (i.e. a D-optimal design using the stronger constraints is much worse / not a minimum when using the original constraints). But so far I did not get any complaints that the "current solution" creates too restrictive constraints, probably because the people using the doe subpackage only use it for problems that are close enough to a symmetric situation as described above (?) :D
@jduerholt @Osburg
Firstly, thanks for your work in implementing and verifying the NChooseK solution for DoE in BoFire.
Sorry for reopening the issue. I have use cases where having min_count to be greater than 0 is needed. As noted above, I could generate samples and then filter out the ones that do not meet what I want but I would prefer to have the method be able to account for min_count > 0.
Is there a limitation in being able to "easily" implement this or would it require re-evaluating how NChooseK works in its current form, e.g. having to re-visit the possibility for having the narrow Gaussian approach be a alternative method in these cases?
Cheers
@simonsung06 There is an implementation of what you need, see for example here https://github.com/experimental-design/bofire/blob/b04904d281872be521cf6a4144c52f5143359c88/tests/bofire/strategies/test_doe.py#L204 The only problem is that this can take quite long, as the full categorical problem is build and solved via relaxation, - or if that fails to provide a feasible solution - via branch and bound. How does your use case look like?
@simonsung06 just as an addition to what @dlinzner-bcs already wrote: In case of min_count = 1 it could be an idea to just add a linear constraint (in case of nonnegative decision variables)
$x_1 + x_2 + ... + x_n >= c$
where $c > 0$. In a more general case where negative values for some of the decision variables are allowed you could replace it by an L1-Norm constraint. This should do the job without increasing the computational effort a lot. I tested it on this small example
from bofire.strategies.doe.design import find_local_max_ipopt
from bofire.data_models.domain.api import Domain
from bofire.data_models.constraints.api import NChooseKConstraint, LinearInequalityConstraint
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput
import numpy as np
domain = Domain(
inputs = [ContinuousInput(key=f"x{i+1}", bounds=(0,1)) for i in range(3)],
outputs = [ContinuousOutput(key="y")],
constraints = [
NChooseKConstraint(features=["x1","x2","x3"], min_count=0, max_count=2, none_also_valid=True),
]
)
res = find_local_max_ipopt(
domain=domain,
model_type="linear",
n_experiments=7,
ipopt_options={"maxiter":500},
)
np.round(res,2)
The result without the additional linear constraint is
x1 | x2 | x3 | |
---|---|---|---|
exp0 | -0.0 | -0.0 | 0.0 |
exp1 | 1.0 | 0.0 | 1.0 |
exp2 | 0.0 | 1.0 | 1.0 |
exp3 | -0.0 | -0.0 | 0.0 |
exp4 | 1.0 | 0.0 | 1.0 |
exp5 | 0.0 | -0.0 | -0.0 |
exp6 | 1.0 | 1.0 | 0.0 |
By adding the linear constraint we can avoid the experiment (0,0,0). So in this very simple case it seems to work.
domain = Domain(
inputs = [ContinuousInput(key=f"x{i+1}", bounds=(0,1)) for i in range(3)],
outputs = [ContinuousOutput(key="y")],
constraints = [
LinearInequalityConstraint(features=[f"x{i+1}" for i in range(3)], coefficients=[-1,-1,-1], rhs=-1),
NChooseKConstraint(features=["x1","x2","x3"], min_count=0, max_count=2, none_also_valid=True),
]
)
res = find_local_max_ipopt(
domain=domain,
model_type="linear",
n_experiments=7,
ipopt_options={"maxiter":500},
)
np.round(res,2)
x1 | x2 | x3 | |
---|---|---|---|
exp0 | 1.0 | -0.0 | 0.0 |
exp1 | 0.0 | 1.0 | -0.0 |
exp2 | -0.0 | 0.0 | 1.0 |
exp3 | 1.0 | -0.0 | 0.0 |
exp4 | 0.0 | 1.0 | 1.0 |
exp5 | 1.0 | 0.0 | 1.0 |
exp6 | 1.0 | 1.0 | 0.0 |
Thank you @dlinzner-bcs and @Osburg for the advice on this. It is nice to see that min_count
can be above 0.
For my situation, it was for a mixture style DoE with 5 components that needed a LinearEqualityConstraint
and a NChooseKConstraint
for 2 of the components with min_count=1
, max_count=2
, and none_also_valid=False
. Furthermore, I also provided past experiments using the set_candidates
function.
The key change in my set up that made it give a valid a design that passes the NChooseKConstraint
was to have the lower bounds of the NChooseKConstraint
features be greater than 0. I like that the unselected variables are still 0 whereas selected variables still use the specified bounds!
Another query that I have: I generated the design using the OptimalityCriterionEnum.D_OPTIMALITY
objective but I would have liked to compare the result with OptimalityCriterionEnum.SPACE_FILLING
but the latter did not converge to a solution when I tried it, and was repeatedly producing the message: current length of branching queue (+ new branches): {pre_size} + {len(next_branches)} currently explored branches: {num_explored}, current best value: {current_branch.value}
. Is there a way to specify a maximum number of iterations; or can you suggest something else?
Using the method I proposed for min_count=1 I do not run into your problem for a problem of the same structure as yours (so maybe it is worth for you to try it out on your problem, too).
I created the following domain
domain = Domain(
inputs = [ContinuousInput(key=f"x{i+1}", bounds=(0,1)) for i in range(5)],
outputs = [ContinuousOutput(key="y")],
constraints = [
LinearEqualityConstraint(features=[f"x{i+1}" for i in range(5)], coefficients=[1,1,1,1,1], rhs=1),
NChooseKConstraint(features=["x1","x2"], min_count=0, max_count=2, none_also_valid=False),
LinearInequalityConstraint(features=["x1","x2"], coefficients=[-1,-1], rhs=-0.1),
]
)
This should roughly correspond to your problem. To ensure min_count=1 I added the additional linear equality constraint (where I assumed that we want at least a share of 10% for the active component, but it is open to you to choose a value you consider large enough). I assumed a fully-quadratic model and added two fixed experiments to imitate your problem as good as possible.
res = find_local_max_ipopt(
domain=domain,
model_type="fully-quadratic",
fixed_experiments=pd.DataFrame(
np.array([[0.5,0.5,0,0,0], [0.5,0,0.5,0,0]]), columns=domain.inputs.get_keys()),
objective=OptimalityCriterionEnum.D_OPTIMALITY,
ipopt_options={"maxiter":300, "disp":5, "timing_statistics":"yes"},
)
np.round(res,2)
The resulting design looks like this
x1 | x2 | x3 | x4 | x5 | |
---|---|---|---|---|---|
exp0 | 0.50 | 0.50 | 0.00 | 0.00 | 0.00 |
exp1 | 0.50 | 0.00 | 0.50 | 0.00 | 0.00 |
exp2 | 0.10 | -0.00 | -0.00 | 0.44 | 0.46 |
exp3 | -0.00 | 0.10 | 0.44 | 0.46 | -0.00 |
exp4 | 1.00 | -0.00 | 0.00 | -0.00 | -0.00 |
exp5 | 0.10 | -0.00 | -0.00 | -0.00 | 0.90 |
exp6 | 0.10 | -0.00 | 0.45 | -0.00 | 0.45 |
exp7 | 0.50 | 0.50 | -0.00 | -0.00 | -0.00 |
exp8 | -0.00 | 0.10 | 0.90 | -0.00 | -0.00 |
exp9 | 0.53 | -0.00 | -0.00 | -0.00 | 0.47 |
exp10 | 0.10 | -0.00 | -0.00 | 0.90 | -0.00 |
exp11 | -0.00 | 1.00 | -0.00 | 0.00 | -0.00 |
exp12 | -0.00 | 0.10 | 0.44 | -0.00 | 0.46 |
exp13 | 0.55 | -0.00 | -0.00 | 0.45 | -0.00 |
exp14 | -0.00 | 0.55 | 0.45 | -0.00 | -0.00 |
exp15 | -0.00 | 0.53 | -0.00 | -0.00 | 0.47 |
exp16 | -0.00 | 0.50 | -0.00 | 0.50 | -0.00 |
exp17 | 0.55 | -0.00 | -0.00 | 0.45 | -0.00 |
exp18 | -0.00 | 0.55 | 0.45 | -0.00 | -0.00 |
exp19 | -0.00 | 0.50 | -0.00 | 0.50 | -0.00 |
exp20 | 0.10 | -0.00 | 0.46 | 0.44 | -0.00 |
exp21 | -0.00 | 0.10 | -0.00 | 0.45 | 0.45 |
exp22 | 0.51 | -0.00 | 0.49 | -0.00 | -0.00 |
exp23 | -0.00 | 0.10 | -0.00 | -0.00 | 0.90 |
Using the space filling objective instead of d optimality IPOPT still converges and produces the following design
x1 | x2 | x3 | x4 | x5 | |
---|---|---|---|---|---|
exp0 | 0.50 | 0.50 | 0.00 | 0.00 | 0.00 |
exp1 | 0.50 | 0.00 | 0.50 | 0.00 | 0.00 |
exp2 | 0.40 | 0.01 | 0.01 | 0.57 | 0.01 |
exp3 | 0.32 | 0.33 | 0.03 | 0.30 | 0.01 |
exp4 | 0.11 | 0.02 | 0.61 | 0.01 | 0.25 |
exp5 | 0.31 | 0.32 | 0.35 | 0.01 | 0.01 |
exp6 | 0.01 | 0.31 | 0.01 | 0.01 | 0.66 |
exp7 | 0.08 | 0.04 | 0.01 | 0.86 | 0.01 |
exp8 | 0.91 | 0.02 | 0.02 | 0.04 | 0.01 |
exp9 | 0.62 | 0.01 | 0.02 | 0.02 | 0.33 |
exp10 | 0.10 | 0.01 | 0.32 | 0.56 | 0.01 |
exp11 | 0.01 | 0.25 | 0.01 | 0.54 | 0.19 |
exp12 | 0.01 | 0.65 | 0.32 | 0.01 | 0.01 |
exp13 | 0.08 | 0.03 | 0.63 | 0.25 | 0.01 |
exp14 | 0.00 | 0.34 | 0.33 | 0.01 | 0.32 |
exp15 | 0.31 | 0.31 | 0.01 | 0.02 | 0.35 |
exp16 | 0.00 | 0.34 | 0.33 | 0.32 | 0.01 |
exp17 | 0.07 | 0.05 | 0.02 | 0.33 | 0.52 |
exp18 | 0.10 | 0.01 | 0.31 | 0.01 | 0.57 |
exp19 | 0.01 | 0.64 | 0.01 | 0.01 | 0.32 |
exp20 | 0.32 | 0.01 | 0.00 | 0.01 | 0.66 |
exp21 | 0.01 | 0.64 | 0.01 | 0.32 | 0.01 |
exp22 | 0.27 | 0.03 | 0.25 | 0.22 | 0.23 |
exp23 | 0.02 | 0.95 | 0.01 | 0.01 | 0.01 |
Does this approach go in the right direction or did I misunderstand you somewhere? Hope this is somewhat helpful...
Cheers, Aaron :)
P.S.: One question: You impose an NChooseKConstraint on 2 variables with max_count=2. If it is a valid option for you to use a linear inequality for ensuring that one component is nonzero, you could also just drop the NChooseKConstraint (since it will never be active), right? In fact, due to a bug it is not possible to use an NChooseKConstraint w/ max_count = #(variables in the constraint). I just opened a PR (#304) to fix this. If you do not want to wait for it to be merged it should be valid to just omit the NChooseKConstraint.
@simonsung06
Another query that I have: I generated the design using the
OptimalityCriterionEnum.D_OPTIMALITY
objective but I would have liked to compare the result withOptimalityCriterionEnum.SPACE_FILLING
but the latter did not converge to a solution when I tried it, and was repeatedly producing the message:current length of branching queue (+ new branches): {pre_size} + {len(next_branches)} currently explored branches: {num_explored}, current best value: {current_branch.value}
. Is there a way to specify a maximum number of iterations; or can you suggest something else?
Currently, there is no maximum number sadly. There are two things that you can try (aside from Aarons solution above):
set the strategy to "relaxed" and turn the validator off via
candidates = strategy.ask(
candidate_count=n_experiments, raise_validation_error=False
)
, which will consider the problem as a relaxed one. One risk is that you might not end up with a valid design at the end.
Hi @dlinzner-bcs @Osburg
Thanks for the suggestions. They all proved to be useful for my problem and also to help me understand more in general. I liked how it looks without the NChooseKConstraint and just using the LinearInequalityConstraint instead. Still worked and generated designs quickly too. The iterative strategy also worked but I preferred the designed using the previously mentioned methods, but it is great to know this option exists :)
@DavidWalz @Osburg
As iptopt is also supporting non-linear constraints, I thought about if it could be worth to implement a continuous relaxation for the NChooseK constraints. This would also make it possible to fully support NChooseK constraints in DoE. I am currently adding this continuous relaxation also to the BayesOpt methods in Bofire.
You can find a description here: https://github.com/pytorch/botorch/issues/1749#issuecomment-1474763398
Just have a look at the linked tutorial notebook there. I tested it in BayesOpt and it works ;) What do you think?