Closed: TobyBoyne closed this pull request 7 months ago.
As shown in strategies/entmoot/entmoot_example.py, the ask() method now matches the syntax for other strategies.
Is there any functionality that is missing? If not, I will add tests (especially for the conversion between features/constraints) and better documentation.
Hi @jduerholt,
Thank you for the review - lots of helpful feedback!
I still haven't addressed the issue of type-hinting the parameters - I started looking into the surrogates, but I'm not sure I understand what function they serve. I could implement the types as a simple class that inherits from BaseModel, but I'd rather use surrogates if that is the proper way to do things. As mentioned in the comments, I can't use any of the provided surrogates, since they don't have the same attributes (ENTMOOT uses lgbm, and xgboost is not yet supported).
Kind regards, Toby
For reference, here is a mock-up of what the enting_params looks like. Note that the parameters in TrainParams are defined by lgbm, and I can't find a comprehensive type listing all possible options.
```python
from typing import Literal


class EntingParams:
    unc_params: 'UncParams'
    tree_train_parameters: 'TreeTrainParams'


class UncParams:
    beta: float = 1.96  # > 0
    acq_sense: Literal["exploration", "penalty"] = "exploration"
    dist_trafo: Literal["normal", "standard"] = "normal"
    dist_metric: Literal["euclidean_squared", "l1", "l2"] = "euclidean_squared"
    cat_metric: Literal["overlap", "of", "goodall4"] = "overlap"


class TreeTrainParams:
    train_lib: Literal["lgbm"] = "lgbm"
    train_params: 'TrainParams'


class TrainParams:
    # these are determined by lightgbm;
    # the typing in the package uses Dict[str, Any];
    # the defaults used by ENTMOOT are shown below
    objective: str = "regression"
    metric: str = "rmse"
    boosting: str = "gbdt"
    num_boost_round: int = 100
    max_depth: int = 3
    min_data_in_leaf: int = 1
    min_data_per_group: int = 1
    verbose: int = -1
```
Thanks @TobyBoyne, I was not at home over the weekend. I will have a detailed look tmr. Best, Johannes
Hi Toby,
I would opt for something like this:
```python
from typing import Annotated, Any, Dict, List, Literal, Optional

from pydantic import Field, validator


class LGBMSurrogate(Surrogate, TrainableSurrogate):
    type: Literal["LGBMSurrogate"] = "LGBMSurrogate"
    objective: str = "regression"
    metric: str = "rmse"
    boosting: str = "gbdt"
    num_boost_round: Annotated[int, Field(ge=1)] = 100
    max_depth: Annotated[int, Field(ge=1)] = 3
    min_data_in_leaf: Annotated[int, Field(ge=1)] = 1
    min_data_per_group: Annotated[int, Field(ge=1)] = 1
    # verbose: int = -1


class EntingSurrogates(Surrogates):
    # Surrogates is a new common base class for both
    # BotorchSurrogates and EntingSurrogates
    surrogates: List[LGBMSurrogate]

    @validator("surrogates")
    def validate_surrogates(cls, v):
        ...


class EntingStrategy(PredictiveStrategy):
    type: Literal["EntingStrategy"] = "EntingStrategy"
    # unc params
    beta: Annotated[float, Field(gt=0)] = 1.96
    acq_sense: Literal["exploration", "penalty"] = "exploration"
    dist_trafo: Literal["normal", "standard"] = "normal"
    dist_metric: Literal["euclidean_squared", "l1", "l2"] = "euclidean_squared"
    cat_metric: Literal["overlap", "of", "goodall4"] = "overlap"
    # surrogate params
    surrogate_specs: Optional[EntingSurrogates] = None
    # solver params
    solver_params: Dict[str, Any] = {}  # do we have any defaults here?
```
In BoFire we have surrogates and strategies: surrogates are regression models that can be used to model the output values of experiments, and (predictive) strategies use surrogates for optimization. The 100% solution would be to implement a new surrogate model called LGBMSurrogate, which can be used standalone. Via the parameters of the surrogate data model, one could then modify the hyperparameters of the LGBM model used within the Enting strategy.

We do the same for the botorch-based strategies; there, different model types and hyperparameters can also be used within the same optimization for different outputs. Even subsets of outputs can be used. I looked a bit into the Entmoot code, and from my understanding the model parameters are used globally, meaning that for every output that should be optimized, a model with the same hyperparameters and input features is trained. One could guarantee this behavior with a validator in the class EntingSurrogates. As soon as more freedom is allowed in Entmoot, one could also relax the validators ... Maybe one could also derive the EntingSurrogates together with the BotorchSurrogates from a common base class. I'll try to come up with a PR for such a common base class.
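To make the validator idea concrete, here is a minimal sketch using plain pydantic BaseModel stand-ins rather than the real bofire base classes (the trimmed field set and the equality-based check are assumptions, one possible way to enforce identical hyperparameters across outputs):

```python
from typing import List, Literal

from pydantic import BaseModel, validator


class LGBMSurrogate(BaseModel):
    # Trimmed stand-in for the surrogate data model sketched above
    type: Literal["LGBMSurrogate"] = "LGBMSurrogate"
    objective: str = "regression"
    metric: str = "rmse"
    num_boost_round: int = 100
    max_depth: int = 3


class EntingSurrogates(BaseModel):
    surrogates: List[LGBMSurrogate]

    @validator("surrogates")
    def validate_surrogates(cls, v):
        # ENTMOOT trains one model per output with globally shared
        # hyperparameters, so reject specs that differ between outputs
        if any(s != v[0] for s in v[1:]):
            raise ValueError("all surrogates must share the same hyperparameters")
        return v
```

Once Entmoot allows per-output hyperparameters, the validator body can simply be relaxed without changing the data model.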
What do you think? Is this somehow clear?
Best,
Johannes
Here is the first version of the mentioned PR: https://github.com/experimental-design/bofire/pull/283
Hi @jduerholt,
Yes, I think that all makes sense - thank you for the explanation. So to confirm, as an example: the _fit() logic should be moved to LGBMSurrogate, but the _ask() method should remain in the strategy? I'm looking through BotorchStrategy._ask() and can't find any references to the surrogate - is the surrogate not used in this method?

Also, in terms of moving forward - do I have write permissions for the feature/surrogates branch/PR? If so, should I implement the EntingSurrogates there, and if not, should I wait for that pull request to be accepted before creating the EntingSurrogates?
Kind regards, Toby
Good point! Will work on it for Entmoot.
On Tue, 12 Sept 2023 at 13:23, Robert Lee <@.***> wrote:

@.**** commented on this pull request.

In bofire/data_models/strategies/enting.py https://github.com/experimental-design/bofire/pull/278#discussion_r1322893937 :
```diff
+)
+from bofire.data_models.features.api import (
+    CategoricalDescriptorInput,
+    CategoricalInput,
+    ContinuousInput,
+    ContinuousOutput,
+    DiscreteInput,
+    Feature,
+)
+from bofire.data_models.objectives.api import MinimizeObjective, Objective
+from bofire.data_models.strategies.strategy import Strategy
+
+class EntingStrategy(Strategy):
+    type: Literal["EntingStrategy"] = "EntingStrategy"
+    enting_params: Dict[str, Any] = {}
```
I am not a big fan of having dictionary fields that can have anything inside, as it is somehow contrary to using a validation framework like pydantic. Can you provide the possible parameters with their defaults, types, and validation, to be used as fields of the data model?
This change should be made directly in entmoot (@spiralulam https://github.com/spiralulam wanted to do it); Johannes's argument is in general a good one and not just valid here
I'm looking through BotorchStrategy._ask() and can't find any references to the surrogate - is the surrogate not used in this method?
In botorch, the model is attached to the acquisition function object. For this reason, we first get an acquisition function object in _ask and then use this object, which has the model inside, for the optimization: https://github.com/experimental-design/bofire/blob/a4ca3eb2ecf8348e5c387cbef273191c1ba54d45/bofire/strategies/predictives/botorch.py#L256 and here: https://github.com/experimental-design/bofire/blob/a4ca3eb2ecf8348e5c387cbef273191c1ba54d45/bofire/strategies/predictives/sobo.py#L60.

The self.model object is created here on the basis of the surrogates: https://github.com/experimental-design/bofire/blob/a4ca3eb2ecf8348e5c387cbef273191c1ba54d45/bofire/strategies/predictives/botorch.py#L113

In the call to compatibilize, the individual surrogates are compiled into one botorch ModelList object.
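The pattern described above can be sketched with stand-in classes (illustrative only, not the actual bofire/botorch code; all class and method names here are assumptions mirroring the structure of the real API):

```python
class ModelList:
    """Stand-in for the botorch ModelList built from the fitted surrogates."""

    def __init__(self, models):
        self.models = models

    def predict(self, x):
        return [m(x) for m in self.models]


class AcquisitionFunction:
    """The model travels inside the acquisition function object."""

    def __init__(self, model):
        self.model = model

    def __call__(self, x):
        # Toy acquisition value: negative mean prediction (lower is better)
        return -sum(self.model.predict(x)) / len(self.model.models)


class BotorchLikeStrategy:
    def __init__(self, surrogates):
        # Analogous to compatibilize(): bundle surrogates into one ModelList
        self.model = ModelList(surrogates)

    def _get_acqf(self):
        return AcquisitionFunction(self.model)

    def _ask(self, candidates):
        # _ask never references the surrogates directly; it optimizes the
        # acquisition function, which carries the model inside it
        acqf = self._get_acqf()
        return max(candidates, key=acqf)
```

So the answer to "is the surrogate not used?" is: it is, but only indirectly, through the acquisition-function object built from self.model.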
So to confirm, as an example, the _fit() logic should be moved to LGBMSurrogate, but the _ask() method should remain in the strategy?
You would have it in both the surrogate and the strategy: you can call the surrogate's _fit from the strategy's _fit. But I am not sure if this is currently needed, as Entmoot currently does not support different surrogate specs for different outputs. So you could do it independently from each other for now.
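The delegation described here can be sketched as follows (hypothetical class names standing in for the real bofire classes; the real surrogate's _fit would call into lightgbm):

```python
class LGBMSurrogateImpl:
    """Stand-in for the surrogate; owns the model-fitting logic."""

    def __init__(self):
        self.is_fitted = False

    def _fit(self, X, y):
        # In the real surrogate this is where lightgbm training would happen
        self.is_fitted = True


class EntingStrategyImpl:
    """Stand-in for the strategy; it orchestrates, the surrogates train."""

    def __init__(self, surrogates):
        self.surrogates = surrogates

    def _fit(self, X, y):
        # The strategy's _fit delegates to each surrogate's _fit
        for surrogate in self.surrogates:
            surrogate._fit(X, y)
```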
Also, in terms of moving forward - do I have write permissions for the feature/surrogates branch/PR? If so, should I implement the EntingSurrogates there, and if not, should I wait for that pull request to be accepted before creating the EntingSurrogates?
I'll try to get the PR through tomorrow at the latest, but in terms of moving forward, I would recommend that you just create a new branch/PR in which you first implement only the LGBMSurrogate (including tests), as this is the first thing that is needed. In the meantime, I will finalize my PR, and you can then implement the EntingSurrogates and the EntingStrategy on top of both.

For the LGBMSurrogate, you can draw inspiration from the XGBSurrogate.
What do you think?
When we have the EntingStrategy included, we will also refactor the directory structure for the botorch-based strategies into strategies/predictives/botorch/.
I would recommend that you just create a new branch/PR in which you first just implement the LGBMSurrogate (including tests) as this is the first thing that is needed
Happy to work on that! Thanks for all the reviewing and help, very much appreciated!
I've made an initial commit for the surrogate; however, in order to implement the regression in #285, I need the utils in this PR - any thoughts on how I can overcome that?
I've been working on implementing changes in the Entmoot codebase. Since this was last discussed, there are a few things that have changed: EntingParams has now been implemented in entmoot - see that repo for more information. The solver still accepts Dict[any], but that's somewhat out of our hands, in that Pyomo doesn't have an extensive list of arguments.

I will also review it ;) Totally overlooked this :D
Hi, this should cover the necessary tests and pydantic models.
Just a note - for test_propose_optimal_point, we wanted to test that the proposed point would be in the unexplored area near the optimum. This behaviour isn't actually observed (I think the process of generating the data must bias the optimiser towards selecting the regions it has seen). If the test as initially described is desired, then there's likely some tweaking that needs to be done.
(also, sorry for the delay!)
Hi Toby, thank you very much. I will have a look by tomorrow at the latest. In the meantime, you could already try to solve the merge conflicts with the main branch by merging main into your branch. Best, Johannes
I think this should cover all of the requested changes - please let me know if I missed anything.
Regarding the candidates, I think my solution should address this. It is slightly clunky, in my opinion: for example, if I have already generated 9 candidates and want to generate a 10th, I need to retrain my model on those 9 candidates sequentially, since the batch process for ENTMOOT is sequential. I can't find a neat way to store the prediction of a candidate in the current class design, and I have to discard the model since it is fit on the fantasies (https://github.com/TobyBoyne/bofire/blob/d8eb2b6daecf5bc62bbb432d47f2c551ad316f39/bofire/strategies/predictives/enting.py#L281), because otherwise the fantasy observations would be seen as "real experiments", which is not the case.

So my solution ends up slow if you are making multiple sequential calls to .ask(), but I think it's the most correct. You can see this test (https://github.com/TobyBoyne/bofire/blob/d8eb2b6daecf5bc62bbb432d47f2c551ad316f39/tests/bofire/strategies/test_enting.py#L124) for what I think the desired behaviour should look like.
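The sequential fantasy scheme described here can be sketched generically (ask_batch and its callback parameters are hypothetical names, not the actual bofire/ENTMOOT API):

```python
def ask_batch(real_data, fit, propose, predict, n):
    """Propose n candidates sequentially using fantasy observations.

    For candidate k, the model is refit on the real data plus the k-1
    previously proposed candidates paired with predicted (fantasy) outputs.
    The fantasy-trained model is discarded at the end, so fantasies are
    never mistaken for real experiments.
    """
    fantasies = []
    candidates = []
    for _ in range(n):
        # refit from scratch on real + fantasy data (this is the slow part
        # when .ask() is called repeatedly, as noted above)
        model = fit(list(real_data) + fantasies)
        x = propose(model)
        candidates.append(x)
        fantasies.append((x, predict(model, x)))
    # the model (trained partly on fantasies) goes out of scope here;
    # only the real data persists between calls
    return candidates
```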
Hi @TobyBoyne,
I will do a proper review this evening. Thank you already!
Best,
Johannes
Addressed all above comments. I also raised an issue on ENTMOOT about random state because I think you are definitely correct! https://github.com/cog-imperial/entmoot/issues/31
Two sources of issues that I've found so far:

1. Entmoot requires gurobipy, even if it isn't being used. This is an issue with Entmoot (https://github.com/cog-imperial/entmoot/issues/32), but I've added it to setup.py for now.
2. pyright isn't happy with domain_to_problem_config. This is because the functions only accept a subset of the possible feature types. To me, the solution would be to make the type hinting in _bofire_feat_to_entmoot less specific. However, this undoes a change that @jduerholt recommended earlier in this PR, so I just wanted to flag it so that you are aware I'm reverting it!
Edit: Actually, domain.inputs.get() returns a list of type AnyFeature, not AnyInput. Is there any way around this?
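One generic way around a too-broad return type like this is to narrow it explicitly with an isinstance-based filter (or typing.cast). A minimal sketch with stand-in classes (these are not the actual bofire feature hierarchy):

```python
from typing import List


class Feature:
    """Stand-in for the broad feature base type."""

    def __init__(self, key: str):
        self.key = key


class Input(Feature):
    """Stand-in for the input-feature subtype."""


class Output(Feature):
    """Stand-in for the output-feature subtype."""


def narrow_to_inputs(features: List[Feature]) -> List[Input]:
    # the isinstance check both satisfies the type checker and
    # guards against unexpected feature types at runtime
    return [f for f in features if isinstance(f, Input)]
```

typing.cast is the zero-cost alternative when you already know the list contains only inputs, but it performs no runtime check.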
I've made some progress. There are quite a few # type: ignores in enting.py which may not all be necessary, but which I used to get around pyright and the above issue about the return types of output.get() and input.get().
However, I'm still getting issues with running the pipeline. On my machine, if I remove the license file, the tests that require Gurobi are skipped. However, the pipeline seems to still be attempting to run these, and then having issues. Any ideas?
(Note that the pyomo.common.errors.ApplicationError: Solver (gurobi) did not exit normally error is raised when there is no license file.)
Using type: ignore when using the get methods is fine; pyright is not able to work with this dynamic behavior. Currently, I cannot start the pipeline - can you resolve the merge conflicts? They came in due to a change by Behrang ;) I will then have a look again at the pipeline and the tests.

I found the results from the latest pipeline execution here: https://github.com/TobyBoyne/bofire/actions/runs/8236200085/job/22522063543. I also tried your catch for gurobi locally (I just installed it via pip), without having a license, and got this:
Any ideas?
Here it is stated that it comes with a trial license that can be used only for small problems: https://pypi.org/project/gurobipy/
Hi @TobyBoyne,
I thought about it; to get this implemented quickly, just mark the tests in which gurobi is needed with slow and we are done ;)
Please also resolve the merge conflicts and then we can have it finally in.
Best,
Johannes
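The suggested approach could look like this (a sketch, assuming the project registers a slow marker in its pytest config; the availability check and test name are illustrative additions, not the actual test code):

```python
import importlib.util

import pytest

# detect gurobipy without importing it, so collection works license-free
GUROBI_AVAILABLE = importlib.util.find_spec("gurobipy") is not None


@pytest.mark.slow
@pytest.mark.skipif(not GUROBI_AVAILABLE, reason="gurobipy not installed")
def test_enting_with_gurobi():
    # gurobi-dependent assertions would go here
    ...
```

Running pytest -m "not slow" then excludes the gurobi tests from the default pipeline.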
Hi @jduerholt, I've added this in and solved merge conflicts. Looks like the tests are all passing...?
Hopefully this should be good to go! Thank you for all of the reviews! Looking forward to the next PR :)
Thanks a lot, Toby and Johannes!
@jduerholt, @TobyBoyne Thanks to both of you for getting this through. Are we happy with the treatment of the optional entmoot dependencies now or was this more of a short-term fix and we still have some open to-dos?
Initial progress on converting a BoFire problem to an ENTMOOT optimisation problem.