Closed: TobyBoyne closed this pull request 7 months ago.
As shown in strategies/entmoot/entmoot_example.py, the ask() method now matches the syntax for other strategies.
Is there any functionality that is missing? If not, I will add tests (especially for the conversion between features/constraints) and better documentation.
Hi @jduerholt,
Thank you for the review - lots of helpful feedback!
I still haven't addressed the issue of type-hinting the parameters - I started looking into the surrogates, but I'm not sure I understand what function they serve. I could implement the types as a simple class that inherits from BaseModel, but I'd rather use surrogates if that is the proper way to do things. As mentioned in the comments, I can't use any of the provided surrogates, since they don't have the same attributes (ENTMOOT uses lgbm, and xgboost is not yet supported).
Kind regards, Toby
For reference, here is a mock-up of what the enting_params looks like. Note that the parameters in TrainParams are defined by lgbm, and I can't find a comprehensive type listing all possible options.
```python
from typing import Literal


class EntingParams:
    unc_params: 'UncParams'
    tree_train_parameters: 'TreeTrainParams'


class UncParams:
    beta: float = 1.96  # > 0
    acq_sense: Literal["exploration", "penalty"] = "exploration"
    dist_trafo: Literal["normal", "standard"] = "normal"
    dist_metric: Literal["euclidean_squared", "l1", "l2"] = "euclidean_squared"
    cat_metric: Literal["overlap", "of", "goodall4"] = "overlap"


class TreeTrainParams:
    train_lib: Literal["lgbm"] = "lgbm"
    train_params: 'TrainParams'


class TrainParams:
    # these are determined by lightgbm;
    # the typing in the package uses Dict[str, Any];
    # the defaults used by ENTMOOT are shown below
    objective: str = "regression"
    metric: str = "rmse"
    boosting: str = "gbdt"
    num_boost_round: int = 100
    max_depth: int = 3
    min_data_in_leaf: int = 1
    min_data_per_group: int = 1
    verbose: int = -1
```
Thanks @TobyBoyne, I was not at home over the weekend. I will have a detailed look tmr. Best, Johannes
Hi Toby,
I would opt for something like this:
```python
from typing import Annotated, Any, Dict, List, Literal, Optional

from pydantic import Field, validator


class LGBMSurrogate(Surrogate, TrainableSurrogate):
    type: Literal["LGBMSurrogate"] = "LGBMSurrogate"
    objective: str = "regression"
    metric: str = "rmse"
    boosting: str = "gbdt"
    num_boost_round: Annotated[int, Field(ge=1)] = 100
    max_depth: Annotated[int, Field(ge=1)] = 3
    min_data_in_leaf: Annotated[int, Field(ge=1)] = 1
    min_data_per_group: Annotated[int, Field(ge=1)] = 1
    # verbose: int = -1


class EntingSurrogates(Surrogates):
    # Surrogates is a new common base class for both
    # BotorchSurrogates and EntingSurrogates
    surrogates: List[LGBMSurrogate]

    @validator("surrogates")
    def validate_surrogates(cls, v):
        ...


class EntingStrategy(PredictiveStrategy):
    type: Literal["EntingStrategy"] = "EntingStrategy"
    # unc params
    beta: Annotated[float, Field(gt=0)] = 1.96
    acq_sense: Literal["exploration", "penalty"] = "exploration"
    dist_trafo: Literal["normal", "standard"] = "normal"
    dist_metric: Literal["euclidean_squared", "l1", "l2"] = "euclidean_squared"
    cat_metric: Literal["overlap", "of", "goodall4"] = "overlap"
    # surrogate params
    surrogate_specs: Optional[EntingSurrogates] = None
    # solver params
    solver_params: Dict[str, Any] = {}  # do we have any defaults here?
```
In BoFire we have surrogates and strategies: surrogates are regression models that can be used to model the output values of experiments, and (predictive) strategies use surrogates for optimization. The 100% solution would be to implement a new surrogate model called LGBMSurrogate, which can be used standalone. Via the parameters of the surrogate data model, one could then modify the hyperparameters of the LGBM model used within the Enting strategy.

We do the same for the botorch-based strategies; there, different model types and hyperparameters can also be used within the same optimization for different outputs. Even subsets of outputs can be used. I looked a bit into the Entmoot code, and from my understanding the model parameters are used globally, meaning that for every output that should be optimized, a model with the same hyperparameters and input features is trained. One could guarantee this behavior with a validator in the class EntingSurrogates. As soon as more freedom is allowed in Entmoot, one could also relax the validators ... Maybe one could also derive the EntingSurrogates together with the BotorchSurrogates from a common base class. I'll try to come up with a PR for such a common base class.
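To make the validator idea concrete, here is a minimal sketch using plain pydantic BaseModel stand-ins rather than the real bofire base classes (the trimmed field set and the equality-based check are assumptions, one possible way to enforce identical hyperparameters across outputs):

```python
from typing import List, Literal

from pydantic import BaseModel, validator


class LGBMSurrogate(BaseModel):
    # Trimmed stand-in for the surrogate data model sketched above
    type: Literal["LGBMSurrogate"] = "LGBMSurrogate"
    objective: str = "regression"
    metric: str = "rmse"
    num_boost_round: int = 100
    max_depth: int = 3


class EntingSurrogates(BaseModel):
    surrogates: List[LGBMSurrogate]

    @validator("surrogates")
    def validate_surrogates(cls, v):
        # ENTMOOT trains one model per output with globally shared
        # hyperparameters, so reject specs that differ between outputs
        if any(s != v[0] for s in v[1:]):
            raise ValueError("all surrogates must share the same hyperparameters")
        return v
```

Once Entmoot allows per-output hyperparameters, the validator body can simply be relaxed without changing the data model.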
What do you think? Is this somehow clear?
Best,
Johannes
Here is the first version of the mentioned PR: https://github.com/experimental-design/bofire/pull/283
Hi @jduerholt,
Yes, I think that all makes sense - thank you for the explanation. So to confirm, as an example: the _fit() logic should be moved to LGBMSurrogate, but the _ask() method should remain in the strategy? I'm looking through BotorchStrategy._ask() and can't find any references to the surrogate - is the surrogate not used in this method?

Also, in terms of moving forward - do I have write permissions for the feature/surrogates branch/PR? If so, should I implement the EntingSurrogates there, and if not, should I wait for that pull request to be accepted before creating the EntingSurrogates?
Kind regards, Toby
Good point! Will work on it for Entmoot.
On Tue, 12 Sept 2023 at 13:23, Robert Lee <@.***> wrote:

@.**** commented on this pull request.

In bofire/data_models/strategies/enting.py https://github.com/experimental-design/bofire/pull/278#discussion_r1322893937 :
```diff
+)
+from bofire.data_models.features.api import (
+    CategoricalDescriptorInput,
+    CategoricalInput,
+    ContinuousInput,
+    ContinuousOutput,
+    DiscreteInput,
+    Feature,
+)
+from bofire.data_models.objectives.api import MinimizeObjective, Objective
+from bofire.data_models.strategies.strategy import Strategy
+
+class EntingStrategy(Strategy):
+    type: Literal["EntingStrategy"] = "EntingStrategy"
+    enting_params: Dict[str, Any] = {}
```
I am not a big fan of having dictionary fields that can have anything inside, as it is somehow contrary to using a validation framework like pydantic. Can you provide the possible parameters with their defaults, types, and validation, to be used as fields of the data model?
This change should be made directly in entmoot (@spiralulam https://github.com/spiralulam wanted to do it); Johannes's argument is in general a good one and not just valid here
I'm looking through BotorchStrategy._ask() and can't find any references to the surrogate - is the surrogate not used in this method?
In botorch, the model is attached to the acquisition function object. For this reason, we first get an acquisition function object in _ask and then use this object, which has the model inside, for the optimization: https://github.com/experimental-design/bofire/blob/a4ca3eb2ecf8348e5c387cbef273191c1ba54d45/bofire/strategies/predictives/botorch.py#L256 and here: https://github.com/experimental-design/bofire/blob/a4ca3eb2ecf8348e5c387cbef273191c1ba54d45/bofire/strategies/predictives/sobo.py#L60.

The self.model object is created here on the basis of the surrogates: https://github.com/experimental-design/bofire/blob/a4ca3eb2ecf8348e5c387cbef273191c1ba54d45/bofire/strategies/predictives/botorch.py#L113

In the call to compatibilize, the individual surrogates are compiled into one botorch ModelList object.
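The pattern described above can be sketched with stand-in classes (illustrative only, not the actual bofire/botorch code; all class and method names here are assumptions mirroring the structure of the real API):

```python
class ModelList:
    """Stand-in for the botorch ModelList built from the fitted surrogates."""

    def __init__(self, models):
        self.models = models

    def predict(self, x):
        return [m(x) for m in self.models]


class AcquisitionFunction:
    """The model travels inside the acquisition function object."""

    def __init__(self, model):
        self.model = model

    def __call__(self, x):
        # Toy acquisition value: negative mean prediction (lower is better)
        return -sum(self.model.predict(x)) / len(self.model.models)


class BotorchLikeStrategy:
    def __init__(self, surrogates):
        # Analogous to compatibilize(): bundle surrogates into one ModelList
        self.model = ModelList(surrogates)

    def _get_acqf(self):
        return AcquisitionFunction(self.model)

    def _ask(self, candidates):
        # _ask never references the surrogates directly; it optimizes the
        # acquisition function, which carries the model inside it
        acqf = self._get_acqf()
        return max(candidates, key=acqf)
```

So the answer to "is the surrogate not used?" is: it is, but only indirectly, through the acquisition-function object built from self.model.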
So to confirm, as an example, the _fit() logic should be moved to LGBMSurrogate, but the _ask() method should remain in the strategy?
You would have it in both the surrogate and the strategy: you can call the surrogate's _fit from the strategy's _fit. But I am not sure if this is currently needed, as Entmoot currently does not support different surrogate specs for different outputs. So you could do it independently from each other for now.
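The delegation described here can be sketched as follows (hypothetical class names standing in for the real bofire classes; the real surrogate's _fit would call into lightgbm):

```python
class LGBMSurrogateImpl:
    """Stand-in for the surrogate; owns the model-fitting logic."""

    def __init__(self):
        self.is_fitted = False

    def _fit(self, X, y):
        # In the real surrogate this is where lightgbm training would happen
        self.is_fitted = True


class EntingStrategyImpl:
    """Stand-in for the strategy; it orchestrates, the surrogates train."""

    def __init__(self, surrogates):
        self.surrogates = surrogates

    def _fit(self, X, y):
        # The strategy's _fit delegates to each surrogate's _fit
        for surrogate in self.surrogates:
            surrogate._fit(X, y)
```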
Also, in terms of moving forward - do I have write permissions for the feature/surrogates branch/PR? If so, should I implement the EntingSurrogates there, and if not, should I wait for that pull request to be accepted before creating the EntingSurrogates?
I'll try to get the PR through tomorrow at the latest, but in terms of moving forward, I would recommend that you just create a new branch/PR in which you first implement only the LGBMSurrogate (including tests), as this is the first thing that is needed. In the meantime, I will finalize my PR, and you can then implement the EntingSurrogates and the EntingStrategy on top of both.

For the LGBMSurrogate, you can draw inspiration from the XGBSurrogate.
What do you think?
When we have the EntingStrategy included, we will also refactor the directory structure for the botorch-based strategies into strategies/predictives/botorch/.
I would recommend that you just create a new branch/PR in which you first just implement the LGBMSurrogate (including tests) as this is the first thing that is needed
Happy to work on that! Thanks for all the reviewing and help, very much appreciated!
I've made an initial commit for the surrogate; however, in order to implement the regression in #285, I need the utils in this PR - any thoughts on how I can overcome that?
I've been working on implementing changes in the Entmoot codebase. Since this was last discussed, there are a few things that have changed: EntingParams has now been implemented in entmoot - see that repo for more information. The solver still accepts Dict[any], but that's somewhat out of our hands, in that Pyomo doesn't have an extensive list of arguments.

I will also review it ;) Totally overlooked this :D
Hi, this should cover the necessary tests and pydantic models.
Just a note - for test_propose_optimal_point, we wanted to test that the proposed point would be in the unexplored area near the optimum. This behaviour isn't actually observed (I think the process of generating the data must bias the optimiser towards selecting the regions it has seen). If the test as initially described is desired, then there's likely some tweaking that needs to be done.
(also, sorry for the delay!)
Hi Toby, thank you very much. I will have a look by tomorrow at the latest. In the meantime, you could already try to solve the merge conflicts with the main branch by merging main into your branch. Best, Johannes
I think this should cover all of the requested changes - please let me know if I missed anything.
Regarding the candidates, I think my solution should address this. It is slightly clunky, in my opinion: for example, if I have already generated 9 candidates and want to generate a 10th, I need to retrain my model on those 9 candidates sequentially, since the batch process for ENTMOOT is sequential. I can't find a neat way to store the prediction of a candidate in the current class design, and I have to discard the model since it is fit on the fantasies (https://github.com/TobyBoyne/bofire/blob/d8eb2b6daecf5bc62bbb432d47f2c551ad316f39/bofire/strategies/predictives/enting.py#L281), because otherwise the fantasy observations would be seen as "real experiments", which is not the case.

So my solution ends up slow if you are making multiple sequential calls to .ask(), but I think it's the most correct. You can see this test (https://github.com/TobyBoyne/bofire/blob/d8eb2b6daecf5bc62bbb432d47f2c551ad316f39/tests/bofire/strategies/test_enting.py#L124) for what I think the desired behaviour should look like.
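The sequential fantasy scheme described here can be sketched generically (ask_batch and its callback parameters are hypothetical names, not the actual bofire/ENTMOOT API):

```python
def ask_batch(real_data, fit, propose, predict, n):
    """Propose n candidates sequentially using fantasy observations.

    For candidate k, the model is refit on the real data plus the k-1
    previously proposed candidates paired with predicted (fantasy) outputs.
    The fantasy-trained model is discarded at the end, so fantasies are
    never mistaken for real experiments.
    """
    fantasies = []
    candidates = []
    for _ in range(n):
        # refit from scratch on real + fantasy data (this is the slow part
        # when .ask() is called repeatedly, as noted above)
        model = fit(list(real_data) + fantasies)
        x = propose(model)
        candidates.append(x)
        fantasies.append((x, predict(model, x)))
    # the model (trained partly on fantasies) goes out of scope here;
    # only the real data persists between calls
    return candidates
```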
Hi @TobyBoyne,
I will do a proper review this evening. Thank you already!
Best,
Johannes
Addressed all above comments. I also raised an issue on ENTMOOT about random state because I think you are definitely correct! https://github.com/cog-imperial/entmoot/issues/31
Two sources of issues that I've found so far:

1. Entmoot requires gurobipy, even if it isn't being used. This is an issue with Entmoot (https://github.com/cog-imperial/entmoot/issues/32), but I've added it to setup.py for now.
2. pyright isn't happy with domain_to_problem_config. This is because the functions only accept a subset of the possible feature types. To me, the solution would be to make the type hinting in _bofire_feat_to_entmoot less specific. However, this undoes a change that @jduerholt recommended earlier in this PR, so I just wanted to flag it so that you are aware I'm reverting it!
Edit: Actually, domain.inputs.get() returns a list of type AnyFeature, not AnyInput. Is there any way around this?
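One generic way around a too-broad return type like this is to narrow it explicitly with an isinstance-based filter (or typing.cast). A minimal sketch with stand-in classes (these are not the actual bofire feature hierarchy):

```python
from typing import List


class Feature:
    """Stand-in for the broad feature base type."""

    def __init__(self, key: str):
        self.key = key


class Input(Feature):
    """Stand-in for the input-feature subtype."""


class Output(Feature):
    """Stand-in for the output-feature subtype."""


def narrow_to_inputs(features: List[Feature]) -> List[Input]:
    # the isinstance check both satisfies the type checker and
    # guards against unexpected feature types at runtime
    return [f for f in features if isinstance(f, Input)]
```

typing.cast is the zero-cost alternative when you already know the list contains only inputs, but it performs no runtime check.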
I've made some progress. There are quite a few # type: ignores in enting.py which may not all be necessary, but which I used to get around pyright and the above issue about the return types of output.get() and input.get().
However, I'm still getting issues with running the pipeline. On my machine, if I remove the license file, the tests that require Gurobi are skipped. However, the pipeline seems to still be attempting to run these, and then having issues. Any ideas?
(Note that the pyomo.common.errors.ApplicationError: Solver (gurobi) did not exit normally error is raised when there is no license file.)
Using type: ignore when using the get methods is fine; pyright is not able to work with this dynamic behavior. Currently, I cannot start the pipeline - can you resolve the merge conflicts? They came in due to a change by Behrang ;) I will then have a look again at the pipeline and the tests.

I found the results from the latest pipeline execution here: https://github.com/TobyBoyne/bofire/actions/runs/8236200085/job/22522063543. I also tried your catch for gurobi locally (I just installed it via pip), without having a license, and got this:
Any ideas?
Here it is stated that it comes with a trial license that can be used only for small problems: https://pypi.org/project/gurobipy/
Hi @TobyBoyne,
I thought about it; to get this implemented quickly, just mark the tests in which gurobi is needed with slow and we are done ;)
Please also resolve the merge conflicts and then we can have it finally in.
Best,
Johannes
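The suggested approach could look like this (a sketch, assuming the project registers a slow marker in its pytest config; the availability check and test name are illustrative additions, not the actual test code):

```python
import importlib.util

import pytest

# detect gurobipy without importing it, so collection works license-free
GUROBI_AVAILABLE = importlib.util.find_spec("gurobipy") is not None


@pytest.mark.slow
@pytest.mark.skipif(not GUROBI_AVAILABLE, reason="gurobipy not installed")
def test_enting_with_gurobi():
    # gurobi-dependent assertions would go here
    ...
```

Running pytest -m "not slow" then excludes the gurobi tests from the default pipeline.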
Hi @jduerholt, I've added this in and solved merge conflicts. Looks like the tests are all passing...?
Hopefully this should be good to go! Thank you for all of the reviews! Looking forward to the next PR :)
Thanks a lot, Toby and Johannes!
@jduerholt, @TobyBoyne Thanks to both of you for getting this through. Are we happy with the treatment of the optional entmoot dependencies now or was this more of a short-term fix and we still have some open to-dos?
Initial progress on converting a BoFire problem to an ENTMOOT optimisation problem.