facebook / Ax

Adaptive Experimentation Platform
https://ax.dev
MIT License

[GENERAL SUPPORT]: Adjusting search space or accommodating out-of-bounds initial data #2584

Closed cheeseheist closed 2 months ago

cheeseheist commented 2 months ago

Question

Hi,

I read through #768 and have some related but separate challenges. I have existing data with wider parameter bounds and want to restrict the parameter bounds for the current optimization based on considerations that the optimization has no visibility of. If I restrict the bounds on the original search space, I get an error because I have data that is out of the bounds of the search space and it doesn't like that. Fine. However, I would like to include as much data as possible to help the model understand the trends and relationships between variables rather than start from scratch.

As a workaround, I have set the initial bounds of the search space to be loose enough to include all the existing data and then have tried two different methods of adjusting the search space (I'm using the Developer API, btw). First, after generating my model, I tried:

model = Models.BOTORCH_MODULAR(experiment=exp, data=exp.fetch_data())
model.model_space.parameters["param_name"].update_range(lower=new_lower, upper=new_upper)

Second, I tried creating a new secondary search space and tying it to the .gen call (see the sketch at the end of this comment). Both of these seem to have more or less the same behavior: they do force the suggestions to be within the new bounds, but in a way that seems artificial and a bit limiting to me. I get messages like this:

[image: log output showing the suggested parameter values being clamped to the new bounds]

It seems like the model is still using the initial bounds to generate a suggestion, which then gets artificially clamped down to the new bounds. For situations where you have complex interactions between parameters, this seems potentially worse than just throwing out the out-of-bounds data and starting from scratch. You can't simply throttle one value down and assume you should still keep all the others the same. It seems like if the methodology were taking into account the new bounds, it would likely explore a different region of the parameter space than simply clamping. Is there any better way to do this such that the acquisition function incorporates the updated bounds rather than this artificial adjustment after a suggestion has been generated?
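
For reference, that second attempt looked roughly like the sketch below (parameter names, bounds, and BATCH_SIZE are placeholders from my setup, and only the one narrowed parameter is shown):

from ax.core.parameter import ParameterType, RangeParameter
from ax.core.search_space import SearchSpace

# Narrowed copy of the search space; only one parameter is shown here.
new_search_space = SearchSpace(
    parameters=[
        RangeParameter(
            name="param_name",
            parameter_type=ParameterType.FLOAT,
            lower=new_lower,
            upper=new_upper,
        ),
        # ... the remaining parameters, unchanged or narrowed as needed ...
    ]
)

model = Models.BOTORCH_MODULAR(experiment=exp, data=exp.fetch_data())
generator_run = model.gen(n=BATCH_SIZE, search_space=new_search_space)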

Thanks!

Please provide any relevant code snippet if applicable.

No response

Abrikosoff commented 2 months ago

Drive-by comment: this topic is something I've always wondered about but have never found a satisfactory solution for; that said, I'm not sure whether this thread would be relevant to your case?

cheeseheist commented 2 months ago

> Drive-by comment: this topic is something I've always wondered about but have never found a satisfactory solution for; that said, I'm not sure whether this thread would be relevant to your case?

Thanks for the thread. Yeah, I believe that is effectively the same as the second option I tried in the OP.

mgarrard commented 2 months ago

Hi @cheeseheist, thanks for the question and for reading other questions for context. I would say the two options available at this time are:

  1. The one you outline above, where you pass in a modified search space and the model will narrow the optimization range
  2. Passing in model_kwargs={"fit_out_of_design": True} as outlined in https://github.com/facebook/Ax/issues/768 -- this will not filter out out-of-search-space points during model fitting and won't perform the clamping that is causing the logging (see the sketch after this list)
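
A minimal sketch of where that kwarg goes (inside model_kwargs of the BoTorch GenerationStep; the Sobol step and trial counts here are just placeholders):

from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

gs = GenerationStrategy(
    steps=[
        GenerationStep(model=Models.SOBOL, num_trials=5),
        GenerationStep(
            model=Models.BOTORCH_MODULAR,
            num_trials=-1,
            # Keep points that fall outside the search space when fitting the model.
            model_kwargs={"fit_out_of_design": True},
        ),
    ]
)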

So if you are looking to keep those data points completely as is during the model training, I would use option 2. Let me know if you have any follow ups :)

cheeseheist commented 2 months ago

@mgarrard - Hi Mia, thanks for this response! Option 2 sounds like what I am looking for; however, I haven't found how to do it in the Developer API. It seems like in #768 they pass this in the generation strategy using the client, but I'm not using a generation strategy. I attach the data after constructing the experiment using exp.attach_trial([dictionary]). It seems like this kwarg is meant for the modelbridge, but I haven't even specified the modelbridge at the point in the code where I get the error that the data is outside the bounds of the search space. I am not seeing anything in the API for the search space, experiment, or optimization config that would allow me to pass this. Let me know if I'm missing something, or if there is somewhere else I should be passing this kwarg. Is this only possible in the client? I'll keep looking in the meantime.

cheeseheist commented 2 months ago

Or perhaps I'm misunderstanding - do I need to also do option 1 in order to use option 2? Do I need to start with the broad search space, attach all the data, then pass a more restricted search space to the gen while also passing this kwarg to the model?

Abrikosoff commented 2 months ago

I think you can do something like this to use a GenerationStrategy with the Developer API:

import random
import warnings

from ax.core.data import Data
from ax.core.experiment import Experiment
from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models
from ax.runners.synthetic import SyntheticRunner

# Assumes search_space, optimization_config, N_INIT, N_BATCHES, and BATCH_SIZE
# are defined elsewhere.
experiment = Experiment(
    name="your_experiment",
    search_space=search_space,
    optimization_config=optimization_config,
    runner=SyntheticRunner(),
)

# Quasi-random initialization.
sobol = Models.SOBOL(search_space=experiment.search_space)
for _ in range(N_INIT):
    generator_run = sobol.gen(1)
    # Hack specific to my problem: zero out three randomly chosen parameters.
    keys = [f"x{i}" for i in range(6)]
    random.shuffle(keys)
    for k in keys[:3]:
        generator_run.arms[0]._parameters[k] = 0.0
    experiment.new_trial(generator_run=generator_run).run()

# Bayesian optimization step; model_kwargs / model_gen_kwargs is where extra
# kwargs (e.g. fit_out_of_design) would be passed in.
botorch_gen_step = GenerationStep(
    model=Models.BOTORCH_MODULAR,
    num_trials=-1,
    model_kwargs={},
    model_gen_kwargs={},
)
generation_strategy = GenerationStrategy(steps=[botorch_gen_step])

data = experiment.fetch_data()
for i in range(N_BATCHES):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # Filter SLSQP warnings
        generator_run = generation_strategy.gen(experiment=experiment, n=BATCH_SIZE)

    trial = experiment.new_batch_trial(generator_run=generator_run)
    for arm in trial.arms:
        # Same hack as above: snap near-zero values to zero and keep at most
        # three non-zero parameters per arm.
        arm._parameters = {k: 0.0 if v < 1e-3 else v for k, v in arm.parameters.items()}
        assert sum(v > 1e-3 for v in arm.parameters.values()) <= 3
    trial.run()
    data = Data.from_multiple_data([data, trial.fetch_data()])

    new_value = trial.fetch_data().df["mean"].min()
    print(
        f"Iteration: {i}, Best in iteration {new_value:.3f}, Best so far: {data.df['mean'].min():.3f}"
    )

Caveat: this is spaghetti code I've had lying around, so I cannot vouch for its correctness. But maybe it's something you can try out until Mia comes back with the correct answer :)

cheeseheist commented 2 months ago

Thanks @Abrikosoff! I think this still runs me into the same issue, though. I need to attach all the existing data (which has broader bounds) to the experiment prior to utilizing the generation strategy. This will error out unless I make the bounds broader before I even start generating trials. If I define a new search space, it doesn't seem like GenerationStrategy.gen() has an option to pass a new search space. I'm not seeing anything in the API where I can change the search space in the experiment object (which seems to be what gets passed in to GenerationStrategy.gen()). Let me know if there is anything I'm missing there.

It really seems like there should be a place for me to pass model_kwargs={"fit_out_of_design": True} directly into the modelbridge, but I cannot figure out how to do so. I'm using ... model = ax.modelbridge.registry.Models.BOTORCH_MODULAR(experiment=exp, data=exp.fetch_data()) for my model bridge, and it doesn't seem like there is a relevant kwarg to pass here.

cheeseheist commented 2 months ago

My most recent attempt that actually runs is as follows:

model = Models.BOTORCH_MODULAR(experiment=exp, data=exp.fetch_data(), fit_out_of_design=True)
model.model_space.parameters["VarA"].update_range(lower=VarA_lb, upper=VarA_ub)
model.model_space.parameters["VarB"].update_range(lower=VarB_lb, upper=VarB_ub)
model.model_space.parameters["VarC"].update_range(lower=VarC_lb, upper=VarC_ub)
model.model_space.parameters["VarD"].update_range(lower=VarD_lb, upper=VarD_ub)

generator_run = model.gen(n=BATCH_SIZE, search_space=new_search_space)

It gives no errors or warnings, but also doesn't actually use the new bounds at all. It seems to just use the old bounds. Perhaps fit_out_of_design=True does not behave as expected? My hope was that it wouldn't simply clamp to the bounds, but would actually intelligently determine a suggestion within the new bounds. It seems like the acquisition function is still not privy to the updated parameter space. I've tried with and without passing the new search space to model.gen(), and with and without the .update_range() calls; as long as I use one, the other, or both, it behaves the same.

mgarrard commented 2 months ago

Hi @cheeseheist, is there a reason you are using the Developer API instead of the Service API? We strongly recommend the Service API, as it handles most of the complexity for you in an elegant manner.

cheeseheist commented 2 months ago

@mgarrard - well, it has been a while since I chose a path, but I believe when I was looking at things there was something I didn't think I could do within the Service API; I could revisit that. But will it actually work in the Service API as I'm hoping? I'm a little hesitant to do a bunch of redevelopment when it seems like fit_out_of_design=True is successfully getting passed and still isn't behaving as I'd hoped.

Abrikosoff commented 2 months ago

> My most recent attempt that actually runs is as follows:
>
> model = Models.BOTORCH_MODULAR(experiment=exp, data=exp.fetch_data(), fit_out_of_design=True)
> model.model_space.parameters["VarA"].update_range(lower=VarA_lb, upper=VarA_ub)
> model.model_space.parameters["VarB"].update_range(lower=VarB_lb, upper=VarB_ub)
> model.model_space.parameters["VarC"].update_range(lower=VarC_lb, upper=VarC_ub)
> model.model_space.parameters["VarD"].update_range(lower=VarD_lb, upper=VarD_ub)
>
> generator_run = model.gen(n=BATCH_SIZE, search_space=new_search_space)
>
> It gives no errors or warnings, but also doesn't actually use the new bounds at all. It seems to just use the old bounds. Perhaps fit_out_of_design=True does not behave as expected? My hope was that it wouldn't simply clamp to the bounds, but would actually intelligently determine a suggestion within the new bounds. It seems like the acquisition function is still not privy to the updated parameter space. I've tried with and without passing the new search space to model.gen(), and with and without the .update_range() calls; as long as I use one, the other, or both, it behaves the same.

I think this is because the kwarg is not actually being passed to the model in this form; the easiest thing would be to take an example notebook from the website which uses the Service API, try doing this with a conventional GenerationStrategy (which we know works), and see how that goes for your use case.

cheeseheist commented 2 months ago

I will try to tinker around with the Service API, but it does seem to be getting passed. It does make a difference, just not the difference I want it to make :).

Without passing fit_out_of_design=True, the optimization respects the new search space, but does so by artificially just clamping to the bound anytime it is out of bounds rather than actually evaluating the acquisition function within the new bounds from the start.

After passing fit_out_of_design=True, the optimization just uses the original bounds and completely ignores the new bounds that were passed. Perhaps it needs to be passed into something else that would be automatically taken care of in the Service API but isn't being done here. So, I'll tinker with that, but my fear is it will just do the same thing.

mgarrard commented 2 months ago

@cheeseheist we are currently working on improvements to search space modifications in experiments, but easy access to those is probably a couple of months away, and full API-level support for them even further out. For your use case demonstrated above, you'd probably want the code to look something like:

optimization_config = ...  # TODO
data = exp.fetch_data()
model = Models.BOTORCH_MODULAR(experiment=exp, data=data, fit_out_of_design=False, search_space=new_search_space)
generator_run = model.gen(n=BATCH_SIZE, optimization_config=optimization_config)
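
The resulting generator run is then attached to the experiment as usual, e.g. (a sketch, mirroring the batch-trial pattern used earlier in this thread):

trial = exp.new_batch_trial(generator_run=generator_run)
trial.run()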

However, the above is a bit out of scope of our supported methods, and I would recommend one of the two options outlined above. I think it would be helpful to see an example of your point that "the optimization just uses the original bounds and completely ignores the new bounds that were passed" -- using the full data instead of only the data within the search space is the intended functionality of fit_out_of_design=True.

Also @Abrikosoff's suggestion to use a simple problem to debug is good.

My suggestion about the service API is a general suggestion, as the service API is the API we can provide the highest level of support for, and works best for almost all users for almost all use cases.

cheeseheist commented 2 months ago

Thanks for the continued support @mgarrard.

So, I took a step back and tried the Service API with hartmann6. Essentially, what I tried to do was run it for 10 iterations with the standard bounds, then change to a new search space (allowed by setting immutable_search_space_and_opt_config=False), then run more. The change doesn't really do anything. I verified that the new search space is registered in the experiment object, but it just gets ignored whether I have fit_out_of_design set to True or False. Perhaps altering the search space is not supported in the ax_client, and even though I did it, it doesn't actually do anything? See my full code below for this implementation.

I also tried making the initial search space more restrictive and attaching trials that fall outside those more restrictive bounds, and it gives me an error saying the trial is outside the bounds whether I set fit_out_of_design=True or not.

I'm not as familiar with working with the ax_client, so let me know if I'm doing something silly.

from ax.service.ax_client import AxClient, ObjectiveProperties
from ax.utils.measurement.synthetic_functions import hartmann6

from ax.modelbridge.generation_strategy import GenerationStrategy, GenerationStep
from ax.modelbridge.registry import Models

import numpy as np

gs = GenerationStrategy(
    steps=[
        # 1. Initialization step (does not require pre-existing data and is well-suited for 
        # initial sampling of the search space)
        GenerationStep(
            model=Models.SOBOL,
            num_trials=5,  # How many trials should be produced from this generation step
            min_trials_observed=5, # How many trials need to be completed to move to next model
        ),
        # 2. Bayesian optimization step (requires data obtained from previous phase and learns
        # from all data available at the time of each new candidate generation call)
        GenerationStep(
            model=Models.BOTORCH_MODULAR,
            num_trials=-1,  # No limitation on how many trials should be produced from this step
            model_kwargs={"fit_out_of_design": True},
            max_parallelism=1,  # Parallelism limit for this step, often lower than for Sobol
            # More on parallelism vs. required samples in BayesOpt:
            # https://ax.dev/docs/bayesopt.html#tradeoff-between-parallelism-and-total-number-of-trials
        ),
    ]
)

ax_client=AxClient(generation_strategy=gs)

ax_client.create_experiment(
    name="hartmann_test_experiment",
    parameters=[
        {
            "name": "x1",
            "type": "range",
            "bounds": [0.0, 1.0],
            "value_type": "float",  # Optional, defaults to inference from type of "bounds".
            "log_scale": False,  # Optional, defaults to False.
        },
        {
            "name": "x2",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "x3",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "x4",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "x5",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "x6",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
    ],
    objectives={"hartmann6": ObjectiveProperties(minimize=True)},
    immutable_search_space_and_opt_config=False,
)

def evaluate(parameterization):
    x = np.array([parameterization.get(f"x{i+1}") for i in range(6)])
    # In our case, standard error is 0, since we are computing a synthetic function.
    return {"hartmann6": (hartmann6(x), 0.0)}

for i in range(10):
    parameterization, trial_index = ax_client.get_next_trial()
    # Local evaluation here can be replaced with deployment to external system.
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluate(parameterization))

ax_client.set_search_space(parameters=[
        {
            "name": "x1",
            "type": "range",
            "bounds": [0.1, 0.3],
            "value_type": "float",  # Optional, defaults to inference from type of "bounds".
            "log_scale": False,  # Optional, defaults to False.
        },
        {
            "name": "x2",
            "type": "range",
            "bounds": [0.9, 1.0],
        },
        {
            "name": "x3",
            "type": "range",
            "bounds": [0.8, 1.0],
        },
        {
            "name": "x4",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "x5",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "x6",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
    ],)

for i in range(10):
    parameterization, trial_index = ax_client.get_next_trial()
    # Local evaluation here can be replaced with deployment to external system.
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluate(parameterization))

cheeseheist commented 2 months ago

Update - got everything working!

I read through #768 again more carefully and found that there were some 'hidden' messages in the thread that had gotten collapsed for some reason, and they held the solution. Using attach_trial does not work even if you pass fit_out_of_design=True. Instead, you must use the following workflow (and also pass fit_out_of_design=True to the generation strategy or modelbridge).

from ax.core.arm import Arm

trial = ax_client.experiment.new_trial()
trial.add_arm(Arm(parameters={"x1": ..., "x2": ..., ...}))
trial.mark_running(no_runner_required=True)

Alternatively, when using the Developer API, I did...

trial = exp.new_trial()
trial.add_arm(Arm(parameters=parameterization))
trial.run()
trial.mark_completed()
exp.attach_data(Data(df=results))
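
Here results is a data frame in Ax's standard Data format; roughly (a sketch -- the metric name and values are placeholders for my actual objective):

import pandas as pd

results = pd.DataFrame(
    {
        "arm_name": [trial.arm.name],
        "metric_name": ["my_objective"],  # must match the metric in the optimization config
        "mean": [observed_value],         # measured objective value for this arm
        "sem": [0.0],                     # observed noise, or 0.0 if noiseless
        "trial_index": [trial.index],
    }
)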

So now I start with the more restrictive search space, but have input data that is outside of it. I'm able to attach the input data using the code above instead of attach_trial, and am able to fit the out-of-design data by passing fit_out_of_design via either the generation strategy or the modelbridge directly (below).

Via the generation strategy, in a GenerationStep:

GenerationStep(
    model=Models.BOTORCH_MODULAR,
    num_trials=-1,  # No limitation on how many trials should be produced from this step
    model_kwargs={"fit_out_of_design": True},
    max_parallelism=1,
)

Directly to the modelbridge:

model = Models.BOTORCH_MODULAR(experiment=exp, data=exp.fetch_data(), fit_out_of_design=True)
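
After that, candidates can be generated as usual; since the experiment itself now carries the restricted search space, a plain gen call stays within the new bounds (sketch):

generator_run = model.gen(n=BATCH_SIZE)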

Both appear to work as desired. I can't definitively say at this point that the out-of-bounds data is actually being used, but I expect that it is. There is no more clamping to the bounds, and the suggestions are now within the more restrictive bounds. I do now get a warning that the input data is outside the unit cube and that I should use min/max scaling. I assume there is another flag somewhere that allows me to specify min/max scaling but haven't worked on finding that yet. Thanks for the support!

cheeseheist commented 2 months ago

Side note @mgarrard: if you have any tips on addressing the warning below, or on what issues it may cause, let me know. It seems like it was also mentioned in #768 as a side note, but I didn't see any resolution there, and there doesn't seem to be an obvious way to resolve it. I can open a separate issue if that'd be preferred. Everything runs, but I'm not sure if this will cause instability or issues with different constraints down the road.

InputDataWarning: Input data is not contained to the unit cube. Please consider min-max scaling the input data. warnings.warn(msg, InputDataWarning)

Abrikosoff commented 2 months ago

Hi @cheeseheist, thanks for coming back with a working solution! So it seems that the solution is to use add_arm rather than attach_trial? Also, in the Service version you posted above, once you add arms, I presume you would also need to call trial.run() and trial.mark_completed(), no? (These seem to be there in the Developer version of your code.)

cheeseheist commented 2 months ago

@Abrikosoff, correct. The add_arm approach combined with passing fit_out_of_design is the ticket. That allows you to attach data that is outside your bounds.

I had never used this approach before this code, but in the example I found in #768 they used trial.mark_running(no_runner_required=True) instead, and that seemed to work within the ax_client. But yeah, I think you could alternatively do trial.run() and trial.mark_completed().

Abrikosoff commented 2 months ago

Given the original motivation for your question, I think an interesting thing to do would be to compare the case with adaptive (term used loosely) search spaces against a simple narrow search space on various performance metrics, like convergence or some such, just to see how adaptivity affects the results. But that would be beyond the purview of this forum :-)

ha7ilm commented 2 weeks ago

@cheeseheist , thanks for sharing all that.

I am trying to adjust the search space on the fly with the Service API. I also found that update_range does not do anything, so I hoped to find a solution in this thread. However, some things are not completely clear to me.

Could you expand on how you query for a new parameter set from BO? To my understanding,

trial = ax_client.experiment.new_trial()
trial.add_arm(Arm(parameters={"x1":..., "x2":..., ...}))
trial.mark_running(no_runner_required=True)

will allow me to add an existing trial that was done before, but how do I then get a new parameter set for x1, x2, ... that respects the bounds?

Could you post the whole code example with the Service API, or just how this part...

for i in range(10):
    parameterization, trial_index = ax_client.get_next_trial()
    # Local evaluation here can be replaced with deployment to external system.
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluate(parameterization))

...has been fixed?