ekalosak opened this issue 4 years ago
For those encountering the same issue, here's a functional workaround:

from emukit.core.loop import LoopState, UserFunctionResult

# Rebuild the loop state from the corrected data (X2, Y2) and attach it
# to the loop before requesting the next point.
workaround_results = [UserFunctionResult(X=x, Y=y) for x, y in zip(X2, Y2)]
workaround_loop_state = LoopState(workaround_results)
control_loop.loop_state = workaround_loop_state
X_next = control_loop.get_next_points(results=results)

# When the workaround is applied, the model data are maintained as expected:
# all rows of X2 are present, plus one row for the newly collected result.
assert model_emukit.model.X.shape[0] == X2.shape[0] + 1
for model_x_row, replacement_x_row in zip(model_emukit.model.X, X2):
    assert all(model_x_row == replacement_x_row)
Hi @ekalosak, yes, the model is updated with the results stored in the loop state, at least if you use the appropriate model updater; hence your workaround works in that case. A custom model updater that does not use the loop state object may not behave this way. A sketch below illustrates the mechanism.
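For readers following along, here is a rough paraphrase of what a loop-state-backed updater (in the spirit of the stock FixedIntervalUpdater) does; this is a sketch, not Emukit's exact source:

class LoopStateBackedUpdater:
    # Sketch of a model updater that, like Emukit's default, treats the
    # loop state as the source of truth for the model's data.
    def __init__(self, model, interval=1):
        self.model = model
        self.interval = interval

    def update(self, loop_state):
        if loop_state.iteration % self.interval == 0:
            # Any set_data() call made outside the loop is overwritten here
            self.model.set_data(loop_state.X, loop_state.Y)
            self.model.optimize()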
I am curious where you need that functionality, i.e., replacing the model's data. I am asking because the active learning loop is usually used precisely to collect the data; if you replace it in the middle, you could have just started with that other data.
To address your curiosity: consider an experiment in which we have imprecise knowledge about the allowable discrete elements of the objective function's domain. What's more, we don't get the precise point in the domain associated with a particular experiment until some time after the primary experiment is performed.
An example might be a combinatorial material science application where certain a priori unknown configurations of material properties are impossible to fabricate, some desired properties are only achievable approximately, etc. Say our goal is to improve conductive properties: conductivity is easy to test, so we get our measurements quickly post-fabrication. However, measuring the actually fabricated properties is difficult, takes time, and comes in batches, because we send samples in batches to an external lab. Note that these material properties are part of the design space, not the objective function's co-domain.
It might be attractive to suggest multi-fidelity optimization, but doubling the number of free parameters seems problematic when we're shooting for sample-efficiency.
tl;dr: the X data are generated with incomplete information about the allowable domain, so it's useful to be able to adjust the model data as more precise information about the actually-implemented X values becomes available.
To not get too off-track: does it make sense to have model.set_data() be the definitive source of data? If so, I'm thinking of adding the loop_state as a paramz observer that updates when set_data is called... But I'm really not sure what the best design is here, so any input would be appreciated.
Or perhaps the best design is just to isolate data modification to the loop_state results and let control_loop.get_next_points() make the model.set_data() call, as in the sketch below.
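Something like this hypothetical helper (replace_loop_data is not part of Emukit; it just packages the workaround above behind one function):

from emukit.core.loop import LoopState, UserFunctionResult

def replace_loop_data(loop, X_new, Y_new):
    # Mutate only the loop state; the default model updater inside
    # get_next_points() then propagates the change via set_data().
    results = [UserFunctionResult(X=x, Y=y) for x, y in zip(X_new, Y_new)]
    loop.loop_state = LoopState(results)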
Hi Emukit team,
First, thank you for your work on this package - it's a joy to use.
I'm writing with a question about some curious behavior I've observed when using the Bayesian optimization control loop. When I use the IModel.set_data(X, Y) method to alter the model data, followed by OuterLoop.get_next_points(results), the model's data is reset to what it was before the set_data() call, with an extra row representing the contents of the results object.

The expected behavior is to see, after the OuterLoop.get_next_points(results) call, the model data constituted by the X passed to set_data concatenated with the contents of results.

Here's a minimal example that reproduces the behavior:
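A sketch of such a setup, assuming a GPy regression model wrapped in GPyModelWrapper and a BayesianOptimizationLoop (the variable names match the workaround above; the 1-d objective is illustrative):

import numpy as np
import GPy
from emukit.model_wrappers import GPyModelWrapper
from emukit.bayesian_optimization.loops import BayesianOptimizationLoop
from emukit.core import ContinuousParameter, ParameterSpace
from emukit.core.loop import UserFunctionResult

def f(x):
    return np.sin(6 * x)  # toy 1-d objective

# Fit an initial model and build the loop from it
X0 = np.random.rand(5, 1)
model_emukit = GPyModelWrapper(GPy.models.GPRegression(X0, f(X0)))
space = ParameterSpace([ContinuousParameter("x", 0, 1)])
control_loop = BayesianOptimizationLoop(space=space, model=model_emukit)

# Later, replace the model data with corrected points X2
X2 = np.random.rand(5, 1)
Y2 = f(X2)
model_emukit.set_data(X2, Y2)

# Feed one fresh evaluation to the loop
x_new = np.random.rand(1, 1)
results = [UserFunctionResult(X=x_new[0], Y=f(x_new)[0])]
X_next = control_loop.get_next_points(results)

# Observed: model_emukit.model.X is X0 plus the new row; the
# set_data(X2, Y2) call has been silently overwritten.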