Closed: sgbaird closed this issue 4 months ago
Here is my workaround:
parameter_names = [p.name for p in searchspace.parameters]
obj_name = "Target"
full_lookup = pd.concat([lookup_training_task, lookup_test_task], ignore_index=True)
df = campaign.recommend(batch_size=BATCH_SIZE)
# update the dataframe with the target value(s)
df = pd.merge(
    df, full_lookup[parameter_names + [obj_name]], on=parameter_names, how="left"
)
Hi @sgbaird, I think it could be that you are mixing up unrelated things here. Note that your lookup_test_task merely holds the lookup data to close the DOE loop – it exists completely independently of your campaign and does not enter it in any way. They are only connected in the sense that you can use it to look up values that your campaign recommends, but the campaign isn't even aware that this object exists and could in fact do its work without it. Thus, there is no reason to expect the recommendations of the campaign to have anything in common with the lookup, e.g. no reason to assume that they would share indices or similar.
The indices you see returned by the campaign refer to the dataframe that is internally created to represent the discrete search space of the problem. But that is a completely arbitrary choice. In fact, I would even argue that the indices could be ignored entirely. We simply used the search space indices because we have to use an index for pandas DataFrames, and this at least gives us a reference to which search space elements have been recommended (compared to the alternative where we would simply start enumerating from 1).
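As a rough sketch of what that means (assuming a purely discrete search space; exp_rep denotes the experimental representation of the discrete subspace and the exact attribute names may differ between BayBE versions):

# the recommendation index labels rows of the internal discrete search-space
# dataframe, not rows of any external lookup table
rec = campaign.recommend(batch_size=3)
print(rec.index)
# the same labels appear in the campaign's discrete search-space representation
print(campaign.searchspace.discrete.exp_rep.loc[rec.index])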
Does that answer your question?
Ah, got it. So my "workaround" above is actually the correct way to do it - look for a matching configuration.
I guess part of the confusion is that the lookup tables were created based on the allowed search space values, but I see that is an arbitrary choice for this example.
Let's take the case where the training data is sampled within a continuous search space and there isn't a particular pattern to the parameter sets. As a concrete 1D example, suppose the training data is
training_data = [{"x": 0.43, "y": 1.86}, ... {"x": 0.78, "y": 2.3}]
and the test function can only be sampled at x = [0, 0.5, 1.0]. What is the correct way to set this up with BayBE's API?
import pandas as pd

from baybe import Campaign
from baybe.parameters import NumericalContinuousParameter, NumericalDiscreteParameter
from baybe.recommenders import BotorchRecommender
from baybe.searchspace import SearchSpace
from baybe.targets import NumericalTarget

# hybrid search space: x can only be sampled at three discrete points, y is continuous
parameters = [
    NumericalDiscreteParameter("x", [0.0, 0.5, 1.0]),
    NumericalContinuousParameter("y", (0, 1)),
]
searchspace = SearchSpace.from_product(parameters)
objective = NumericalTarget("t", mode="MAX").to_objective()
recommender = BotorchRecommender()

# training data collected off-grid (the x values are not among the allowed discrete points)
measurements = pd.DataFrame.from_records(
    [
        {"x": 0.43, "y": 0.55, "t": 1.86},
        {"x": 0.78, "y": 0.98, "t": 2.3},
    ]
)

# option 1: stateless recommender call
rec = recommender.recommend(5, searchspace, objective, measurements)
print(rec)

# option 2: equivalent campaign-based workflow
campaign = Campaign(searchspace, objective, recommender)
campaign.add_measurements(
    measurements, numerical_measurements_must_be_within_tolerance=False
)
rec = campaign.recommend(5)
print(rec)
Note: For the latter, you currently need to explicitly specify the numerical_measurements_must_be_within_tolerance
flag, since your measurements strictly speaking lie outside the allowed values of the parameter you specified. However, we are still working on this interface, precisely because the behavior is not yet perfectly consistent between the two approaches and because the "tolerance" logic needs to be revised in general. #workinprogress
Hi @sgbaird, note that I've just updated the imports in my code above (which were a bit messy in the original version). Other than that, everything remains the same. I'll close the issue now, but feel free to reopen if further discussion is necessary ✌🏼
I'm guessing that for a multi-task problem, BayBE concatenates the df such that some of the original indices are lost. See the reproducer below, which gives a recommended df with an index of 40, even though the highest index for lookup_test_task is 26. Note that the SMOKE_TEST environment variable was set to true when I ran this. Is this intended behavior?
If I dig further into the stack trace of simulate_scenarios, I'm guessing I'd find some handling of that, but it wasn't immediately obvious to me.
xref: https://github.com/emdgroup/baybe/discussions/257 and https://github.com/emdgroup/baybe/discussions/283
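Not the actual reproducer, but a minimal pandas-only illustration of the index behavior I mean (hypothetical data): concatenating the per-task lookups with ignore_index=True discards the original row labels, so an index in the combined frame need not appear in either source table.

import pandas as pd

# original labels are dropped when concatenating with ignore_index=True
lookup_training_task = pd.DataFrame({"x": [0.0, 0.5]}, index=[0, 1])
lookup_test_task = pd.DataFrame({"x": [0.25, 0.75]}, index=[25, 26])
full_lookup = pd.concat([lookup_training_task, lookup_test_task], ignore_index=True)
print(full_lookup.index.tolist())  # [0, 1, 2, 3] -- the labels 25/26 are gone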
NOTE: baybe==0.9.1.post209, Windows 11