Closed ramseyissa closed 8 months ago
So this sounds like an active learning type problem where your goal is to find the best model across the whole design space ("best" in some yet-to-be specified sense)? This would mean that you'd need to use an acquisition function that specifically attempts to do this - standard acquisition functions for optimization such as Expected Improvement won't do the job here. There is e.g. [qNegIntegratedPosteriorVariance
] (https://github.com/pytorch/botorch/blob/main/botorch/acquisition/active_learning.py#L40) that could be used to that end. I am not sure if we have any examples for how to use it in ax, but it should be possible to do this via the modular BoTorch model setup: https://github.com/facebook/Ax/blob/main/tutorials/modular_botax.ipynb
Once that's hooked up, then you should be able to use evaluate_acquisition_function
to evaluate the acquisition function on your held-out points to pick the one with the highest value. See also the modular BoTorch model tutorial for how to do this (it's not actually run in code but discussed in the comments there).
@Balandat Thank you very much for the quick reply!
Yes exactly, this is an active learning problem. My main goal is to try to minimize the amount of data (of the 4000 samples) used to train a model, that would then be able to give me an accurate (say 80-90%) prediction on a validation set and eventually on a held-out test set.
Ok great I will take a look at the references.
So in this case can the surrogate model be any model I choose? Or are there any limitations? Or is the best choice here a GP.
So in this case can the surrogate model be any model I choose? Or are there any limitations? Or is the best choice here a GP.
So in general you'll need some kind of model with uncertainty quantification if you want to use qNegIntegratedPosteriorVariance
or similar acquisition functions that are based on minimizing global model uncertainty. A GP would be the obvious choice here (and should work out of the box, unlike some other models which you'd have to hook up).
Ok that makes sense. So to validate this i would just fit and predict using the GP on the validation set iteratively after selecting the next best point (as in the sense that it would be my evaluation function)?
yup, that makes sense to me!
@Balandat Hello again,
Thank you very much for the initial advice. I have a semi-working implementation (i say semi for the following reason).
I notice when i evaluate the acquisition function qNegIntegratedPosteriorVariance
, and print print(best_candidate_idx)
i always get an output of tensor(0) for the index. But when i do this with UpperConfidenceBound
the indices printed are random (which makes a bit more sense to me). Can you shed some light on this?
for i in range(len(X_candidates)):
# Fit the GP
gp = SingleTaskGP(X_train, Y_train)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)
# acquisition function
acq_func = qNegIntegratedPosteriorVariance(gp, mc_points=X_candidates)
# acq_func = UpperConfidenceBound(gp, beta=2.0)
# acq_values = acq_func(X_candidates.unsqueeze(-2))
# Evaluate the acquisition function on the candidate set
acq_values = acq_func(X_candidates)
# Find the index of the best candidate in the candidate set
best_candidate_idx = torch.argmax(acq_values)
print(best_candidate_idx)
# add the best candidate to the training data
X_train = torch.cat([X_train, X_candidates[best_candidate_idx].unsqueeze(0)])
Y_train = torch.cat([Y_train, Y_candidates[best_candidate_idx].unsqueeze(0)])
# remove the best candidate from candidates
X_candidates = torch.cat([X_candidates[:best_candidate_idx], X_candidates[best_candidate_idx+1:]])
Y_candidates = torch.cat([Y_candidates[:best_candidate_idx], Y_candidates[best_candidate_idx+1:]])
# Predict MAE
test_posterior = gp.posterior(X_test)
test_predictions = test_posterior.mean.detach().numpy()
mae = mean_absolute_error(Y_test, test_predictions)
Thank you in advance!
Hmm I think you may be using qNegIntegratedPosteriorVariance
incorrectly here. Note that the mc_points
arguments takes a (typically relatively dense) set of points across the domain that are used for Monte Carlo integration. But by doing acq_func = qNegIntegratedPosteriorVariance(gp, mc_points=X_candidates)
you're passing the candidate set. This results in the acquisition function approximating the integral of the posterior variance by only summing it up on at the candidate points (i.e. incredibly poorly). You should use a different set of points for MC integration instead.
Ah, that makes sense.
I updated my script based on your suggestion. I went ahead and added a subset of the data mc_points = 500. So what it does is select randomly 500 points from the candidates space and uses it as mc_points. The code below is inserted into the forloop under fit_gpytorch_model(mll)
. Unfortunately, the print statement for the best_index is still printing tensor(0). Was this approach along the right path that you mentioned?
mc_points_size = 500
mc_points_indices = np.random.choice(len(X_candidates), min(mc_points_size, len(X_candidates)), replace=False)
mc_points = X_candidates[mc_points_indices]
Thank you again for taking the time to respond to the numerous questions!
I went ahead and added a subset of the data mc_points = 500. So what it does is select randomly 500 points from the candidates space and uses it as mc_points.
Oh so this means that you're ever only going to evaluate the posterior variance on a subset of the data points. The mc_points
are meant to be points that have nothing to do with the candidates - they are just a set of points that are used for Monte-Carlo integration. If you take a look at eq. (2) in https://arxiv.org/pdf/1710.03206.pdf, the approach qNegIntegratedPosteriorVariance
takes is to simply Monte-Carlo integrate the integrand over the domain. The mc_points
are the samples from the domain to do this integration, they should be uniformly sampled from the domain (the more the better for accuracy, ofc subject to compute and memory limits). So if your domain is [0,1]^d
, this would be something like torch.rand(num_mc_samples, d)
.
Hey @Balandat, thanks for clarifying what the mc_points
are suppose to be. I sort of went down the Botorch rabbit hole these last few days and I am starting to get a little more familiar with it 😅 . The first comment you mentioned with using the evaluate_acquisition_function
makes a bit more sense now (going to tackle that next). I opted to use a different acquisition function with the snippet of code i posted earlier and just wanted to run this by you.
bounds = torch.tensor([[-5.0, -3.0, -4.0, -6.0, -11.0, -11.0, -4.0, -2.0, -2.0, -2.0],[4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 3.0, 3.0, 3.0,3.0]],dtype=torch.float64)
candidate_set = draw_sobol_samples(bounds=bounds, n=1000, q=1).squeeze(-2)
candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set
iteration = []
mae_list = []
for i in range(100):
# Fit the Gaussian Process model
gp = SingleTaskGP(X_train, Y_train)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)
qMES = qMaxValueEntropy(gp,candidate_set)
acq_values = qMES(X_candidates.unsqueeze(-2))
best_candidate_idx = torch.argmax(acq_values)
print(best_candidate_idx)
My concern here was just to make sure this is being used correctly ..i defined the acquisition function and used candidate_set
drawn from draw_sobol_samples
and then used qMES
on the samples i am using for active learning (X_candidates).. acq_values = qMES(X_candidates.unsqueeze(-2))
... after that i pull the max
best_candidate_idx = torch.argmax(acq_values)
and then added this point into the training set.
Hope this makes sense!
So here you're effectively optimizing the MES acquisition function on a finite set of random candidate points. That's not wrong per se, but one of the big benefits of the BoTorch acquisition functions is that they are differentiale and so you can use gradient-based optimizers to typically get much better solutions, esp. jn high dim spaces. Check out optimize_acqf
for how to do this.
EDIT: Just realized that my assumption is incorrect. It seems like the best way to go is evaluate_acquisition_function
as it would allow for evaluating on my actual candidate list. Here optimize_acqf
is not looking strictly which value in X_candidates optimizes the acquisition function.
ooh i see! I did some digging based on your last comment and found this notebook you contributed to max_value_entropy. Correct me if i am wrong here, the candidate_set
in the notebook, would be my X_candidates
that i am looking to evaluate..? From the code below this means the returned candidates
and acq_value
would be the next best candidate to select and train the gp on. Initially in the code above (in the last comment) i treated the candidate_set
as we did the mc_points
which maybe was not the right approach.
bounds = torch.tensor([[-5.0, -3.0, -4.0, -6.0, -11.0, -11.0, -4.0, -2.0, -2.0, -2.0],[4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 3.0, 3.0, 3.0,3.0]],dtype=torch.float64)
# candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set
# candidate_set = draw_sobol_samples(bounds=bounds, n=1000, q=1).squeeze(-2)
# # candidate_set = torch.rand(1000, bounds.size(1), device=bounds.device, dtype=bounds.dtype)
# candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set
iteration = []
mae_list = []
for i in range(len(X_candidates)):
# Fit the Gaussian Process model
gp = SingleTaskGP(X_train, Y_train)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)
qMES = qMaxValueEntropy(gp,X_candidates)
candidates, acq_value = optimize_acqf(
acq_function=qMES,
bounds=bounds,
q=1,
num_restarts=10,
raw_samples=512,
)
Correct me if i am wrong here, the candidate_set in the notebook, would be my X_candidates that i am looking to evaluate..?
Not quite. The qMaxValueEntropy
acquisition function, similar to qNegIntegratedPosteriorVariance
also requires a discrete set of points candidate_set
in the domain over which to draw samples of the max value (this is just due to how that acquisition function is implemented). Bur this is (maybe confusingly) not the same as X_candidate
which would be the set of points from which you consider to pick one to evaluate next. In this case you can. Still optimize the acquisition function (for a given candidate_set
numerically without having to pre specify a set of discrete points from which to choose.
i treated the candidate_set as we did the mc_points which maybe was not the right approach.
that is the right approach.
Hey @Balandat, its been a little while, but i reevaluated my approach a bit (hopefully this is closer to what you suggested early on). See sample code please.
I have a few questions here:
In this use case, do i need to optimize the acquisition function as in the previous use case (in the above example that we previously spoken about)? I ask because oddly enough my mean absolute error fluctuated a bit and did not decrease as expected (so i assume i am making a mistake somewhere).
One issue i encountered was when i was trying to swap out qMaxValueEntropy
for qNegIntegratedPosteriorVariance
. For some reason a direct swap in this code raised an error. Here is the error raised.
RuntimeError: Input constructor for acquisition class qNegIntegratedPosteriorVariance not registered. Use the @acqf_input_constructor` decorator to register a new method.
Can this acquisition function be added to evaluate_acquisition_function
so that it is compatible? (maybe i should be opening up a separate issue for this).
Here is the sample code:
gs = GenerationStrategy(
steps=[
GenerationStep(
model=Models.BOTORCH_MODULAR,
num_trials=-1,
max_parallelism=3,
model_kwargs={"surrogate": Surrogate(SingleTaskGP),
"botorch_acqf_class": qMaxValueEntropy,
}
),
]
)
test_set_dict = [ObservationFeatures(x) for x in X_test.to_dict(orient="records")]
ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
name="Peak_Prediction",
parameters=parameters,
objective_name="ppm",
# minimize=False,
)
# Attach and complete trials for the initial training set
for j in range(len(X_train)):
_, trial_index = ax_client.attach_trial(X_train.iloc[j, :].to_dict())
ax_client.complete_trial(
trial_index=trial_index, raw_data=Y_train.iloc[j].to_dict()
)
print("fitted Training data")
iteration=[]
mae_list = []
trials = 0
len_X_candidates = len(X_candidates)
for i in range(len_X_candidates):
ax_client.fit_model()
model_bridge = ax_client.generation_strategy.model
observation_features = [
ObservationFeatures(x) for x in X_candidates.to_dict(orient="records")
]
acqf_values = model_bridge.evaluate_acquisition_function(
observation_features=observation_features)
max_value = max(acqf_values)
max_index = acqf_values.index(max_value)
next_experiment = X_candidates.iloc[max_index]
Y_candidate_value = Y_candidates.iloc[max_index]
# Add the features with the max value to X_train and remove from X_candidates
X_train = pd.concat([X_train, next_experiment.to_frame().T], ignore_index=True)
X_candidates = X_candidates.drop(X_candidates.index[max_index], axis=0)
# Attach and complete the next trial
_, trial_index = ax_client.attach_trial(next_experiment.to_dict())
ax_client.complete_trial(
trial_index=trial_index, raw_data=Y_candidate_value.to_dict()
)
trials = trials + 1
model = ax_client.generation_strategy.model
y_predicted = model.predict(test_set_dict)
predicted_y_values = y_predicted[0]['ppm']
mae = mean_absolute_error(Y_test, predicted_y_values)
i = i + 1
iteration.append(i)
mae_list.append(mae)
Thank you for your time!
@ramseyissa, maybe the following will be of help?
Note: copied from https://github.com/facebook/Ax/issues/1312
Here is a self-contained example using vanilla single-objective optimization based on https://github.com/facebook/Ax/issues/460#issuecomment-975233349:
from typing import Any, Dict, Optional
import torch
# from ax.core.objective import ScalarizedObjective
from ax.modelbridge import get_sobol
from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models
from ax.models.torch.botorch_modular.surrogate import Surrogate
from ax.service.ax_client import AxClient
from botorch.acquisition.active_learning import (
MCSampler,
qNegIntegratedPosteriorVariance,
)
from botorch.acquisition.input_constructors import (
MaybeDict,
acqf_input_constructor,
construct_inputs_mc_base,
)
from botorch.acquisition.objective import AcquisitionObjective
from botorch.models.gp_regression import SingleTaskGP
from botorch.models.model import Model
from botorch.utils.datasets import SupervisedDataset
from torch import Tensor
@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
model: Model,
mc_points: Tensor,
training_data: MaybeDict[SupervisedDataset],
objective: Optional[AcquisitionObjective] = None,
X_pending: Optional[Tensor] = None,
sampler: Optional[MCSampler] = None,
**kwargs: Any,
) -> Dict[str, Any]:
if model.num_outputs == 1:
objective = None
base_inputs = construct_inputs_mc_base(
model=model,
training_data=training_data,
sampler=sampler,
X_pending=X_pending,
objective=objective,
)
return {**base_inputs, "mc_points": mc_points}
def objective_function(x):
f = x["x1"] ** 2 + x["x2"] ** 2 + x["x3"] ** 2
return {"f": (f, None)}
parameters = [
{"name": "x1", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
{"name": "x2", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
{"name": "x3", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
]
ax_client_tmp = AxClient()
ax_client_tmp.create_experiment(parameters=parameters)
sobol = get_sobol(ax_client_tmp.experiment.search_space)
mc_points = sobol.gen(1024).param_df.values
mcp = torch.tensor(mc_points)
model_kwargs_val = {
"surrogate": Surrogate(SingleTaskGP),
"botorch_acqf_class": qNegIntegratedPosteriorVariance,
"acquisition_options": {"mc_points": mcp},
}
gs = GenerationStrategy(
steps=[
GenerationStep(model=Models.SOBOL, num_trials=num_sobol),
GenerationStep(
model=Models.BOTORCH_MODULAR, num_trials=num_qnipv, model_kwargs=model_kwargs_val
),
]
)
ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
name="active_learning_experiment",
parameters=parameters,
objective_name="f",
minimize=True,
)
for _ in range(20):
trial_params, trial_index = ax_client.get_next_trial()
data = objective_function(trial_params)
ax_client.complete_trial(trial_index=trial_index, raw_data=data["f"])
@sgbaird, thank you very much, this was very helpful! I managed to tweak this a bit for my use case, and swapped out the qNIPV acquisition function for qMES using its own input constructor. I have a few question that you might be able to address here and if @Balandat has some input on this that would also be appreciated.
Ok so the question is, when calling evaluate_acquisition_function(observation_features=observation_features)
, observation features here is a dictionary of my candidate list that i am looking to pick the next sample from, do i need to call optimize_acqf
function at some point before using evaluate_acquisition_function
or is this automatically handled here?
Thank you both for your time!
This is the input constructor used for qMES for reference:
@acqf_input_constructor(qMaxValueEntropy)
def construct_inputs_qMES(
model: Model,
training_data: MaybeDict[SupervisedDataset],
bounds: List[Tuple[float, float]],
objective: Optional[MCAcquisitionObjective] = None,
# posterior_transform: Optional[PosteriorTransform] = None,
candidate_size: int = 1000,
**kwargs: Any,
) -> Dict[str, Any]:
r"""Construct kwargs for `qMaxValueEntropy` constructor."""
inputs_mc = construct_inputs_mc_base(
model=model,
training_data=training_data,
objective=objective,
**kwargs,
)
X = _get_dataset_field(training_data, "X", first_only=True)
_kw = {"device": X.device, "dtype": X.dtype}
_rvs = torch.rand(candidate_size, len(bounds), **_kw)
_bounds = torch.tensor(bounds, **_kw).transpose(0, 1)
return {
**inputs_mc,
"candidate_set": _bounds[0] + (_bounds[1] - _bounds[0]) * _rvs,
"maximize": kwargs.get("maximize", True),
}
I can't really comment on the accuracy of the construct_inputs_qMES
function (other than, if it doesn't throw errors, that's a good sign 😉) . I don't think you need to deal with optimize_acqf
if you're using evaluate_acquisition_function
.
Indeed, you're evaluating the acquisition function on a discrete set here using evaluate_acquisition_function
, so you can just pick the best and not use optimize_acqf
.
Recapping some internal discussion, @ramseyissa is predicting peak positions for an NMR spectra, so a single set of inputs typically has multiple outputs. Depending on the number of new atoms added to each simulation, there could be anywhere from 1 to 16 peaks. I think the possible locations of the peak are on the order of 100's, if not continuous, and 16 outputs might be too much in terms of modeling output behavior.
One option would be to train a VAE to move a spectrum object back and forth from a low-dimensional representation (e.g., ~4 values) and a high-dimensional version (16 values). The ppm values could be sorted to reduce degeneracies or data augmentation could be applied (swapping values).
https://github.com/facebook/Ax/issues/1312#issue-1488742013
Then, I think ScalarizedPosteriorTransform
gets used to compute a sort of joint acquisition function. Maybe this can be specified in the part of the generation strategy that gets passed more directly to BoTorch)
EDIT: @ramseyissa, also, I'm not sure if Max-value Entropy Search (MES) is the right model for your application since it still seems to be focused on minimizing or maximizing some output property that you're modeling rather than minimizing the global uncertainty, irrespective of whether the property is low or high. From https://botorch.org/tutorials/max_value_entropy:
Max-value entropy search (MES) acquisition function quantifies the information gain about the maximum of a black-box function
emphasis added
@Balandat @sgbaird :
I have been trying to use qNIPV in the same way i used MES, but i keep getting an error that isn't making much sense to me. In my problem just to remind you, i am not looking to optimize my objective value, i am just looking to minimize my global uncertainty for a predefined list of candidates.
It seems that when using the construct_inputs_qNIPV
it is not recognizing mc_points
that is leading to the error (the error is seen at the bottom). Any suggestions on how to address this?
@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
model: Model,
mc_points: Tensor,
training_data: MaybeDict[SupervisedDataset],
objective: Optional[AcquisitionObjective] = None,
X_pending: Optional[Tensor] = None,
sampler: Optional[MCSampler] = None,
**kwargs: Any,
) -> Dict[str, Any]:
if model.num_outputs == 1:
objective = None
base_inputs = construct_inputs_mc_base(
model=model,
training_data=training_data,
sampler=sampler,
X_pending=X_pending,
objective=objective,
)
return {**base_inputs, "mc_points": mc_points}
# the objective function here is just predicting on a test set and calculating the mae
# with the model that is updated through an active learning loop
def objective_function(model, test_set_dict, test_set_target_vales):
# get gp
model = ax_client.generation_strategy.model
# predict on test set
y_predicted = model.predict(test_set_dict)
# get predicted values
predicted_y_values = y_predicted[0]['ppm']
# calculate mae
mae = mean_absolute_error(test_set_target_vals, predicted_y_values)
print(f'MAE: {mae}')
return {"mae": (mae, None)}
init_len_of_x_train = len(train_dfs)
parameters = [
{"name": vals, "type": "range", "bounds": [-15.0, 15.0], "value_type": "float"}
for vals in descriptors_df.columns]
# this is used to get the mc_points for the acquisition function
ax_client_tmp = AxClient()
ax_client_tmp.create_experiment(parameters=parameters)
sobol = get_sobol(ax_client_tmp.experiment.search_space)
mc_points = sobol.gen(1024).param_df.values
mcp = torch.tensor(mc_points)
# dict of kwargs for BoTorch modular in generation strategy
model_kwargs_val = {
"surrogate": Surrogate(SingleTaskGP),
"botorch_acqf_class": qNegIntegratedPosteriorVariance,
"acquisition_options": {"mc_points": mcp},
}
gs = GenerationStrategy(
steps=[
# GenerationStep(model=Models.SOBOL, num_trials=5),
GenerationStep(
model=Models.BOTORCH_MODULAR, num_trials=-1
, model_kwargs=model_kwargs_val
),
]
)
ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
name="active_learning_experiment",
parameters=parameters,
objective_name="ppm",
# minimize=True,
)
# test set
test_set_dict = [ObservationFeatures(x) for df in test_dfs for x in df.to_dict(orient="records")]
#get test set target values
test_set_target_vals = get_target_values(test_dfs, ppm_df, 'ppm')
#get train set target values
train_set_target_vals = get_target_values(train_dfs, ppm_df, 'ppm')
# my training data is a list of dataframes, so I need to attach each dataframe to ax_client
for df in train_dfs:
for idx in range(len(df)):
_, trial_index = ax_client.attach_trial(df.iloc[idx, :].to_dict())
ax_client.complete_trial(
trial_index=trial_index, raw_data=train_set_target_vals.iloc[idx].to_dict()
)
index_of_candidate_set = []
iteration=[]
mae_list = []
trials = 0
len_X_candidates = len(x_candidates)
for i in range(len_X_candidates):
print("size of X_candidates",len(x_candidates))
print("size of X_train",len(train_dfs))
# fit gp
ax_client.fit_model()
# get gp
model_bridge = ax_client.generation_strategy.model
acqf_values = sum_acqf_values(x_candidates, model_bridge)
#get max of acqf values
max_value = max(acqf_values)
#get index of max value
max_index = acqf_values.index(max_value)
# locate features in orginal candidate set
next_experiment = x_candidates[max_index]
next_exp_target_vals = get_target_values(next_experiment, ppm_df, 'ppm')
# add the best candidate selected by acquisition fx experiment to training set
train_dfs.append(next_experiment)
#drop the best candidate selected by acquisition fx experiment from candidate set
x_candidates.pop(max_index)
# attach this new trail to ax_client, and complete it with the target value "ppm"
for idx in range(len(next_experiment)):
_, trial_index = ax_client.attach_trial(next_experiment.iloc[idx, :].to_dict())
ax_client.complete_trial(
trial_index=trial_index, raw_data=next_exp_target_vals.iloc[idx].to_dict())
trials = trials + 1
model = ax_client.generation_strategy.model
mae_vals = objective_function(model, test_set_dict, test_set_target_vals)
i = i + 1
iteration.append(i)
print(f"Iteration {i+1}: MAE = {mae_vals:.4f}")
mae_list.append(mae_vals)
index_of_candidate_set.append(max_index)
exp_df = exp_to_df(ax_client.experiment)
num_of_zeros = len(train_dfs)
exp_df['MAE'] = [0] * init_len_of_x_train + mae_list
# exp_df
The resulting error from the above script is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[31], line 193
190 # get gp
191 model_bridge = ax_client.generation_strategy.model
--> 193 acqf_values = sum_acqf_values(x_candidates, model_bridge)
194 # print("len of acqf_values",len(acqf_values))
195
196 # turn list of candidates into dict of observation features
(...)
205
206 # #get max of acqf values
207 max_value = max(acqf_values)
Cell In[16], line 5, in sum_acqf_values(list_of_dfs, model_bridge)
3 for df in list_of_dfs:
4 observation_features = [ObservationFeatures(x) for x in df.to_dict(orient="records")]
----> 5 acqf_values = model_bridge.evaluate_acquisition_function(observation_features=observation_features)
6 # print('acqf values', acqf_values)
7 sum_acqf_values_list.append(sum(acqf_values))
File [/opt/miniconda3/envs/nmrpeaks/lib/python3.10/site-packages/ax/modelbridge/torch.py:434](https://file+.vscode-resource.vscode-cdn.net/opt/miniconda3/envs/nmrpeaks/lib/python3.10/site-packages/ax/modelbridge/torch.py:434), in TorchModelBridge.evaluate_acquisition_function(self, observation_features, search_space, optimization_config, pending_observations, fixed_features, acq_options)
425 obs_feats[i] = t.transform_observation_features(batch)
427 base_gen_args = self._get_transformed_gen_args(
428 search_space=search_space,
...
305 )
306 self.acqf = botorch_acqf_class(**acqf_inputs) # pyre-ignore [45]
307 self.X_pending: Optional[Tensor] = unique_Xs_pending
TypeError: construct_inputs_qNIPV() missing 1 required positional argument: 'mc_points'
@ramseyissa, can you provide a reproducer? i.e., something that can be copy-pasted and run standalone (preferably a link to a Google Colab notebook 😉)
I have a MWE example for qNIPV, though without anything special. Just 3 float inputs and 1 float output. If you can use this script as a template and gradually adapt it to your use-case, you should be able to pinpoint the discrepancy (and possibly resolve the error you were running into).
EDIT: this needs to be updated to account for transformations per Max's comment https://github.com/facebook/Ax/issues/1557#issuecomment-1624582689
@sgbaird,
I went through the MWE example for qNIPV that was referenced above and it was very helpful. I believe it is the same working example you referenced in https://github.com/facebook/Ax/issues/1557#issuecomment-1528840459 earlier on.
Ok so I think I found the issue. For some reason when you call model_bridge.evaluate_acquisition_function
you must use the acq_options
argument when using the qNIPV acquisition function, even if specifying acquisition_options
in model_kwargs
within GenerationStrategy
, which differs from when i used the MES acquisition function (which also required MC points).
So I think all finally looks good, but maybe a final check from @Balandat would seal the deal (I can provide the reproducer @sgbaird was after :wink:)
Ok so I think I found the issue. For some reason when you call model_bridge.evaluate_acquisition_function you must use the acq_options argument when using the qNIPV acquisition function, even if specifying acquisition_options in model_kwargs within GenerationStrategy, which differs from when i used the MES acquisition function (which also required MC points).
Hmm this is interesting. I didn't go into the weeds on this but I believe this might be due to the fact that qMaxValueEntropy
has an input constructor that generating a set of candidate_size
candidate_points
. The difference to the acqf constructor for qNIPV used here is that candidate_size
has a default value, so that gets used if no candidate_size
is given in the acquisition_options
: https://github.com/pytorch/botorch/blob/6f9818756b6a3f83ad52aa3a6cd7b2591bd3ce0e/botorch/acquisition/input_constructors.py#L865
Gotcha alert: One thing to note here is that the BoTorch model when wrapped in Ax operates in the transformed space (typically the unit cube if all parameters are continuous parameters). So mc_points
here will need to be passed in that transformed space rather than the original space over which the parameters are defined. Otherwise you'll get some very funky results (e.g. say your parameters are specified on [1000, 2000]^d
, and you choose your mc_points
uniformly over the same set, then in the transformed space the model will operate on [0, 1]^d
and so most if not all of the mc_points
will not actually lie in what corresponds to the search space - in this case "very funky" would mean total garbage). Unfortunately it's at least at this point not straightforward to apply the Ax transforms automatically to the mc_points
. Hopefully we can come up with a better solution for this in the future. cc @saitcakmak, @dme65
@Balandat would it be easier to scale to [0, 1]^d
outside of Ax using sklearn's MinMaxScaler? Or, assuming UnitX
is the only transform at play, would you recommend using UnitX.transform_observation_features?
@Balandat would it be easier to scale to
[0, 1]^d
outside of Ax using sklearn's MinMaxScaler? Or, assumingUnitX
is the only transform at play, would you recommend using UnitX.transform_observation_features?
Either way should work. But yeah, scaling the data before passing it to Ax is the right thing to do here if you want to pass in the mc points (and assuming you have only continuous parameters).
@Balandat does this look OK?
# MC Points
# WARNING: assumes only UnitX transform, https://ax.dev/docs/models.html#transforms
num_mc_sobol = 2**13
sobol = get_sobol(ax_client_tmp.experiment.search_space)
# mc_points = sobol.gen(1024).param_df.values
obs_features = [
ObservationFeatures(parameters)
for parameters in sobol.gen(num_mc_sobol).param_df.to_dict("records")
]
ux = UnitX(ax_client_tmp.experiment.search_space)
obs_features_ux = ux.transform_observation_features(obs_features)
mc_points = [list(obs.parameters.values()) for obs in obs_features_ux]
mcp = torch.tensor(mc_points)
mcp
Hmm yeah that seems like it should work! I guess since UnitX
transforms to the unit cube it should be enough to just generate the points directly without Ax involvement:
mcp = torch.quasirandom.SobolEngine(num_params, scramble=True).draw_base2(13)
Again this assumes that there aren't any more complex things going on such as choice parameters or the like..
@Balandat, thanks for the insight.
Me and @sgbaird have had some internal discussion regarding some of the items discussed above. The result of this are some questions i need clarification on.
@Balandat would it be easier to scale to
[0, 1]^d
outside of Ax using sklearn's MinMaxScaler? Or, assumingUnitX
is the only transform at play, would you recommend using UnitX.transform_observation_features?Either way should work. But yeah, scaling the data before passing it to Ax is the right thing to do here if you want to pass in the mc points (and assuming you have only continuous parameters).
evaluate_acquisition_function
(when using the qNIPV acquisition function), should i be selecting the most negative point in the list, so the min value since it would be the most negative?
- did you also mean that the data that we are working with should be scaled (not just the mc points) but the training, x_candidates, and test set?
Pretty sure the search space (i.e., parameters
and related kwargs) can be left untouched as long as we scale mcp
appropriately (i.e., the only change that needs to occur is in transforming mcp
).
2. should i be selecting the most negative point in the list
My Intuition is to always take the maximum of the acquisition function, since acq fns are used to "... evaluate the usefulness of one of more design points ...". I'll defer to @Balandat though to clarify
Gotcha alert: One thing to note here is that the BoTorch model when wrapped in Ax operates in the transformed space (typically the unit cube if all parameters are continuous parameters). So mc_points here will need to be passed in that transformed space rather than the original space over which the parameters are defined. Otherwise you'll get some very funky results (e.g. say your parameters are specified on [1000, 2000]^d, and you choose your mc_points uniformly over the same set, then in the transformed space the model will operate on [0, 1]^d and so most if not all of the mc_points will not actually lie in what corresponds to the search space - in this case "very funky" would mean total garbage). Unfortunately it's at least at this point not straightforward to apply the Ax transforms automatically to the mc_points. Hopefully we can come up with a better solution for this in the future.
One way to get around this would be to remove UnitX
and use Normalize
instead. Unfortunately, this requires more than a few lines of code to setup the generation strategy. Here's a piece of code that sets up an MBM GenerationStep
that uses Normalize
rather than UnitX
.
Mixed_X_trans: List[Type[Transform]] = [
RemoveFixed,
OrderedChoiceEncode,
OneHot,
IntToFloat,
Log,
Logit,
]
model_kwargs: Dict[str, Any] = {
"torch_device": torch_device,
"transforms": Mixed_X_trans + Y_trans,
"transform_configs": {},
}
# Extract the search space digest to get the bounds for transforms.
tf_search_space = transform_search_space(
search_space=experiment.search_space,
transforms=[Cast] + Mixed_X_trans,
transform_configs=model_kwargs["transform_configs"],
)
search_space_digest = extract_search_space_digest(
search_space=tf_search_space,
param_names=list(tf_search_space.parameters.keys()),
)
bounds = torch.as_tensor(
search_space_digest.bounds,
dtype=torch.double,
device=torch_device,
).T
botorch_tf = Normalize(d=bounds.shape[-1], bounds=bounds)
model_kwargs["surrogate_specs"] = {
"mixed_surrogate": SurrogateSpec(
botorch_model_class=botorch_model_class,
input_transform=botorch_tf,
)
}
GenerationStep(
model=Models.BOTORCH_MODULAR,
num_trials=-1,
model_kwargs=model_kwargs,
should_deduplicate=should_deduplicate,
)
@ramseyissa would you say this is resolved?
@lena-kashtelyan,
Thank you for checking on this. I have a few questions as unfortunately my implementation is not producing the anticipated results. Here is a plot of the results. I am not quite sure why random selection is beating out my acquisition function selected data points in terms of reducing the overall MAE of a test set.
Here is a reproducer that i set up in google collab for convenience:
So just to clarify what is happening here, I have a list of dataframes, and in each of these dataframes there can be 1 to 16 points in them (1 to 16 samples.. each dataframe is called a configuration). There are about 4200 hundred total (individual) points, but 520 configurations. I train my gp on about 5 or so configurations, and then begin to have the acq f(x) select what the next best configuration to choose. This is done by evaluate_acquisition_function for each point in the configuration and taking the sum of the acq values for each configuration, as there can be 1 to 16 points within each dataframe. I then take the min value of the list of acq values, and add that configuration to my gp.
One thing that was mentioned by my lab mate @sgbaird was that i should be taking the max of the acq values list. But i was under the impression that we were looking for points with the highest posterior uncertainty and since its negative we would be looking for the min..? (i can be completely wrong here, but i think you can clear this up). A side note on this, i took the max and it performed worse, so it could be my implementation.
@saitcakmak
One thing that was mentioned by my lab mate @sgbaird was that i should be taking the max of the acq values list. But i was under the impression that we were looking for points with the highest posterior uncertainty and since its negative we would be looking for the min..?
qNegIntegratedPosteriorVariance
measures the (integrated) posterior variance after adding the candidate point to the model. The smaller the posterior variance (equivalently, larger for qNIPV since it's negated) the larger is the reduction in the uncertainty from adding the candidate to the model. So, @sgbaird is right that you should be selecting the candidate with maximal acquisition value.
Here is a reproducer that i set up in google collab for convenience:
Looks like I don't have access to the colab notebook. Would it be possible to make it accessible more widely?
@saitcakmak,
ooh that makes sense! Thank you for clearing that up.
I went ahead and made the notebook more accessible. You should have access to it now.
I spent some time looking through the notebook. Here are some observations:
sum_acqf_values
is the reason min
works better than max
. x_candidates
do not all have the same number of points in them. Some have 1, some have 2, 3 or 4. In sum_acqf_values
you're evaluating the qNIPV independently for each point in the candidate set and summing their values. Since any single qNIPV value will be negative, you end up with smaller values as the size of the candidate set grows. Using min
picks the larger candidate sets and max
picks the smaller sets. As a result, min
ends up adding more points to the training data faster, leading to seemingly better results early on (though this is very much apples vs oranges, IMO).
Your inputs are in [0, 1] but you're defining Ax parameters in [-15, 15]. In my limited testing, this didn't affect things significantly (it is irrelevant due to below issues), though I can imagine it having adverse effects in a proper application.
Good catch. I copied my original code with the original parameter bounds and forgot to change the bounds on the parameters for the toy dataset that i created.
The optimization loop runs over a fixed set of candidates and ends up adding all of them to training data eventually, so the final MAE will be identical regardless of the method choice.
Right, so just for reference, the original dataset contains about 4,000 data points, which are bunched up into 520 "configurations" (some configurations contain 2 data points, some contain 10 data points, up to 16 data points in a single configuration). The underlying idea here is to use the fewest amount of configurations possible from the 520, to train a model to be able to predict on a test set (obtain the lowest MAE score on a test set is the objective here).
You should re-fit the model at the end of the iteration before using it to generate the model predictions. Otherwise, you're generating the predictions with the model that was before the candidate and not with the model that includes the candidate that was generated during the iteration.
Ohh correct, thanks for catching that.
sum_acqf_values
is the reasonmin
works better thanmax
.x_candidates
do not all have the same number of points in them. Some have 1, some have 2, 3 or 4. Insum_acqf_values
you're evaluating the qNIPV independently for each point in the candidate set and summing their values. Since any single qNIPV value will be negative, you end up with smaller values as the size of the candidate set grows. Usingmin
picks the larger candidate sets andmax
picks the smaller sets. As a result,min
ends up adding more points to the training data faster, leading to seemingly better results early on (though this is very much apples vs oranges, IMO).Gotcha, this makes much more sense now why min is working better than max (thanks for the clarification).
If you are planning on adding multiple points in an iteration, you should jointly evaluate the acquisition value of these candidates rather than evaluating them independently and summing the values. This will give you more accurate representation of the improvement offered by adding multiple points as a batch. This docstring, which I believe @sgbaird wrote a while ago, explains how to do this (you'll want to use list of lists): https://github.com/facebook/Ax/blob/main/ax/modelbridge/torch.py#L403-L409
Ah right, since I plan on adding them as a batch, I should be evaluating the batch of points not individual points.
OK, i went ahead and updated Ax to the latest release (v0.3.4) and unfortunately this caused some issues when trying to tackle this problem. I noticed that i was getting an error related to the acqf_input_constructor
, specifically related to AcquisitionObjective
@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
model: Model,
mc_points: Tensor,
training_data: MaybeDict[SupervisedDataset],
objective: Optional[AcquisitionObjective] = None,
X_pending: Optional[Tensor] = None,
sampler: Optional[MCSampler] = None,
**kwargs: Any,
) -> Dict[str, Any]:
if model.num_outputs == 1:
objective = None
base_inputs = _construct_inputs_mc_base(
model=model,
training_data=training_data,
sampler=sampler,
X_pending=X_pending,
objective=objective,
)
return {**base_inputs, "mc_points": mc_points}
I noticed in latest version AcquisitionObjective has been removed. The below code is from the previous version before this latest release (stating that AcquisitionObjective
will be DEPRECATED
)
class AcquisitionObjective(Module, ABC):
r"""Abstract base class for objectives.
DEPRECATED - This will be removed in the next version.
:meta private:
"""
...
I noticed that a Feature Request was made for the inclusion of a qNIPV input_constructor, but didn't find it in input_constructors.py.
Any suggestions for a work around, or maybe an entirely new way to construct the input_constructor
so i can give these suggestions a try.
Also, just wanted to say thank you for the very detailed and extremely helpful response!
OK, i went ahead and updated Ax to the latest release (v0.3.4) and unfortunately this caused some issues when trying to tackle this problem. I noticed that i was getting an error related to the acqf_input_constructor, specifically related to AcquisitionObjective
It's always good to use the latest versions, except that it sometimes causes backwards compatibility issues :). In this case, this part of the code base got cleaned up a bit, both to remove the deprecated classes and to clean up kwarg handling that caused a couple silent bugs along the way. You can define a new input constructor as follows:
@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
model: Model,
mc_points: Tensor,
posterior_transform: Optional[PosteriorTransform] = None,
X_pending: Optional[Tensor] = None,
sampler: Optional[MCSampler] = None,
) -> Dict[str, Any]:
return {
"model": model,
"mc_points": mc_points,
"posterior_transform": posterior_transform,
"X_pending": X_pending,
"sampler": sampler,
}
@saitcakmak,
I attempted this again, and unfortunately ran into another error (although I do feel like we are getting close here :slightly_smiling_face:). With the new changes, I encountered the following error shown below:
Exception has occurred: TypeError
qNegIntegratedPosteriorVariance.__init__() got an unexpected keyword argument 'surrogates'
This error stems from this line here:
acqf_values = model_bridge.evaluate_acquisition_function(observation_features=observation_features)
I did some debugging to find the source of this error (hopefully to lend some help in identifying the source) and it appears that the error is coming from here:
_instantiate_acquisition
, where surrogates=self.surrogates
is being returned.
_instantiate_acquisition
is then being used in evaluate_acquisition_function
.
The input constructor associated with the acqf_cls
is then being retrieved from the registry here.
Not exactly sure how to go about handling this (although I did try a few basic attempts with no luck).
Can you share the full stack trace of the error you're getting? If it happens at https://github.com/facebook/Ax/blob/main/ax/models/torch/botorch_modular/model.py#L664, it could be that botorch_acqf_class
is being passed as acquisition_class
, which are completely different objects.
Also, a follow up to my above comment on using Normalize
rather than UnitX
: This'll soon become much much easier thanks to a refactor @jelena-markovic is wrapping up (not exported to GitHub yet). The changes make it so that the inputs for the input transforms can be auto constructed using information from search space (and other sources), which eliminates a lot of the boilerplate code I had included above. The same can be achieved using the following after the PR lands:
Mixed_X_trans: List[Type[Transform]] = [
RemoveFixed,
OrderedChoiceEncode,
OneHot,
IntToFloat,
Log,
Logit,
]
model_kwargs: Dict[str, Any] = {
"torch_device": torch_device,
"transforms": Mixed_X_trans + Y_trans,
"transform_configs": {},
"surrogate_specs": {
"mixed_surrogate": SurrogateSpec(
botorch_model_class=botorch_model_class,
input_transform_class=[Normalize], # We just need to pass in the class, options will be auto generated.
)
}
}
GenerationStep(
model=Models.BOTORCH_MODULAR,
num_trials=-1,
model_kwargs=model_kwargs,
should_deduplicate=should_deduplicate,
)
@saitcakmak,
Can you share the full stack trace of the error you're getting?
Here is the repro, which will produce the same full stack trace of the error.
Can be seen here as well:
[INFO 08-28 12:13:46] ax.core.experiment: Attached custom parameterizations [{'g1': 4.0, 'g2': 2.0, 'g3': 0.468565, 'g4': 0.532708, 'g5': 0.565189, 'g6': 0.55303, 'l1': 0.595609, 'l2': 0.554738, 'l3': 0.468565, 'l4': 0.574852}] as trial 35.
[INFO 08-28 12:13:46] ax.service.ax_client: Completed trial 35 with data: {'ppm': (-655.197368, None)}.
[INFO 08-28 12:13:46] ax.core.experiment: Attached custom parameterizations [{'g1': 4.0, 'g2': 2.0, 'g3': 0.468565, 'g4': 0.532708, 'g5': 0.565189, 'g6': 0.55303, 'l1': 0.487083, 'l2': 0.653798, 'l3': 0.468565, 'l4': 0.574852}] as trial 36.
[INFO 08-28 12:13:46] ax.service.ax_client: Completed trial 36 with data: {'ppm': (-658.894139, None)}.
[INFO 08-28 12:13:46] ax.core.experiment: Attached custom parameterizations [{'g1': 3.818182, 'g2': 2.181818, 'g3': 0.468565, 'g4': 0.522285, 'g5': 0.565189, 'g6': 0.521493, 'l1': 0.51061, 'l2': 0.518498, 'l3': 0.468565, 'l4': 0.574852}] as trial 37.
[INFO 08-28 12:13:46] ax.service.ax_client: Completed trial 37 with data: {'ppm': (-659.151184, None)}.
fitted Training data
size of X_candidates 420
size of X_train 5
[INFO 08-28 12:13:46] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
Traceback (most recent call last):
File "/opt/miniconda3/envs/nmr/lib/python3.11/runpy.py", line 198, in _run_module_as_main
return _run_code(code, main_globals, None,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/nmr/lib/python3.11/runpy.py", line 88, in _run_code
exec(code, run_globals)
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
cli.main()
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
runpy.run_path(target, run_name="__main__")
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "/Users/ramseyissa/Documents/GitHub/ML-guided-NMR_peak_position_predictions/qNIPV.py", line 343, in <module>
acqf_values = sum_acqf_values(x_candidates_copy, model_bridge)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramseyissa/Documents/GitHub/ML-guided-NMR_peak_position_predictions/qNIPV.py", line 196, in sum_acqf_values
acqf_values = model_bridge.evaluate_acquisition_function(observation_features=observation_features)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/modelbridge/torch.py", line 446, in evaluate_acquisition_function
return self._evaluate_acquisition_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/modelbridge/torch.py", line 480, in _evaluate_acquisition_function
evals = not_none(self.model).evaluate_acquisition_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/models/torch/botorch_modular/model.py", line 523, in evaluate_acquisition_function
acqf = self._instantiate_acquisition(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/models/torch/botorch_modular/model.py", line 664, in _instantiate_acquisition
return self.acquisition_class(
^^^^^^^^^^^^^^^^^^^^^^^
TypeError: qNegIntegratedPosteriorVariance.__init__() got an unexpected keyword argument 'surrogates'
If it happens at https://github.com/facebook/Ax/blob/main/ax/models/torch/botorch_modular/model.py#L664, it could be that
botorch_acqf_class
is being passed asacquisition_class
, which are completely different objects.
Right, I was looking into this a bit and found some very helpful info in this notebook here as well as here. Can you elaborate a little more on it could be that botorch_acqf_class is being passed as acquisition_class
(it would give me a better understanding of the error that is occurring).
Also just to be clear, in model_kwargs
, this is what is being passed:
model_kwargs_val = {
"surrogate": Surrogate(SingleTaskGP),
"acquisition_class": qNegIntegratedPosteriorVariance,
#"botorch_acqf_class": qNegIntegratedPosteriorVariance,
"acquisition_options": {"mc_points": mcp},
"fit_out_of_design": True,
The acquisition_class
argument refers to this Ax class, in case it is getting replaced by a subclass of it: https://github.com/facebook/Ax/blob/main/ax/models/torch/botorch_modular/acquisition.py#L61
For BoTorch acquisition classes, you should use botorch_acqf_class
argument. Your code would look like this (just using the commented out one instead):
model_kwargs_val = {
"surrogate": Surrogate(SingleTaskGP),
"botorch_acqf_class": qNegIntegratedPosteriorVariance,
"acquisition_options": {"mc_points": mcp},
"fit_out_of_design": True,
@saitcakmak
For BoTorch acquisition classes, you should use
botorch_acqf_class
argument. Your code would look like this (just using the commented out one instead):
I previously tried "botorch_acqf_class": qNegIntegratedPosteriorVariance
which also raised an error. I reproduced the error which can be seen below:
[INFO 08-28 23:23:10] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
Traceback (most recent call last):
File "/opt/miniconda3/envs/nmr/lib/python3.11/runpy.py", line 198, in _run_module_as_main
return _run_code(code, main_globals, None,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/nmr/lib/python3.11/runpy.py", line 88, in _run_code
exec(code, run_globals)
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
cli.main()
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
runpy.run_path(target, run_name="__main__")
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "/Users/ramseyissa/Documents/GitHub/ML-guided-NMR_peak_position_predictions/qNIPV.py", line 348, in <module>
acqf_values = sum_acqf_values(x_candidates_copy, model_bridge)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramseyissa/Documents/GitHub/ML-guided-NMR_peak_position_predictions/qNIPV.py", line 201, in sum_acqf_values
acqf_values = model_bridge.evaluate_acquisition_function(observation_features=observation_features)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/modelbridge/torch.py", line 446, in evaluate_acquisition_function
return self._evaluate_acquisition_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/modelbridge/torch.py", line 480, in _evaluate_acquisition_function
evals = not_none(self.model).evaluate_acquisition_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/models/torch/botorch_modular/model.py", line 523, in evaluate_acquisition_function
acqf = self._instantiate_acquisition(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/models/torch/botorch_modular/model.py", line 664, in _instantiate_acquisition
return self.acquisition_class(
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/models/torch/botorch_modular/acquisition.py", line 306, in __init__
acqf_inputs = input_constructor(
^^^^^^^^^^^^^^^^^^
TypeError: construct_inputs_qNIPV() got an unexpected keyword argument 'training_data'
The issue here stems from training_data=training_data. One thing I tried after this error occurred was changing the args in construct_inputs_qNIPV
to accept training_data
(can be seen in the below):
@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
training_data: MaybeDict[SupervisedDataset],
model: Model,
mc_points: Tensor,
posterior_transform: Optional[PosteriorTransform] = None,
X_pending: Optional[Tensor] = None,
sampler: Optional[MCSampler] = None,
) -> Dict[str, Any]:
return {
"model": model,
"mc_points": mc_points,
"posterior_transform": posterior_transform,
"X_pending": X_pending,
"sampler": sampler,
"training_data": training_data
}
This then raised an objective
error that stems from objective=objective. Wasn't really sure if I was going down the right path here by doing that.
Since both training_data
and objective
are being passed to construct_inputs_qNIPV
, we need to accept these. qNIPV
does not need the training_data
in its constructor (this is typically used to extract baseline_X
for qNEI
type acqfs), so it should not be included in the returned dict. Similarly, qNIPV
does not support objective
, so it should probably error out if objective
is not None
.
@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
training_data: MaybeDict[SupervisedDataset],
model: Model,
mc_points: Tensor,
objective: Optional[MCAcquisitionObjective] = None,
posterior_transform: Optional[PosteriorTransform] = None,
X_pending: Optional[Tensor] = None,
sampler: Optional[MCSampler] = None,
) -> Dict[str, Any]:
if objective is not None:
raise RuntimeError("objective is not supported in qNIPV")
return { # This should only return inputs that are accepted by qNIPV.__init__
"model": model,
"mc_points": mc_points,
"posterior_transform": posterior_transform,
"X_pending": X_pending,
"sampler": sampler,
}
@ramseyissa, does this resolve the issue for you? Please let us know!
Hello @lena-kashtelyan and @saitcakmak,
Thank you for reaching out. I tried this initially when it was posted and unfortunately it led to an error. I just wanted to make sure I posted a repro for you to also see before replying to this.
One thing to note I was getting a few of these errors below:
TypeError: construct_inputs_qNIPV() got an unexpected keyword argument
Before just finally passing the **kwargs: Any,
argument in the input constructor
which resolved any issues of unexpected key word arguments being passed.
The error that you will see in the repro is:
Exception has occurred: TypeError construct_inputs_qNIPV() missing 1 required positional argument: 'mc_points'
which is odd because this argument is being passed.
Ugh, looks like mc_points
gets silently filtered out since it is not explicitly allowed in https://github.com/pytorch/botorch/blob/main/botorch/acquisition/input_constructors.py#L200.
@esantorella thoughts on how to best handle this? I believe allow_only_specific_variable_kwargs
was implemented so that kwargs would not be silently ignored.
Hello @saitcakmak @esantorella @lena-kashtelyan,
Just wanted to follow up on this issue, and to see if there are any suggestions on how to handle this.
Hi @ramseyissa. Sorry about not following up on this earlier. As a workaround for now, can you create a subclass of qNegIntegratedPosteriorVariance
that uses hard-coded mc_points
and does not require this as an input? If you use this with an input constructor like the one from above comment (with mc_points
removed), you shouldn't run into the error due to the kwarg getting filtered out.
Hello, I am currently trying to compare two methods. I have about 4000 data points (10 features and 1 target value).
Method 1: I randomly select candidates 10%, 20%, 30%, etc., and trained a model (sklearn regression model) to try to see at what percent is the models performance satisfactory (an R2 value of 90 or above) based on the amount of randomly selected data i give the model.
Method 2: I want to initialize a GP model using Ax, beginning with 10% percent of my data (using this as a sort of SOBOL points to give the model an idea of the design space), and then have it select from the remaining list of candidates what is the next best point to choose based on the Acquisition function (that would give the model the most information about the design space). I then want to attached this candidate and continue through the remaining list of candidates.
Method 2 is where i am having trouble. Not exactly sure how to go about this.
My current Ax experience: I have used Ax previously, to attach my experimental data (
ax_client.attach_trial
), and then have it complete the trial usingax_client.complete_trial
. After that I had it generate what the next best experiment to do usingax_client.get_next_trials
.I have also read through #771 but am having trouble in implementing this.
Thank you in advance!