Random vs Acquisition function selected Candidates

facebook / Ax

Adaptive Experimentation Platform

https://ax.dev

MIT License

2.38k stars 312 forks source link

Random vs Acquisition function selected Candidates #1557

Closed ramseyissa closed 8 months ago

ramseyissa commented 1 year ago

Hello, I am currently trying to compare two methods. I have about 4000 data points (10 features and 1 target value).

Method 1: I randomly select candidates 10%, 20%, 30%, etc., and trained a model (sklearn regression model) to try to see at what percent is the models performance satisfactory (an R2 value of 90 or above) based on the amount of randomly selected data i give the model.

Method 2: I want to initialize a GP model using Ax, beginning with 10% percent of my data (using this as a sort of SOBOL points to give the model an idea of the design space), and then have it select from the remaining list of candidates what is the next best point to choose based on the Acquisition function (that would give the model the most information about the design space). I then want to attached this candidate and continue through the remaining list of candidates.

Method 2 is where i am having trouble. Not exactly sure how to go about this.

My current Ax experience: I have used Ax previously, to attach my experimental data (ax_client.attach_trial), and then have it complete the trial using ax_client.complete_trial. After that I had it generate what the next best experiment to do using ax_client.get_next_trials.

I have also read through #771 but am having trouble in implementing this.

Thank you in advance!

Balandat commented 1 year ago

So this sounds like an active learning type problem where your goal is to find the best model across the whole design space ("best" in some yet-to-be specified sense)? This would mean that you'd need to use an acquisition function that specifically attempts to do this - standard acquisition functions for optimization such as Expected Improvement won't do the job here. There is e.g. [qNegIntegratedPosteriorVariance] (https://github.com/pytorch/botorch/blob/main/botorch/acquisition/active_learning.py#L40) that could be used to that end. I am not sure if we have any examples for how to use it in ax, but it should be possible to do this via the modular BoTorch model setup: https://github.com/facebook/Ax/blob/main/tutorials/modular_botax.ipynb

Once that's hooked up, then you should be able to use evaluate_acquisition_function to evaluate the acquisition function on your held-out points to pick the one with the highest value. See also the modular BoTorch model tutorial for how to do this (it's not actually run in code but discussed in the comments there).

ramseyissa commented 1 year ago

@Balandat Thank you very much for the quick reply!

Yes exactly, this is an active learning problem. My main goal is to try to minimize the amount of data (of the 4000 samples) used to train a model, that would then be able to give me an accurate (say 80-90%) prediction on a validation set and eventually on a held-out test set.

Ok great I will take a look at the references.

So in this case can the surrogate model be any model I choose? Or are there any limitations? Or is the best choice here a GP.

Balandat commented 1 year ago

So in this case can the surrogate model be any model I choose? Or are there any limitations? Or is the best choice here a GP.

So in general you'll need some kind of model with uncertainty quantification if you want to use qNegIntegratedPosteriorVariance or similar acquisition functions that are based on minimizing global model uncertainty. A GP would be the obvious choice here (and should work out of the box, unlike some other models which you'd have to hook up).

ramseyissa commented 1 year ago

Ok that makes sense. So to validate this i would just fit and predict using the GP on the validation set iteratively after selecting the next best point (as in the sense that it would be my evaluation function)?

Balandat commented 1 year ago

yup, that makes sense to me!

ramseyissa commented 1 year ago

@Balandat Hello again, Thank you very much for the initial advice. I have a semi-working implementation (i say semi for the following reason). I notice when i evaluate the acquisition function qNegIntegratedPosteriorVariance, and print print(best_candidate_idx) i always get an output of tensor(0) for the index. But when i do this with UpperConfidenceBound the indices printed are random (which makes a bit more sense to me). Can you shed some light on this?

for i in range(len(X_candidates)):
    # Fit the GP
    gp = SingleTaskGP(X_train, Y_train)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_model(mll)

    # acquisition function
    acq_func = qNegIntegratedPosteriorVariance(gp, mc_points=X_candidates)
    # acq_func = UpperConfidenceBound(gp, beta=2.0)
    # acq_values = acq_func(X_candidates.unsqueeze(-2))

    # Evaluate the acquisition function on the candidate set
    acq_values = acq_func(X_candidates)

    # Find the index of the best candidate in the candidate set
    best_candidate_idx = torch.argmax(acq_values)
    print(best_candidate_idx)

    # add the best candidate to the training data
    X_train = torch.cat([X_train, X_candidates[best_candidate_idx].unsqueeze(0)])
    Y_train = torch.cat([Y_train, Y_candidates[best_candidate_idx].unsqueeze(0)])

    # remove the best candidate from candidates
    X_candidates = torch.cat([X_candidates[:best_candidate_idx], X_candidates[best_candidate_idx+1:]])
    Y_candidates = torch.cat([Y_candidates[:best_candidate_idx], Y_candidates[best_candidate_idx+1:]])

    # Predict MAE
    test_posterior = gp.posterior(X_test)
    test_predictions = test_posterior.mean.detach().numpy()
    mae = mean_absolute_error(Y_test, test_predictions)

Thank you in advance!

Balandat commented 1 year ago

Hmm I think you may be using qNegIntegratedPosteriorVariance incorrectly here. Note that the mc_points arguments takes a (typically relatively dense) set of points across the domain that are used for Monte Carlo integration. But by doing acq_func = qNegIntegratedPosteriorVariance(gp, mc_points=X_candidates) you're passing the candidate set. This results in the acquisition function approximating the integral of the posterior variance by only summing it up on at the candidate points (i.e. incredibly poorly). You should use a different set of points for MC integration instead.

ramseyissa commented 1 year ago

Ah, that makes sense. I updated my script based on your suggestion. I went ahead and added a subset of the data mc_points = 500. So what it does is select randomly 500 points from the candidates space and uses it as mc_points. The code below is inserted into the forloop under fit_gpytorch_model(mll). Unfortunately, the print statement for the best_index is still printing tensor(0). Was this approach along the right path that you mentioned?

mc_points_size = 500
mc_points_indices = np.random.choice(len(X_candidates), min(mc_points_size, len(X_candidates)), replace=False)
mc_points = X_candidates[mc_points_indices]

Thank you again for taking the time to respond to the numerous questions!

Balandat commented 1 year ago

I went ahead and added a subset of the data mc_points = 500. So what it does is select randomly 500 points from the candidates space and uses it as mc_points.

Oh so this means that you're ever only going to evaluate the posterior variance on a subset of the data points. The mc_points are meant to be points that have nothing to do with the candidates - they are just a set of points that are used for Monte-Carlo integration. If you take a look at eq. (2) in https://arxiv.org/pdf/1710.03206.pdf, the approach qNegIntegratedPosteriorVariance takes is to simply Monte-Carlo integrate the integrand over the domain. The mc_points are the samples from the domain to do this integration, they should be uniformly sampled from the domain (the more the better for accuracy, ofc subject to compute and memory limits). So if your domain is [0,1]^d, this would be something like torch.rand(num_mc_samples, d).

ramseyissa commented 1 year ago

Hey @Balandat, thanks for clarifying what the mc_points are suppose to be. I sort of went down the Botorch rabbit hole these last few days and I am starting to get a little more familiar with it 😅 . The first comment you mentioned with using the evaluate_acquisition_function makes a bit more sense now (going to tackle that next). I opted to use a different acquisition function with the snippet of code i posted earlier and just wanted to run this by you.

bounds = torch.tensor([[-5.0, -3.0, -4.0, -6.0, -11.0, -11.0, -4.0, -2.0, -2.0, -2.0],[4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 3.0, 3.0, 3.0,3.0]],dtype=torch.float64)

candidate_set = draw_sobol_samples(bounds=bounds, n=1000, q=1).squeeze(-2)    

candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set

iteration = []
mae_list = []
for i in range(100):
    # Fit the Gaussian Process model
    gp = SingleTaskGP(X_train, Y_train)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_model(mll)
    qMES = qMaxValueEntropy(gp,candidate_set)
    acq_values = qMES(X_candidates.unsqueeze(-2))
    best_candidate_idx = torch.argmax(acq_values)
    print(best_candidate_idx)

My concern here was just to make sure this is being used correctly ..i defined the acquisition function and used candidate_set drawn from draw_sobol_samples and then used qMES on the samples i am using for active learning (X_candidates).. acq_values = qMES(X_candidates.unsqueeze(-2))... after that i pull the max best_candidate_idx = torch.argmax(acq_values) and then added this point into the training set. Hope this makes sense!

Balandat commented 1 year ago

So here you're effectively optimizing the MES acquisition function on a finite set of random candidate points. That's not wrong per se, but one of the big benefits of the BoTorch acquisition functions is that they are differentiale and so you can use gradient-based optimizers to typically get much better solutions, esp. jn high dim spaces. Check out optimize_acqf for how to do this.

ramseyissa commented 1 year ago

EDIT: Just realized that my assumption is incorrect. It seems like the best way to go is evaluate_acquisition_function as it would allow for evaluating on my actual candidate list. Here optimize_acqf is not looking strictly which value in X_candidates optimizes the acquisition function.

ooh i see! I did some digging based on your last comment and found this notebook you contributed to max_value_entropy. Correct me if i am wrong here, the candidate_set in the notebook, would be my X_candidates that i am looking to evaluate..? From the code below this means the returned candidates and acq_value would be the next best candidate to select and train the gp on. Initially in the code above (in the last comment) i treated the candidate_set as we did the mc_points which maybe was not the right approach.

bounds = torch.tensor([[-5.0, -3.0, -4.0, -6.0, -11.0, -11.0, -4.0, -2.0, -2.0, -2.0],[4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 3.0, 3.0, 3.0,3.0]],dtype=torch.float64)
# candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set
# candidate_set = draw_sobol_samples(bounds=bounds, n=1000, q=1).squeeze(-2)    
# # candidate_set = torch.rand(1000, bounds.size(1), device=bounds.device, dtype=bounds.dtype)
# candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set
iteration = []
mae_list = []
for i in range(len(X_candidates)):
    # Fit the Gaussian Process model
    gp = SingleTaskGP(X_train, Y_train)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_model(mll)
    qMES = qMaxValueEntropy(gp,X_candidates)
    candidates, acq_value = optimize_acqf(
    acq_function=qMES,
    bounds=bounds,
    q=1,
    num_restarts=10,
    raw_samples=512,
)

Balandat commented 1 year ago

Correct me if i am wrong here, the candidate_set in the notebook, would be my X_candidates that i am looking to evaluate..?

Not quite. The qMaxValueEntropy acquisition function, similar to qNegIntegratedPosteriorVariance also requires a discrete set of points candidate_set in the domain over which to draw samples of the max value (this is just due to how that acquisition function is implemented). Bur this is (maybe confusingly) not the same as X_candidate which would be the set of points from which you consider to pick one to evaluate next. In this case you can. Still optimize the acquisition function (for a given candidate_set numerically without having to pre specify a set of discrete points from which to choose.

i treated the candidate_set as we did the mc_points which maybe was not the right approach.

that is the right approach.

ramseyissa commented 1 year ago

Hey @Balandat, its been a little while, but i reevaluated my approach a bit (hopefully this is closer to what you suggested early on). See sample code please.

I have a few questions here:

In this use case, do i need to optimize the acquisition function as in the previous use case (in the above example that we previously spoken about)? I ask because oddly enough my mean absolute error fluctuated a bit and did not decrease as expected (so i assume i am making a mistake somewhere).
One issue i encountered was when i was trying to swap out qMaxValueEntropy for qNegIntegratedPosteriorVariance. For some reason a direct swap in this code raised an error. Here is the error raised.
```
RuntimeError: Input constructor for acquisition class qNegIntegratedPosteriorVariance not registered. Use the @acqf_input_constructor` decorator to register a new method.
```
Can this acquisition function be added to evaluate_acquisition_function so that it is compatible? (maybe i should be opening up a separate issue for this).

Here is the sample code:

gs = GenerationStrategy(
    steps=[
        GenerationStep(
            model=Models.BOTORCH_MODULAR, 
            num_trials=-1,  
            max_parallelism=3, 
            model_kwargs={"surrogate": Surrogate(SingleTaskGP),
                          "botorch_acqf_class": qMaxValueEntropy,
                }
        ),
    ]
)

test_set_dict = [ObservationFeatures(x) for x in X_test.to_dict(orient="records")]

ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
    name="Peak_Prediction", 
    parameters=parameters, 
    objective_name="ppm",
    # minimize=False, 
    )

# Attach and complete trials for the initial training set
for j in range(len(X_train)):
    _, trial_index = ax_client.attach_trial(X_train.iloc[j, :].to_dict())
    ax_client.complete_trial(
        trial_index=trial_index, raw_data=Y_train.iloc[j].to_dict()
    )
print("fitted Training data")

iteration=[]
mae_list = []
trials = 0
len_X_candidates = len(X_candidates)
for i in range(len_X_candidates):
    ax_client.fit_model()

    model_bridge = ax_client.generation_strategy.model

    observation_features = [
    ObservationFeatures(x) for x in X_candidates.to_dict(orient="records")
]

    acqf_values = model_bridge.evaluate_acquisition_function(
        observation_features=observation_features)

    max_value = max(acqf_values)
    max_index = acqf_values.index(max_value)
    next_experiment = X_candidates.iloc[max_index]
    Y_candidate_value = Y_candidates.iloc[max_index]

    # Add the features with the max value to X_train and remove from X_candidates
    X_train = pd.concat([X_train, next_experiment.to_frame().T], ignore_index=True)
    X_candidates = X_candidates.drop(X_candidates.index[max_index], axis=0)

    # Attach and complete the next trial
    _, trial_index = ax_client.attach_trial(next_experiment.to_dict())
    ax_client.complete_trial(
        trial_index=trial_index, raw_data=Y_candidate_value.to_dict()
    )
    trials = trials + 1
    model = ax_client.generation_strategy.model
    y_predicted = model.predict(test_set_dict)
    predicted_y_values = y_predicted[0]['ppm']
    mae = mean_absolute_error(Y_test, predicted_y_values)
    i = i + 1
    iteration.append(i)
    mae_list.append(mae)

Thank you for your time!

sgbaird commented 1 year ago

@ramseyissa, maybe the following will be of help?

Note: copied from https://github.com/facebook/Ax/issues/1312

Here is a self-contained example using vanilla single-objective optimization based on https://github.com/facebook/Ax/issues/460#issuecomment-975233349:

from typing import Any, Dict, Optional

import torch

# from ax.core.objective import ScalarizedObjective
from ax.modelbridge import get_sobol
from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models
from ax.models.torch.botorch_modular.surrogate import Surrogate
from ax.service.ax_client import AxClient
from botorch.acquisition.active_learning import (
    MCSampler,
    qNegIntegratedPosteriorVariance,
)
from botorch.acquisition.input_constructors import (
    MaybeDict,
    acqf_input_constructor,
    construct_inputs_mc_base,
)
from botorch.acquisition.objective import AcquisitionObjective
from botorch.models.gp_regression import SingleTaskGP
from botorch.models.model import Model
from botorch.utils.datasets import SupervisedDataset
from torch import Tensor

@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
    model: Model,
    mc_points: Tensor,
    training_data: MaybeDict[SupervisedDataset],
    objective: Optional[AcquisitionObjective] = None,
    X_pending: Optional[Tensor] = None,
    sampler: Optional[MCSampler] = None,
    **kwargs: Any,
) -> Dict[str, Any]:

    if model.num_outputs == 1:
        objective = None

    base_inputs = construct_inputs_mc_base(
        model=model,
        training_data=training_data,
        sampler=sampler,
        X_pending=X_pending,
        objective=objective,
    )

    return {**base_inputs, "mc_points": mc_points}

def objective_function(x):
    f = x["x1"] ** 2 + x["x2"] ** 2 + x["x3"] ** 2
    return {"f": (f, None)}

parameters = [
    {"name": "x1", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
    {"name": "x2", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
    {"name": "x3", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
]
ax_client_tmp = AxClient()
ax_client_tmp.create_experiment(parameters=parameters)
sobol = get_sobol(ax_client_tmp.experiment.search_space)
mc_points = sobol.gen(1024).param_df.values
mcp = torch.tensor(mc_points)

model_kwargs_val = {
    "surrogate": Surrogate(SingleTaskGP),
    "botorch_acqf_class": qNegIntegratedPosteriorVariance,
    "acquisition_options": {"mc_points": mcp},
}

gs = GenerationStrategy(
    steps=[
        GenerationStep(model=Models.SOBOL, num_trials=num_sobol),
        GenerationStep(
            model=Models.BOTORCH_MODULAR, num_trials=num_qnipv, model_kwargs=model_kwargs_val
        ),
    ]
)

ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
    name="active_learning_experiment",
    parameters=parameters,
    objective_name="f",
    minimize=True,
)

for _ in range(20):
    trial_params, trial_index = ax_client.get_next_trial()
    data = objective_function(trial_params)
    ax_client.complete_trial(trial_index=trial_index, raw_data=data["f"])

ramseyissa commented 1 year ago

@sgbaird, thank you very much, this was very helpful! I managed to tweak this a bit for my use case, and swapped out the qNIPV acquisition function for qMES using its own input constructor. I have a few question that you might be able to address here and if @Balandat has some input on this that would also be appreciated.

Ok so the question is, when calling evaluate_acquisition_function(observation_features=observation_features), observation features here is a dictionary of my candidate list that i am looking to pick the next sample from, do i need to call optimize_acqf function at some point before using evaluate_acquisition_function or is this automatically handled here?

Thank you both for your time!

This is the input constructor used for qMES for reference:

@acqf_input_constructor(qMaxValueEntropy)
def construct_inputs_qMES(
    model: Model,
    training_data: MaybeDict[SupervisedDataset],
    bounds: List[Tuple[float, float]],
    objective: Optional[MCAcquisitionObjective] = None,
    # posterior_transform: Optional[PosteriorTransform] = None,
    candidate_size: int = 1000,
    **kwargs: Any,
) -> Dict[str, Any]:
    r"""Construct kwargs for `qMaxValueEntropy` constructor."""
    inputs_mc = construct_inputs_mc_base(
        model=model,
        training_data=training_data,
        objective=objective,
        **kwargs,
    )

    X = _get_dataset_field(training_data, "X", first_only=True)
    _kw = {"device": X.device, "dtype": X.dtype}
    _rvs = torch.rand(candidate_size, len(bounds), **_kw)
    _bounds = torch.tensor(bounds, **_kw).transpose(0, 1)
    return {
        **inputs_mc,
        "candidate_set": _bounds[0] + (_bounds[1] - _bounds[0]) * _rvs,
        "maximize": kwargs.get("maximize", True),
    }

sgbaird commented 1 year ago

I can't really comment on the accuracy of the construct_inputs_qMES function (other than, if it doesn't throw errors, that's a good sign 😉) . I don't think you need to deal with optimize_acqf if you're using evaluate_acquisition_function.

Balandat commented 1 year ago

Indeed, you're evaluating the acquisition function on a discrete set here using evaluate_acquisition_function, so you can just pick the best and not use optimize_acqf.

sgbaird commented 1 year ago

Recapping some internal discussion, @ramseyissa is predicting peak positions for an NMR spectra, so a single set of inputs typically has multiple outputs. Depending on the number of new atoms added to each simulation, there could be anywhere from 1 to 16 peaks. I think the possible locations of the peak are on the order of 100's, if not continuous, and 16 outputs might be too much in terms of modeling output behavior.

One option would be to train a VAE to move a spectrum object back and forth from a low-dimensional representation (e.g., ~4 values) and a high-dimensional version (16 values). The ppm values could be sorted to reduce degeneracies or data augmentation could be applied (swapping values).

https://github.com/facebook/Ax/issues/1312#issue-1488742013

Then, I think ScalarizedPosteriorTransform gets used to compute a sort of joint acquisition function. Maybe this can be specified in the part of the generation strategy that gets passed more directly to BoTorch)

EDIT: @ramseyissa, also, I'm not sure if Max-value Entropy Search (MES) is the right model for your application since it still seems to be focused on minimizing or maximizing some output property that you're modeling rather than minimizing the global uncertainty, irrespective of whether the property is low or high. From https://botorch.org/tutorials/max_value_entropy:

Max-value entropy search (MES) acquisition function quantifies the information gain about the maximum of a black-box function

^{emphasis added}

ramseyissa commented 1 year ago

@Balandat @sgbaird :

I have been trying to use qNIPV in the same way i used MES, but i keep getting an error that isn't making much sense to me. In my problem just to remind you, i am not looking to optimize my objective value, i am just looking to minimize my global uncertainty for a predefined list of candidates.

It seems that when using the construct_inputs_qNIPV it is not recognizing mc_points that is leading to the error (the error is seen at the bottom). Any suggestions on how to address this?

@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
    model: Model,
    mc_points: Tensor,
    training_data: MaybeDict[SupervisedDataset],
    objective: Optional[AcquisitionObjective] = None,
    X_pending: Optional[Tensor] = None,
    sampler: Optional[MCSampler] = None,
    **kwargs: Any,
) -> Dict[str, Any]:

    if model.num_outputs == 1:
        objective = None

    base_inputs = construct_inputs_mc_base(
        model=model,
        training_data=training_data,
        sampler=sampler,
        X_pending=X_pending,
        objective=objective,
    )

    return {**base_inputs, "mc_points": mc_points}

# the objective function here is just predicting on a test set and calculating the mae
# with the model that is updated through an active learning loop
def objective_function(model, test_set_dict, test_set_target_vales):
    # get gp 
    model = ax_client.generation_strategy.model

    # predict on test set
    y_predicted = model.predict(test_set_dict)

    # get predicted values
    predicted_y_values = y_predicted[0]['ppm']

    # calculate mae
    mae = mean_absolute_error(test_set_target_vals, predicted_y_values)
    print(f'MAE: {mae}')
    return {"mae": (mae, None)}

init_len_of_x_train = len(train_dfs)

parameters = [
    {"name": vals, "type": "range", "bounds": [-15.0, 15.0], "value_type": "float"}
    for vals in descriptors_df.columns]

# this is used to get the mc_points for the acquisition function
ax_client_tmp = AxClient()
ax_client_tmp.create_experiment(parameters=parameters)
sobol = get_sobol(ax_client_tmp.experiment.search_space)
mc_points = sobol.gen(1024).param_df.values
mcp = torch.tensor(mc_points)

# dict of kwargs for BoTorch modular in generation strategy 
model_kwargs_val = {
    "surrogate": Surrogate(SingleTaskGP),
    "botorch_acqf_class": qNegIntegratedPosteriorVariance,
    "acquisition_options": {"mc_points": mcp},
}

gs = GenerationStrategy(
    steps=[
        # GenerationStep(model=Models.SOBOL, num_trials=5),
        GenerationStep(
            model=Models.BOTORCH_MODULAR, num_trials=-1
            , model_kwargs=model_kwargs_val
        ),
    ]
)

ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
    name="active_learning_experiment",
    parameters=parameters,
    objective_name="ppm",
    # minimize=True,
)

# test set 
test_set_dict = [ObservationFeatures(x) for df in test_dfs for x in df.to_dict(orient="records")]

#get test set target values 
test_set_target_vals = get_target_values(test_dfs, ppm_df, 'ppm')

#get train set target values
train_set_target_vals = get_target_values(train_dfs, ppm_df, 'ppm')

# my training data is a list of dataframes, so I need to attach each dataframe to ax_client
for df in train_dfs:
    for idx in range(len(df)):
        _, trial_index = ax_client.attach_trial(df.iloc[idx, :].to_dict())
        ax_client.complete_trial(
        trial_index=trial_index, raw_data=train_set_target_vals.iloc[idx].to_dict()
    )

index_of_candidate_set = []
iteration=[]
mae_list = []
trials = 0
len_X_candidates = len(x_candidates)
for i in range(len_X_candidates):
    print("size of X_candidates",len(x_candidates))
    print("size of X_train",len(train_dfs))

    # fit gp
    ax_client.fit_model()

    # get gp 
    model_bridge = ax_client.generation_strategy.model

    acqf_values = sum_acqf_values(x_candidates, model_bridge)

    #get max of acqf values
    max_value = max(acqf_values)

    #get index of max value
    max_index = acqf_values.index(max_value)

    # locate features in orginal candidate set
    next_experiment = x_candidates[max_index]

    next_exp_target_vals = get_target_values(next_experiment, ppm_df, 'ppm')

    # add the best candidate selected by acquisition fx experiment to training set
    train_dfs.append(next_experiment)

    #drop the best candidate selected by acquisition fx experiment from candidate set
    x_candidates.pop(max_index)

    # attach this new trail to ax_client, and complete it with the target value "ppm"
    for idx in range(len(next_experiment)):
        _, trial_index = ax_client.attach_trial(next_experiment.iloc[idx, :].to_dict())
        ax_client.complete_trial(
        trial_index=trial_index, raw_data=next_exp_target_vals.iloc[idx].to_dict())

    trials = trials + 1

    model = ax_client.generation_strategy.model

    mae_vals = objective_function(model, test_set_dict, test_set_target_vals)

    i = i + 1
    iteration.append(i)
    print(f"Iteration {i+1}: MAE = {mae_vals:.4f}")
    mae_list.append(mae_vals)
    index_of_candidate_set.append(max_index)

exp_df = exp_to_df(ax_client.experiment)

num_of_zeros = len(train_dfs)

exp_df['MAE'] = [0] * init_len_of_x_train + mae_list
# exp_df

The resulting error from the above script is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[31], line 193
    190     # get gp 
    191     model_bridge = ax_client.generation_strategy.model
--> 193     acqf_values = sum_acqf_values(x_candidates, model_bridge)
    194     # print("len of acqf_values",len(acqf_values))
    195 
    196     # turn list of candidates into dict of observation features
   (...)
    205     
    206 #     #get max of acqf values
    207     max_value = max(acqf_values)

Cell In[16], line 5, in sum_acqf_values(list_of_dfs, model_bridge)
      3 for df in list_of_dfs:
      4     observation_features = [ObservationFeatures(x) for x in df.to_dict(orient="records")]
----> 5     acqf_values = model_bridge.evaluate_acquisition_function(observation_features=observation_features)
      6     # print('acqf values', acqf_values)
      7     sum_acqf_values_list.append(sum(acqf_values))

File [/opt/miniconda3/envs/nmrpeaks/lib/python3.10/site-packages/ax/modelbridge/torch.py:434](https://file+.vscode-resource.vscode-cdn.net/opt/miniconda3/envs/nmrpeaks/lib/python3.10/site-packages/ax/modelbridge/torch.py:434), in TorchModelBridge.evaluate_acquisition_function(self, observation_features, search_space, optimization_config, pending_observations, fixed_features, acq_options)
    425         obs_feats[i] = t.transform_observation_features(batch)
    427 base_gen_args = self._get_transformed_gen_args(
    428     search_space=search_space,
...
    305 )
    306 self.acqf = botorch_acqf_class(**acqf_inputs)  # pyre-ignore [45]
    307 self.X_pending: Optional[Tensor] = unique_Xs_pending

TypeError: construct_inputs_qNIPV() missing 1 required positional argument: 'mc_points'

sgbaird commented 1 year ago

@ramseyissa, can you provide a reproducer? i.e., something that can be copy-pasted and run standalone (preferably a link to a Google Colab notebook 😉)

sgbaird commented 1 year ago

I have a MWE example for qNIPV, though without anything special. Just 3 float inputs and 1 float output. If you can use this script as a template and gradually adapt it to your use-case, you should be able to pinpoint the discrepancy (and possibly resolve the error you were running into).

EDIT: this needs to be updated to account for transformations per Max's comment https://github.com/facebook/Ax/issues/1557#issuecomment-1624582689

ramseyissa commented 1 year ago

@sgbaird,

I went through the MWE example for qNIPV that was referenced above and it was very helpful. I believe it is the same working example you referenced in https://github.com/facebook/Ax/issues/1557#issuecomment-1528840459 earlier on.

Ok so I think I found the issue. For some reason when you call model_bridge.evaluate_acquisition_function you must use the acq_options argument when using the qNIPV acquisition function, even if specifying acquisition_options in model_kwargs within GenerationStrategy, which differs from when i used the MES acquisition function (which also required MC points).

So I think all finally looks good, but maybe a final check from @Balandat would seal the deal (I can provide the reproducer @sgbaird was after :wink:)

Balandat commented 1 year ago

Ok so I think I found the issue. For some reason when you call model_bridge.evaluate_acquisition_function you must use the acq_options argument when using the qNIPV acquisition function, even if specifying acquisition_options in model_kwargs within GenerationStrategy, which differs from when i used the MES acquisition function (which also required MC points).

Hmm this is interesting. I didn't go into the weeds on this but I believe this might be due to the fact that qMaxValueEntropy has an input constructor that generating a set of candidate_size candidate_points. The difference to the acqf constructor for qNIPV used here is that candidate_size has a default value, so that gets used if no candidate_size is given in the acquisition_options: https://github.com/pytorch/botorch/blob/6f9818756b6a3f83ad52aa3a6cd7b2591bd3ce0e/botorch/acquisition/input_constructors.py#L865

Gotcha alert: One thing to note here is that the BoTorch model when wrapped in Ax operates in the transformed space (typically the unit cube if all parameters are continuous parameters). So mc_points here will need to be passed in that transformed space rather than the original space over which the parameters are defined. Otherwise you'll get some very funky results (e.g. say your parameters are specified on [1000, 2000]^d, and you choose your mc_points uniformly over the same set, then in the transformed space the model will operate on [0, 1]^d and so most if not all of the mc_points will not actually lie in what corresponds to the search space - in this case "very funky" would mean total garbage). Unfortunately it's at least at this point not straightforward to apply the Ax transforms automatically to the mc_points. Hopefully we can come up with a better solution for this in the future. cc @saitcakmak, @dme65

sgbaird commented 1 year ago

@Balandat would it be easier to scale to [0, 1]^d outside of Ax using sklearn's MinMaxScaler? Or, assuming UnitX is the only transform at play, would you recommend using UnitX.transform_observation_features?

Balandat commented 1 year ago

@Balandat would it be easier to scale to [0, 1]^d outside of Ax using sklearn's MinMaxScaler? Or, assuming UnitX is the only transform at play, would you recommend using UnitX.transform_observation_features?

Either way should work. But yeah, scaling the data before passing it to Ax is the right thing to do here if you want to pass in the mc points (and assuming you have only continuous parameters).

sgbaird commented 1 year ago

@Balandat does this look OK?

# MC Points
# WARNING: assumes only UnitX transform, https://ax.dev/docs/models.html#transforms
num_mc_sobol = 2**13
sobol = get_sobol(ax_client_tmp.experiment.search_space)
# mc_points = sobol.gen(1024).param_df.values
obs_features = [
    ObservationFeatures(parameters)
    for parameters in sobol.gen(num_mc_sobol).param_df.to_dict("records")
]
ux = UnitX(ax_client_tmp.experiment.search_space)
obs_features_ux = ux.transform_observation_features(obs_features)
mc_points = [list(obs.parameters.values()) for obs in obs_features_ux]
mcp = torch.tensor(mc_points)
mcp

Balandat commented 1 year ago

Hmm yeah that seems like it should work! I guess since UnitX transforms to the unit cube it should be enough to just generate the points directly without Ax involvement:

mcp = torch.quasirandom.SobolEngine(num_params, scramble=True).draw_base2(13)

Again this assumes that there aren't any more complex things going on such as choice parameters or the like..

ramseyissa commented 1 year ago

@Balandat, thanks for the insight.

Me and @sgbaird have had some internal discussion regarding some of the items discussed above. The result of this are some questions i need clarification on.

In the reply below, when you mentioned "scaling the data before passing it to Ax is the right thing to do here.." did you also mean that the data that we are working with should be scaled (not just the mc points) but the training, x_candidates, and test set? I mean this as in a preprocessing step before beginning an Ax optimization strategy. Also, just a side note here, from what i understand Ax typically scales the data that is passed to it internally?

@Balandat would it be easier to scale to [0, 1]^d outside of Ax using sklearn's MinMaxScaler? Or, assuming UnitX is the only transform at play, would you recommend using UnitX.transform_observation_features?

Either way should work. But yeah, scaling the data before passing it to Ax is the right thing to do here if you want to pass in the mc points (and assuming you have only continuous parameters).

Another clarification, when looking at a list of points that have been evaluated using evaluate_acquisition_function (when using the qNIPV acquisition function), should i be selecting the most negative point in the list, so the min value since it would be the most negative?

sgbaird commented 1 year ago

did you also mean that the data that we are working with should be scaled (not just the mc points) but the training, x_candidates, and test set?

Pretty sure the search space (i.e., parameters and related kwargs) can be left untouched as long as we scale mcp appropriately (i.e., the only change that needs to occur is in transforming mcp).

2. should i be selecting the most negative point in the list

My Intuition is to always take the maximum of the acquisition function, since acq fns are used to "... evaluate the usefulness of one of more design points ...". I'll defer to @Balandat though to clarify

saitcakmak commented 1 year ago

Gotcha alert: One thing to note here is that the BoTorch model when wrapped in Ax operates in the transformed space (typically the unit cube if all parameters are continuous parameters). So mc_points here will need to be passed in that transformed space rather than the original space over which the parameters are defined. Otherwise you'll get some very funky results (e.g. say your parameters are specified on [1000, 2000]^d, and you choose your mc_points uniformly over the same set, then in the transformed space the model will operate on [0, 1]^d and so most if not all of the mc_points will not actually lie in what corresponds to the search space - in this case "very funky" would mean total garbage). Unfortunately it's at least at this point not straightforward to apply the Ax transforms automatically to the mc_points. Hopefully we can come up with a better solution for this in the future.

One way to get around this would be to remove UnitX and use Normalize instead. Unfortunately, this requires more than a few lines of code to setup the generation strategy. Here's a piece of code that sets up an MBM GenerationStep that uses Normalize rather than UnitX.

Mixed_X_trans: List[Type[Transform]] = [
    RemoveFixed,
    OrderedChoiceEncode,
    OneHot,
    IntToFloat,
    Log,
    Logit,
]
model_kwargs: Dict[str, Any] = {
    "torch_device": torch_device,
    "transforms": Mixed_X_trans + Y_trans,
    "transform_configs": {},
}
# Extract the search space digest to get the bounds for transforms.
tf_search_space = transform_search_space(
    search_space=experiment.search_space,
    transforms=[Cast] + Mixed_X_trans,
    transform_configs=model_kwargs["transform_configs"],
)
search_space_digest = extract_search_space_digest(
    search_space=tf_search_space,
    param_names=list(tf_search_space.parameters.keys()),
)
bounds = torch.as_tensor(
    search_space_digest.bounds,
    dtype=torch.double,
    device=torch_device,
).T
botorch_tf = Normalize(d=bounds.shape[-1], bounds=bounds)
model_kwargs["surrogate_specs"] = {
    "mixed_surrogate": SurrogateSpec(
        botorch_model_class=botorch_model_class,
        input_transform=botorch_tf,
    )
}
GenerationStep(
    model=Models.BOTORCH_MODULAR,
    num_trials=-1,
    model_kwargs=model_kwargs,
    should_deduplicate=should_deduplicate,
)

lena-kashtelyan commented 1 year ago

@ramseyissa would you say this is resolved?

ramseyissa commented 1 year ago

@lena-kashtelyan,

Thank you for checking on this. I have a few questions as unfortunately my implementation is not producing the anticipated results. Here is a plot of the results. I am not quite sure why random selection is beating out my acquisition function selected data points in terms of reducing the overall MAE of a test set.

Here is a reproducer that i set up in google collab for convenience:

So just to clarify what is happening here, I have a list of dataframes, and in each of these dataframes there can be 1 to 16 points in them (1 to 16 samples.. each dataframe is called a configuration). There are about 4200 hundred total (individual) points, but 520 configurations. I train my gp on about 5 or so configurations, and then begin to have the acq f(x) select what the next best configuration to choose. This is done by evaluate_acquisition_function for each point in the configuration and taking the sum of the acq values for each configuration, as there can be 1 to 16 points within each dataframe. I then take the min value of the list of acq values, and add that configuration to my gp.

One thing that was mentioned by my lab mate @sgbaird was that i should be taking the max of the acq values list. But i was under the impression that we were looking for points with the highest posterior uncertainty and since its negative we would be looking for the min..? (i can be completely wrong here, but i think you can clear this up). A side note on this, i took the max and it performed worse, so it could be my implementation.

@saitcakmak

saitcakmak commented 1 year ago

One thing that was mentioned by my lab mate @sgbaird was that i should be taking the max of the acq values list. But i was under the impression that we were looking for points with the highest posterior uncertainty and since its negative we would be looking for the min..?

qNegIntegratedPosteriorVariance measures the (integrated) posterior variance after adding the candidate point to the model. The smaller the posterior variance (equivalently, larger for qNIPV since it's negated) the larger is the reduction in the uncertainty from adding the candidate to the model. So, @sgbaird is right that you should be selecting the candidate with maximal acquisition value.

Here is a reproducer that i set up in google collab for convenience:

Looks like I don't have access to the colab notebook. Would it be possible to make it accessible more widely?

ramseyissa commented 1 year ago

@saitcakmak,

ooh that makes sense! Thank you for clearing that up.

I went ahead and made the notebook more accessible. You should have access to it now.

saitcakmak commented 1 year ago

I spent some time looking through the notebook. Here are some observations:

Your inputs are in [0, 1] but you're defining Ax parameters in [-15, 15]. In my limited testing, this didn't affect things significantly (it is irrelevant due to below issues), though I can imagine it having adverse effects in a proper application.
The optimization loop runs over a fixed set of candidates and ends up adding all of them to training data eventually, so the final MAE will be identical regardless of the method choice.
You should re-fit the model at the end of the iteration before using it to generate the model predictions. Otherwise, you're generating the predictions with the model that was before the candidate and not with the model that includes the candidate that was generated during the iteration.
sum_acqf_values is the reason min works better than max. x_candidates do not all have the same number of points in them. Some have 1, some have 2, 3 or 4. In sum_acqf_values you're evaluating the qNIPV independently for each point in the candidate set and summing their values. Since any single qNIPV value will be negative, you end up with smaller values as the size of the candidate set grows. Using min picks the larger candidate sets and max picks the smaller sets. As a result, min ends up adding more points to the training data faster, leading to seemingly better results early on (though this is very much apples vs oranges, IMO).
If you are planning on adding multiple points in an iteration, you should jointly evaluate the acquisition value of these candidates rather than evaluating them independently and summing the values. This will give you more accurate representation of the improvement offered by adding multiple points as a batch. This docstring, which I believe @sgbaird wrote a while ago, explains how to do this (you'll want to use list of lists): https://github.com/facebook/Ax/blob/main/ax/modelbridge/torch.py#L403-L409
In addition to joint evaluation, if you are comparing candidate sets of different sizes, I would generally recommend normalizing the acquisition value by the # of points in the candidate set. Otherwise, you'll most likely end up with better acquisition values for larger candidate sets.

ramseyissa commented 1 year ago

Your inputs are in [0, 1] but you're defining Ax parameters in [-15, 15]. In my limited testing, this didn't affect things significantly (it is irrelevant due to below issues), though I can imagine it having adverse effects in a proper application.

Good catch. I copied my original code with the original parameter bounds and forgot to change the bounds on the parameters for the toy dataset that i created.

The optimization loop runs over a fixed set of candidates and ends up adding all of them to training data eventually, so the final MAE will be identical regardless of the method choice.

Right, so just for reference, the original dataset contains about 4,000 data points, which are bunched up into 520 "configurations" (some configurations contain 2 data points, some contain 10 data points, up to 16 data points in a single configuration). The underlying idea here is to use the fewest amount of configurations possible from the 520, to train a model to be able to predict on a test set (obtain the lowest MAE score on a test set is the objective here).

You should re-fit the model at the end of the iteration before using it to generate the model predictions. Otherwise, you're generating the predictions with the model that was before the candidate and not with the model that includes the candidate that was generated during the iteration.

Ohh correct, thanks for catching that.

sum_acqf_values is the reason min works better than max. x_candidates do not all have the same number of points in them. Some have 1, some have 2, 3 or 4. In sum_acqf_values you're evaluating the qNIPV independently for each point in the candidate set and summing their values. Since any single qNIPV value will be negative, you end up with smaller values as the size of the candidate set grows. Using min picks the larger candidate sets and max picks the smaller sets. As a result, min ends up adding more points to the training data faster, leading to seemingly better results early on (though this is very much apples vs oranges, IMO).

Gotcha, this makes much more sense now why min is working better than max (thanks for the clarification).

If you are planning on adding multiple points in an iteration, you should jointly evaluate the acquisition value of these candidates rather than evaluating them independently and summing the values. This will give you more accurate representation of the improvement offered by adding multiple points as a batch. This docstring, which I believe @sgbaird wrote a while ago, explains how to do this (you'll want to use list of lists): https://github.com/facebook/Ax/blob/main/ax/modelbridge/torch.py#L403-L409

Ah right, since I plan on adding them as a batch, I should be evaluating the batch of points not individual points.

OK, i went ahead and updated Ax to the latest release (v0.3.4) and unfortunately this caused some issues when trying to tackle this problem. I noticed that i was getting an error related to the acqf_input_constructor, specifically related to AcquisitionObjective

@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
    model: Model,
    mc_points: Tensor,
    training_data: MaybeDict[SupervisedDataset],
    objective: Optional[AcquisitionObjective] = None,
    X_pending: Optional[Tensor] = None,
    sampler: Optional[MCSampler] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    if model.num_outputs == 1:
        objective = None
    base_inputs = _construct_inputs_mc_base(
        model=model,
        training_data=training_data,
        sampler=sampler,
        X_pending=X_pending,
        objective=objective,
    )
    return {**base_inputs, "mc_points": mc_points}

I noticed in latest version AcquisitionObjective has been removed. The below code is from the previous version before this latest release (stating that AcquisitionObjective will be DEPRECATED)

class AcquisitionObjective(Module, ABC):
    r"""Abstract base class for objectives.

    DEPRECATED - This will be removed in the next version.

    :meta private:
    """
    ...

I noticed that a Feature Request was made for the inclusion of a qNIPV input_constructor, but didn't find it in input_constructors.py.

Any suggestions for a work around, or maybe an entirely new way to construct the input_constructor so i can give these suggestions a try.

Also, just wanted to say thank you for the very detailed and extremely helpful response!

saitcakmak commented 1 year ago

OK, i went ahead and updated Ax to the latest release (v0.3.4) and unfortunately this caused some issues when trying to tackle this problem. I noticed that i was getting an error related to the acqf_input_constructor, specifically related to AcquisitionObjective

It's always good to use the latest versions, except that it sometimes causes backwards compatibility issues :). In this case, this part of the code base got cleaned up a bit, both to remove the deprecated classes and to clean up kwarg handling that caused a couple silent bugs along the way. You can define a new input constructor as follows:

@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
    model: Model,
    mc_points: Tensor,
    posterior_transform: Optional[PosteriorTransform] = None,
    X_pending: Optional[Tensor] = None,
    sampler: Optional[MCSampler] = None,
) -> Dict[str, Any]:
    return {
        "model": model,
        "mc_points": mc_points,
        "posterior_transform": posterior_transform,
        "X_pending": X_pending,
        "sampler": sampler,
    }

ramseyissa commented 1 year ago

@saitcakmak,

I attempted this again, and unfortunately ran into another error (although I do feel like we are getting close here :slightly_smiling_face:). With the new changes, I encountered the following error shown below:

Exception has occurred: TypeError
qNegIntegratedPosteriorVariance.__init__() got an unexpected keyword argument 'surrogates'

This error stems from this line here:

acqf_values = model_bridge.evaluate_acquisition_function(observation_features=observation_features)

I did some debugging to find the source of this error (hopefully to lend some help in identifying the source) and it appears that the error is coming from here:

_instantiate_acquisition, where surrogates=self.surrogates is being returned.

_instantiate_acquisition is then being used in evaluate_acquisition_function.

The input constructor associated with the acqf_cls is then being retrieved from the registry here.

Not exactly sure how to go about handling this (although I did try a few basic attempts with no luck).

saitcakmak commented 1 year ago

Can you share the full stack trace of the error you're getting? If it happens at https://github.com/facebook/Ax/blob/main/ax/models/torch/botorch_modular/model.py#L664, it could be that botorch_acqf_class is being passed as acquisition_class, which are completely different objects.

saitcakmak commented 1 year ago

Also, a follow up to my above comment on using Normalize rather than UnitX: This'll soon become much much easier thanks to a refactor @jelena-markovic is wrapping up (not exported to GitHub yet). The changes make it so that the inputs for the input transforms can be auto constructed using information from search space (and other sources), which eliminates a lot of the boilerplate code I had included above. The same can be achieved using the following after the PR lands:

Mixed_X_trans: List[Type[Transform]] = [
    RemoveFixed,
    OrderedChoiceEncode,
    OneHot,
    IntToFloat,
    Log,
    Logit,
]
model_kwargs: Dict[str, Any] = {
    "torch_device": torch_device,
    "transforms": Mixed_X_trans + Y_trans,
    "transform_configs": {},
    "surrogate_specs": {
    "mixed_surrogate": SurrogateSpec(
            botorch_model_class=botorch_model_class,
            input_transform_class=[Normalize],  # We just need to pass in the class, options will be auto generated.
        )
    }
}
GenerationStep(
    model=Models.BOTORCH_MODULAR,
    num_trials=-1,
    model_kwargs=model_kwargs,
    should_deduplicate=should_deduplicate,
)

ramseyissa commented 1 year ago

@saitcakmak,

Can you share the full stack trace of the error you're getting?

Here is the repro, which will produce the same full stack trace of the error.

Can be seen here as well:

[INFO 08-28 12:13:46] ax.core.experiment: Attached custom parameterizations [{'g1': 4.0, 'g2': 2.0, 'g3': 0.468565, 'g4': 0.532708, 'g5': 0.565189, 'g6': 0.55303, 'l1': 0.595609, 'l2': 0.554738, 'l3': 0.468565, 'l4': 0.574852}] as trial 35.
[INFO 08-28 12:13:46] ax.service.ax_client: Completed trial 35 with data: {'ppm': (-655.197368, None)}.
[INFO 08-28 12:13:46] ax.core.experiment: Attached custom parameterizations [{'g1': 4.0, 'g2': 2.0, 'g3': 0.468565, 'g4': 0.532708, 'g5': 0.565189, 'g6': 0.55303, 'l1': 0.487083, 'l2': 0.653798, 'l3': 0.468565, 'l4': 0.574852}] as trial 36.
[INFO 08-28 12:13:46] ax.service.ax_client: Completed trial 36 with data: {'ppm': (-658.894139, None)}.
[INFO 08-28 12:13:46] ax.core.experiment: Attached custom parameterizations [{'g1': 3.818182, 'g2': 2.181818, 'g3': 0.468565, 'g4': 0.522285, 'g5': 0.565189, 'g6': 0.521493, 'l1': 0.51061, 'l2': 0.518498, 'l3': 0.468565, 'l4': 0.574852}] as trial 37.
[INFO 08-28 12:13:46] ax.service.ax_client: Completed trial 37 with data: {'ppm': (-659.151184, None)}.
fitted Training data
size of X_candidates 420
size of X_train 5
[INFO 08-28 12:13:46] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
Traceback (most recent call last):
  File "/opt/miniconda3/envs/nmr/lib/python3.11/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nmr/lib/python3.11/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/Users/ramseyissa/Documents/GitHub/ML-guided-NMR_peak_position_predictions/qNIPV.py", line 343, in <module>
    acqf_values = sum_acqf_values(x_candidates_copy, model_bridge)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ramseyissa/Documents/GitHub/ML-guided-NMR_peak_position_predictions/qNIPV.py", line 196, in sum_acqf_values
    acqf_values = model_bridge.evaluate_acquisition_function(observation_features=observation_features)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/modelbridge/torch.py", line 446, in evaluate_acquisition_function
    return self._evaluate_acquisition_function(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/modelbridge/torch.py", line 480, in _evaluate_acquisition_function
    evals = not_none(self.model).evaluate_acquisition_function(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/models/torch/botorch_modular/model.py", line 523, in evaluate_acquisition_function
    acqf = self._instantiate_acquisition(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/models/torch/botorch_modular/model.py", line 664, in _instantiate_acquisition
    return self.acquisition_class(
           ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: qNegIntegratedPosteriorVariance.__init__() got an unexpected keyword argument 'surrogates'

If it happens at https://github.com/facebook/Ax/blob/main/ax/models/torch/botorch_modular/model.py#L664, it could be that botorch_acqf_class is being passed as acquisition_class, which are completely different objects.

Right, I was looking into this a bit and found some very helpful info in this notebook here as well as here. Can you elaborate a little more on it could be that botorch_acqf_class is being passed as acquisition_class (it would give me a better understanding of the error that is occurring).

Also just to be clear, in model_kwargs, this is what is being passed:

model_kwargs_val = {
    "surrogate": Surrogate(SingleTaskGP),
    "acquisition_class": qNegIntegratedPosteriorVariance,
    #"botorch_acqf_class": qNegIntegratedPosteriorVariance,
    "acquisition_options": {"mc_points": mcp},
    "fit_out_of_design": True,

saitcakmak commented 1 year ago

The acquisition_class argument refers to this Ax class, in case it is getting replaced by a subclass of it: https://github.com/facebook/Ax/blob/main/ax/models/torch/botorch_modular/acquisition.py#L61

For BoTorch acquisition classes, you should use botorch_acqf_class argument. Your code would look like this (just using the commented out one instead):

model_kwargs_val = {
    "surrogate": Surrogate(SingleTaskGP),
    "botorch_acqf_class": qNegIntegratedPosteriorVariance,
    "acquisition_options": {"mc_points": mcp},
    "fit_out_of_design": True,

ramseyissa commented 1 year ago

@saitcakmak

For BoTorch acquisition classes, you should use botorch_acqf_class argument. Your code would look like this (just using the commented out one instead):

I previously tried "botorch_acqf_class": qNegIntegratedPosteriorVariance which also raised an error. I reproduced the error which can be seen below:

[INFO 08-28 23:23:10] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
Traceback (most recent call last):
  File "/opt/miniconda3/envs/nmr/lib/python3.11/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nmr/lib/python3.11/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/ramseyissa/.vscode/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/Users/ramseyissa/Documents/GitHub/ML-guided-NMR_peak_position_predictions/qNIPV.py", line 348, in <module>
    acqf_values = sum_acqf_values(x_candidates_copy, model_bridge)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ramseyissa/Documents/GitHub/ML-guided-NMR_peak_position_predictions/qNIPV.py", line 201, in sum_acqf_values
    acqf_values = model_bridge.evaluate_acquisition_function(observation_features=observation_features)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/modelbridge/torch.py", line 446, in evaluate_acquisition_function
    return self._evaluate_acquisition_function(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/modelbridge/torch.py", line 480, in _evaluate_acquisition_function
    evals = not_none(self.model).evaluate_acquisition_function(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/models/torch/botorch_modular/model.py", line 523, in evaluate_acquisition_function
    acqf = self._instantiate_acquisition(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/models/torch/botorch_modular/model.py", line 664, in _instantiate_acquisition
    return self.acquisition_class(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/nmr/lib/python3.11/site-packages/ax/models/torch/botorch_modular/acquisition.py", line 306, in __init__
    acqf_inputs = input_constructor(
                  ^^^^^^^^^^^^^^^^^^
TypeError: construct_inputs_qNIPV() got an unexpected keyword argument 'training_data'

The issue here stems from training_data=training_data. One thing I tried after this error occurred was changing the args in construct_inputs_qNIPV to accept training_data (can be seen in the below):

@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
    training_data: MaybeDict[SupervisedDataset],
    model: Model,
    mc_points: Tensor,
    posterior_transform: Optional[PosteriorTransform] = None,
    X_pending: Optional[Tensor] = None,
    sampler: Optional[MCSampler] = None,
) -> Dict[str, Any]:
    return {
        "model": model,
        "mc_points": mc_points,
        "posterior_transform": posterior_transform,
        "X_pending": X_pending,
        "sampler": sampler,
        "training_data": training_data

    }

This then raised an objective error that stems from objective=objective. Wasn't really sure if I was going down the right path here by doing that.

saitcakmak commented 1 year ago

Since both training_data and objective are being passed to construct_inputs_qNIPV, we need to accept these. qNIPV does not need the training_data in its constructor (this is typically used to extract baseline_X for qNEI type acqfs), so it should not be included in the returned dict. Similarly, qNIPV does not support objective, so it should probably error out if objective is not None.

@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
    training_data: MaybeDict[SupervisedDataset],
    model: Model,
    mc_points: Tensor,
    objective: Optional[MCAcquisitionObjective] = None,
    posterior_transform: Optional[PosteriorTransform] = None,
    X_pending: Optional[Tensor] = None,
    sampler: Optional[MCSampler] = None,
) -> Dict[str, Any]:
    if objective is not None:
        raise RuntimeError("objective is not supported in qNIPV")
    return {  # This should only return inputs that are accepted by qNIPV.__init__ 
        "model": model,
        "mc_points": mc_points,
        "posterior_transform": posterior_transform,
        "X_pending": X_pending,
        "sampler": sampler,
    }

lena-kashtelyan commented 1 year ago

@ramseyissa, does this resolve the issue for you? Please let us know!

ramseyissa commented 1 year ago

Hello @lena-kashtelyan and @saitcakmak,

Thank you for reaching out. I tried this initially when it was posted and unfortunately it led to an error. I just wanted to make sure I posted a repro for you to also see before replying to this.

One thing to note I was getting a few of these errors below: TypeError: construct_inputs_qNIPV() got an unexpected keyword argument Before just finally passing the **kwargs: Any, argument in the input constructor which resolved any issues of unexpected key word arguments being passed.

The error that you will see in the repro is: Exception has occurred: TypeError construct_inputs_qNIPV() missing 1 required positional argument: 'mc_points' which is odd because this argument is being passed.

saitcakmak commented 1 year ago

Ugh, looks like mc_points gets silently filtered out since it is not explicitly allowed in https://github.com/pytorch/botorch/blob/main/botorch/acquisition/input_constructors.py#L200.

@esantorella thoughts on how to best handle this? I believe allow_only_specific_variable_kwargs was implemented so that kwargs would not be silently ignored.

ramseyissa commented 1 year ago

Hello @saitcakmak @esantorella @lena-kashtelyan,

Just wanted to follow up on this issue, and to see if there are any suggestions on how to handle this.

saitcakmak commented 1 year ago

Hi @ramseyissa. Sorry about not following up on this earlier. As a workaround for now, can you create a subclass of qNegIntegratedPosteriorVariance that uses hard-coded mc_points and does not require this as an input? If you use this with an input constructor like the one from above comment (with mc_points removed), you shouldn't run into the error due to the kwarg getting filtered out.