facebook / Ax

Adaptive Experimentation Platform
https://ax.dev
MIT License

Setting up robust optimization experiment with MVaR gives error #2077

Closed: apaleyes closed this issue 8 months ago

apaleyes commented 10 months ago

Hi! I am trying to set up a robust optimization experiment with Ax. There is no tutorial on how to do it, so I pieced something together from unit tests. However, if I use MVaR as a risk measure, it all errors out.

The complete code is below (without imports); it follows these two files: https://github.com/facebook/Ax/blob/main/ax/modelbridge/tests/test_robust_modelbridge.py https://github.com/facebook/Ax/blob/main/ax/utils/testing/core_stubs.py#L263

The key point in the code is the risk measure definition:

risk_measure = RiskMeasure(
    risk_measure="MultiOutputExpectation",
    options={"n_w": 16},
)

This works. But if we replace it with MVaR:

risk_measure = RiskMeasure(
    risk_measure="MVaR",
    options={"n_w": 16, "alpha": 0.8},
)

We get the following error after a roughly 14-second wait:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Versions, if necessary: botorch 0.9.4, gpytorch 1.11, ax-platform 0.3.5

Any idea why this might be happening? Thanks!


Complete code:

x1_dist = ParameterDistribution(
    parameters=["x1"], distribution_class="norm", distribution_parameters={}
)

search_space = RobustSearchSpace(
    parameters=[
        RangeParameter(
            name="x1", parameter_type=ParameterType.FLOAT, lower=-5, upper=10
        ),
        RangeParameter(
            name="x2", parameter_type=ParameterType.FLOAT, lower=0, upper=15
        ),
    ],
    parameter_distributions=[x1_dist],
    num_samples=16,
)

risk_measure = RiskMeasure(
    risk_measure="MultiOutputExpectation",
    options={"n_w": 16},
)
metrics = [
    BraninMetric(
        name=f"branin_{i}", param_names=["x1", "x2"], lower_is_better=True
    )
    for i in range(2)
]
optimization_config = MultiObjectiveOptimizationConfig(
    objective=MultiObjective(
        [
            Objective(
                metric=m,
                minimize=True,
            )
            for m in metrics
        ]
    ),
    objective_thresholds=[
        ObjectiveThreshold(metric=m, bound=10.0, relative=False)
        for m in metrics
    ],
    risk_measure=risk_measure,
)

exp = Experiment(
    name="branin_experiment",
    search_space=search_space,
    optimization_config=optimization_config,
    runner=SyntheticRunner(),
)

sobol = get_sobol(search_space=exp.search_space)
for _ in range(2):
    exp.new_trial(generator_run=sobol.gen(1)).run().mark_completed()

for _ in range(5):
    modelbridge = Models.BOTORCH_MODULAR(
        experiment=exp,
        data=exp.fetch_data(),
        surrogate=Surrogate(botorch_model_class=SingleTaskGP),
        botorch_acqf_class=qNoisyExpectedHypervolumeImprovement,
    )
    trial = (
        exp.new_trial(generator_run=modelbridge.gen(1)).run().mark_completed()
    )
sdaulton commented 10 months ago

MVaR is not differentiable, so gradient issues are not terribly surprising.

To get unblocked on this, a recommended alternative is to use MARS (https://proceedings.mlr.press/v162/daulton22a.html), which is differentiable and much faster than directly optimizing MVaR with qNEHVI. You can use MARS by instead setting

risk_measure = RiskMeasure(
    risk_measure="MARS",
    options={"n_w": 16, "alpha": 0.8},
)

and


modelbridge = Models.BOTORCH_MODULAR(
    experiment=exp,
    data=exp.fetch_data(),
    surrogate=Surrogate(botorch_model_class=SingleTaskGP),
    botorch_acqf_class=qLogNoisyExpectedImprovement,
)
saitcakmak commented 10 months ago

Hi @apaleyes. The code you shared runs fine for me, on Ax 0.3.6. I don't think there were any changes to this part of the code recently, so I don't know why you'd be getting an error due to gradients.

Can you try again with the latest versions of Ax & BoTorch? If you get the error again, sharing the full stack trace could be helpful to identify where the error is coming from.

saitcakmak commented 10 months ago

Oh, I copy-pasted the code and didn't realize that it was using the expectation rather than MVaR. I can reproduce the issue after updating that.

saitcakmak commented 10 months ago

Ok, the issue is that the MVaR implementation in BoTorch is not differentiable. The code has a warning about this, but it is easy to miss when you get an error: https://github.com/pytorch/botorch/blob/main/botorch/acquisition/multi_objective/multi_output_risk_measures.py#L498-L505

We do have a version of it with approximate gradients, but it looks like that change was never upstreamed to BoTorch.
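For intuition, the reported RuntimeError is the generic PyTorch error raised when backward() is called on a tensor with no autograd history. A minimal sketch of that error class (not the actual Ax/BoTorch code path; the selection operation here is a stand-in):

```python
import torch

# MVaR's output in BoTorch is produced through non-differentiable selection,
# so the resulting tensor carries no grad_fn. Calling backward() on such a
# tensor raises exactly the RuntimeError reported above.
samples = torch.randn(16, 2, requires_grad=True)

# Simulate a risk-measure reduction that is cut off from the autograd graph.
with torch.no_grad():
    risk_value = samples.sort(dim=0).values[3]  # detached: no grad_fn

try:
    risk_value.sum().backward()
    backward_failed = False
except RuntimeError as e:
    # "element 0 of tensors does not require grad and does not have a grad_fn"
    backward_failed = True
    print(e)
```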

apaleyes commented 10 months ago

Thanks, @sdaulton, that indeed unblocked me! Can I ask why your code uses qLogNoisyExpectedImprovement and not its hypervolume counterpart?

@saitcakmak glad it reproduced, thanks for responding with the fix so quickly

sdaulton commented 10 months ago

Glad that unblocked you! MARS optimizes MVaR by optimizing the VaR of random Chebyshev scalarizations. Since it scalarizes the problem, it uses a single-objective acquisition function.
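For concreteness, a rough numpy sketch of that idea, with made-up sample values and a single weight draw (not BoTorch's implementation): scalarize the multi-output samples with Chebyshev weights, then take the VaR of the scalarized values.

```python
import numpy as np

rng = np.random.default_rng(0)

# 16 perturbation samples of a 2-output minimization problem
# (stand-ins for model outputs under the input distribution).
Y = rng.normal(size=(16, 2))

# One random Chebyshev weight vector (a single draw for illustration;
# MARS randomizes the scalarization across optimization steps).
w = rng.dirichlet(np.ones(2))

# Chebyshev scalarization: the worst weighted objective per sample.
# For minimization, larger values are worse.
scalarized = np.max(w * Y, axis=-1)  # shape: (16,)

# VaR at alpha = 0.8: the 0.8-quantile of the scalarized values,
# i.e. a loss level exceeded by only ~20% of the perturbations.
alpha = 0.8
var = np.quantile(scalarized, alpha)
print(f"VaR at alpha={alpha}: {var:.3f}")
```

Because this VaR is an order statistic of a smooth scalarization, it is differentiable almost everywhere, which is what lets MARS sidestep MVaR's non-differentiability.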

sdaulton commented 8 months ago

@saitcakmak, did the differentiable MVaR version resolve the NaN issue?

saitcakmak commented 8 months ago

Yep, the error is resolved with the differentiability support.