facebook / Ax

Adaptive Experimentation Platform
https://ax.dev
MIT License

Service API on PyTorch CNN:`ValueError: Mix of known and unknown variances...` #685

Closed: ananiask8 closed this issue 3 years ago

ananiask8 commented 3 years ago

Good day. I am new to this library and to Bayesian optimization in general. I have been trying to get my head around your documentation and have been working through some simple examples to understand it better.

It seems to me that the Loop API is missing a proper implementation of parameter constraints. When I first tried to include some parameter constraints, I saw that they were simply not being applied; I then went into the source code and found that the with_evaluation_function function, called from optimize, ignores the parameter constraints. See source.

My current experiment builds on top of your PyTorch tutorial, since I am using it to explore the available functionality.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from typing import Dict, List, Optional, Tuple
from ax import optimize
from ax.utils.tutorials.cnn_utils import load_mnist, evaluate, CNN
from ax.service.ax_client import AxClient

def train(
        net: nn.Module, train_loader: DataLoader, parameters: Dict[str, float],
        dtype: torch.dtype, device: torch.device
) -> nn.Module:
    # Initialize network
    net.to(dtype=dtype, device=device)  # pyre-ignore [28]
    net.train()
    # Define loss and optimizer
    criterion = nn.NLLLoss(reduction="sum")
    optimizer = optim.SGD(
        net.parameters(),
        lr=parameters.get("lr", 0.001),
        momentum=parameters.get("base_momentum", 0.0),
        weight_decay=parameters.get("weight_decay", 0.0),
    )
    num_epochs = parameters.get("num_epochs", 1)
    scheduler = optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=parameters.get("lr", 0.001),
        base_momentum=parameters.get("base_momentum", 0.85),
        max_momentum=parameters.get("max_momentum", 0.95),
        total_steps=num_epochs*len(train_loader),
        cycle_momentum=True
    )

    # Train Network
    for _ in range(num_epochs):
        for inputs, labels in train_loader:
            # move data to proper dtype and device
            inputs = inputs.to(dtype=dtype, device=device)
            labels = labels.to(device=device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            scheduler.step()
    return net

def train_evaluate(parameterization):
    print(parameterization)
    net = CNN()
    net = train(net=net, train_loader=train_loader, parameters=parameterization, dtype=dtype, device=device)
    return evaluate(
        net=net,
        data_loader=valid_loader,
        dtype=dtype,
        device=device,
    )

if __name__ == '__main__':
    torch.manual_seed(12345)
    dtype = torch.float
    device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")

    BATCH_SIZE = 512
    train_loader, valid_loader, test_loader = load_mnist(batch_size=BATCH_SIZE)

    best_parameters, best_values, experiment, model = optimize(
        parameters=[
            {"name": "lr", "type": "range", "bounds": [1e-6, 0.5], "log_scale": True},
            {"name": "base_momentum", "type": "range", "bounds": [0.0, 1.0]},
            {"name": "max_momentum", "type": "range", "bounds": [0.0, 1.0]},
        ],
        parameter_constraints=["base_momentum <= max_momentum"],
        objective_name='accuracy',
        evaluation_function=train_evaluate
    )
    print(best_parameters)
    print(best_values)

If I replace the Loop API with the Service API, the parameter constraint seems to work correctly.

ax_client = AxClient()
ax_client.create_experiment(
    parameters=[
        {"name": "lr", "type": "range", "bounds": [1e-6, 0.5], "log_scale": True},
        {"name": "base_momentum", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "max_momentum", "type": "range", "bounds": [0.0, 1.0]},
    ],
    parameter_constraints=["base_momentum <= max_momentum"],
    objective_name='accuracy',
)

for i in range(10):
    parameters, trial_index = ax_client.get_next_trial()
    ax_client.complete_trial(trial_index=trial_index, raw_data=train_evaluate(parameters))

best_parameters, values = ax_client.get_best_parameters()
print(best_parameters)
print(values)

However, I then get the following error, which I am not sure how to prevent.

ValueError: Mix of known and unknown variances indicates valuation function errors. Variances should all be specified, or none should be.
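
Based on the error message, I assume one fix would be to always report an explicit SEM, something like the sketch below, but I am not sure whether that is the intended usage (here the accuracy is assumed to be noiseless, i.e. SEM = 0.0):

# Sketch: report the metric as an explicit (mean, SEM) tuple so the variance is
# always "known" for every trial; the accuracy is assumed to be noiseless here.
for i in range(10):
    parameters, trial_index = ax_client.get_next_trial()
    accuracy = train_evaluate(parameters)
    ax_client.complete_trial(
        trial_index=trial_index,
        raw_data={"accuracy": (accuracy, 0.0)},
    )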

lena-kashtelyan commented 3 years ago

Hi @ananiask8, thank you for reporting this! We have a master issue for problems with the Loop API: https://github.com/facebook/Ax/issues/605, so I'll reference this there.

In the meantime, using the Service API should work for you. Let me try and reproduce the issue you are running into and get back to you!

ananiask8 commented 3 years ago

@lena-kashtelyan Thanks. I would like to ask whether the way I am using the Service API for hyper-parameter optimization is correct in its most basic form. I ran it for 100 trials, and the parameters seemed to start repeating after around trial number 50. Some useful additional information: during that optimization, my search space consisted of four variables: weight_decay, lr, base_momentum, and max_momentum. For two variables I would use something like 20 trials, but since I have four variables here, I increased it to 100 trials. Do you have any additional advice for high-dimensional search spaces?
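
(For reference, one way to check for repeated parameterizations would be a sketch like the following, assuming get_trials_data_frame is available in the installed Ax version:)

# Sketch: dump all trials to a pandas DataFrame and look for duplicated parameterizations.
df = ax_client.get_trials_data_frame()  # assumed to be available in this Ax version
param_cols = ["lr", "weight_decay", "base_momentum", "max_momentum"]
duplicates = df[df.duplicated(subset=param_cols, keep=False)]
print(duplicates[param_cols])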

lena-kashtelyan commented 3 years ago

@ananiask8, for four parameters, 20-30 trials should still be plenty! If you get repeated trials, your optimization has most likely converged. For very high-dimensional search spaces (over 20 parameters), you could use SAASBO, since regular Bayesian optimization will start to struggle; see more detail here: https://research.fb.com/blog/2021/07/high-dimensional-bayesian-optimization-with-sparsity-inducing-priors/ (tutorials will be added to the website soon, too).

cc @Balandat, @dme65

Will also get back to you regarding the "mix of known and unknown variances" error soon!

dme65 commented 3 years ago

You can find a SAASBO tutorial here in the meantime if you are interested: https://github.com/facebook/Ax/blob/main/tutorials/saasbo.ipynb
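
For reference, a rough sketch of how SAASBO could be plugged into the Service API via a custom generation strategy is shown below (the model registry name and step settings are assumptions based on the linked tutorial, which remains the authoritative reference):

from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models
from ax.service.ax_client import AxClient

# Sketch only: a few quasi-random Sobol trials first, then the fully Bayesian SAAS model.
gs = GenerationStrategy(
    steps=[
        GenerationStep(model=Models.SOBOL, num_trials=8),
        # Models.FULLYBAYESIAN is assumed to be the SAAS model in this Ax version;
        # num_trials=-1 means it is used for all remaining trials.
        GenerationStep(model=Models.FULLYBAYESIAN, num_trials=-1),
    ]
)
ax_client = AxClient(generation_strategy=gs)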

lena-kashtelyan commented 3 years ago

@ananiask8, for some reason I couldn't reproduce the issue you were running into with the Service API; the code snippet you gave ran without error for me. Could you confirm that the following code still runs into the "mix of known and unknown variances" error for you (and which versions of Ax and BoTorch you have; a small version-check snippet follows the code below)?

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from typing import Dict, List, Optional, Tuple
from ax import optimize
from ax.utils.tutorials.cnn_utils import load_mnist, evaluate, CNN
from ax.service.ax_client import AxClient

def train(
        net: nn.Module, train_loader: DataLoader, parameters: Dict[str, float],
        dtype: torch.dtype, device: torch.device
) -> nn.Module:
    # Initialize network
    net.to(dtype=dtype, device=device)  # pyre-ignore [28]
    net.train()
    # Define loss and optimizer
    criterion = nn.NLLLoss(reduction="sum")
    optimizer = optim.SGD(
        net.parameters(),
        lr=parameters.get("lr", 0.001),
        momentum=parameters.get("base_momentum", 0.0),
        weight_decay=parameters.get("weight_decay", 0.0),
    )
    num_epochs = parameters.get("num_epochs", 1)
    scheduler = optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=parameters.get("lr", 0.001),
        base_momentum=parameters.get("base_momentum", 0.85),
        max_momentum=parameters.get("max_momentum", 0.95),
        total_steps=num_epochs*len(train_loader),
        cycle_momentum=True
    )

    # Train Network
    for _ in range(num_epochs):
        for inputs, labels in train_loader:
            # move data to proper dtype and device
            inputs = inputs.to(dtype=dtype, device=device)
            labels = labels.to(device=device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            scheduler.step()
    return net

def train_evaluate(parameterization):
    print(parameterization)
    net = CNN()
    net = train(net=net, train_loader=train_loader, parameters=parameterization, dtype=dtype, device=device)
    return evaluate(
        net=net,
        data_loader=valid_loader,
        dtype=dtype,
        device=device,
    )

torch.manual_seed(12345)
dtype = torch.float
device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")

BATCH_SIZE = 512
train_loader, valid_loader, test_loader = load_mnist(batch_size=BATCH_SIZE)

ax_client = AxClient()
ax_client.create_experiment(
    parameters=[
        {"name": "lr", "type": "range", "bounds": [1e-6, 0.5], "log_scale": True},
        {"name": "base_momentum", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "max_momentum", "type": "range", "bounds": [0.0, 1.0]},
    ],
    parameter_constraints=["base_momentum <= max_momentum"],
    objective_name='accuracy',
)

for i in range(10):
    parameters, trial_index = ax_client.get_next_trial()
    ax_client.complete_trial(trial_index=trial_index, raw_data=train_evaluate(parameters))

best_parameters, values = ax_client.get_best_parameters()
print(best_parameters)
print(values)
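
To check the installed versions, something like the following should work (a sketch using importlib.metadata; the PyPI package names ax-platform and botorch are assumed):

from importlib.metadata import version

# Print the installed Ax and BoTorch versions (PyPI package names assumed).
print("Ax:", version("ax-platform"))
print("BoTorch:", version("botorch"))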

lena-kashtelyan commented 3 years ago

Closing this as I couldn't repro the issue; @ananiask8, feel free to reopen if it does repro for you with the code in my last comment, and we can look into this more!