facebook / Ax

Rejection Sampling Error Due to Search Space Complexity in Hardware-Aware NAS #2510

Closed ugobenazra closed 4 months ago

ugobenazra commented 5 months ago

Hello, I am running into an issue with my search space. I am using Ax to perform NAS on hyperparameters but, more importantly, on the neural network architecture itself. I am using the Ax Developer API to choose, first via Sobol and then via BO, every layer of my hidden blocks, the output dimension of each block, each layer's parameters, and so on.

I also added some parameter constraints on output dimensions and expand_ratio.

As soon as I added the parameter constraints on the expand_ratio, the script crashed with:

Scheduler: Optimization complete: Rejection sampling error (specified maximum draws (10000) exhausted, without finding sufficiently many (1) candidates). This likely means that there are no new points left in the search space.

Seeing this, I first suspected that my constraints were not working correctly, so I removed them. Then I increased the number of hidden blocks that Ax has to generate. As soon as I went over 8 blocks, the same error appeared. So, as far as I understand, the issue does not come from the constraints but from a search space that is perhaps too big or too complex. It is also quite possible that I misunderstood the package and made a mistake somewhere.

Here is the Ax script that I run for this task. I'm open to any comment, suggestion, or solution, and I would be glad to discuss with the community to better understand the logic behind it. I didn't upload the whole code because it is quite large and I don't think it's necessary, but I'm happy to share it if it helps.

import tempfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path

import hydra
from ax.core import (
    ChoiceParameter,
    Experiment,
    FixedParameter,
    MultiObjective,
    Objective,
    OrderConstraint,
    ParameterType,
    RangeParameter,
    SearchSpace,
)
from ax.core.optimization_config import MultiObjectiveOptimizationConfig
from ax.metrics.tensorboard import TensorboardCurveMetric
from ax.modelbridge.dispatch_utils import choose_generation_strategy
from ax.runners.torchx import TorchXRunner
from ax.service.scheduler import Scheduler, SchedulerOptions
from ax.service.utils.report_utils import exp_to_df
from omegaconf import DictConfig
from torchx import specs
from torchx.components import utils

from nas_exp.utils import find_index_by_name, get_nested

class MyTensorboardMetric(TensorboardCurveMetric):
    ...  # metric definition elided

    # NOTE: reconstructed override; the snippet only showed a dangling
    # "return False", which matches the usual TensorboardCurveMetric subclass
    # from the Ax TorchX NAS tutorial that this script appears to follow.
    @classmethod
    def is_available_while_running(cls) -> bool:
        return False

@dataclass
class BlockParameters:
    hidden_layers: list[str]
    strides: list[int]
    kernel_sizes: list[int]
    dropouts: list[float]  # required: trainer() passes dropouts= below
    residual_activation: bool
    residual_layers: list[str]
    strides_residual: list[int]
    kernel_sizes_residual: list[int]
    dropouts_residual: list[float]
    tensor_out_dim: int
    tensor_expand_ratio: int

def trainer(
    log_path: str = "logs_nas",
    experiment_name: str = "torchx_imunet_nas",
    epochs: int = 1,
    batch_size_train: int = 32,
    step_size: int = 10,
    lr: float = 0.0001,
    window_size: int = 200,
    activation_function: str = "elu",
    output_block: str = "globavgoutputmodule",
    trial_idx: int = -1,
    **kwargs,
) -> specs.AppDef:
    blocks = []
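    # Recover the grid sizes (number of blocks, layers per block, residual
    # depth) by parsing the flattened parameter names that Ax passes in via
    # kwargs, e.g. "hidden_block_{i}_layer_{j}".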
    nbr_hidden_blocks = max(
        int(key.split("_")[2]) + 1
        for key in kwargs.keys()
        if key.startswith("hidden_block")
    )
    depth_hidden_block = max(
        int(key.split("_")[4]) + 1
        for key in kwargs.keys()
        if key.startswith("hidden_block")
    )
    depth_residuals = max(
        int(key.split("_")[4]) + 1
        for key in kwargs.keys()
        if key.startswith("residual_block")
    )

    for i in range(nbr_hidden_blocks):
        hidden_layers = [
            kwargs[f"hidden_block_{i}_layer_{j}"] for j in range(depth_hidden_block)
        ]
        strides = [
            kwargs[f"stride_block_{i}_layer_{j}"] for j in range(depth_hidden_block)
        ]
        kernel_sizes = [
            kwargs[f"kernel_size_block_{i}_layer_{j}"]
            for j in range(depth_hidden_block)
        ]
        dropouts = [
            kwargs[f"dropout_block_{i}_layer_{j}"] for j in range(depth_hidden_block)
        ]
        residual_activation = kwargs[f"residual_activation_block_{i}"]
        residual_layers = [
            kwargs[f"residual_block_{i}_layer_{j}"] for j in range(depth_residuals)
        ]
        strides_residual = [
            kwargs[f"stride_residual_block_{i}_layer_{j}"]
            for j in range(depth_residuals)
        ]
        kernel_sizes_residual = [
            kwargs[f"kernel_residual_size_block_{i}_layer_{j}"]
            for j in range(depth_residuals)
        ]
        dropouts_residual = [
            kwargs[f"dropout_residual_block_{i}_layer_{j}"]
            for j in range(depth_residuals)
        ]
        tensor_out_dim = kwargs[f"tensor_out_dim_block_{i}"]
        tensor_expand_ratio = kwargs[f"tensor_expand_ratio_block_{i}"]

        block_params = BlockParameters(
            hidden_layers=hidden_layers,
            strides=strides,
            kernel_sizes=kernel_sizes,
            dropouts=dropouts,
            residual_activation=residual_activation,
            residual_layers=residual_layers,
            strides_residual=strides_residual,
            kernel_sizes_residual=kernel_sizes_residual,
            dropouts_residual=dropouts_residual,
            tensor_out_dim=tensor_out_dim,
            tensor_expand_ratio=tensor_expand_ratio,
        )
        blocks.append(block_params)

    architecture = {
        "activation_function": activation_function,
        "input_block": {"block": {}, "output_dim": 64},
        "residual_group": {
            i: {
                "block": {
                    j: {layer: []} for j, layer in enumerate(block.hidden_layers)
                },
                "output_dim": block.tensor_out_dim,
                "expand_ratio": block.tensor_expand_ratio,
                "strides": block.strides,
                "kernel_sizes": block.kernel_sizes,
                "dropouts": block.dropouts,
                "residual_activation": block.residual_activation,
                "residual": {
                    j: {layer: []} for j, layer in enumerate(block.residual_layers)
                },
                "strides_residual": block.strides_residual,
                "kernel_sizes_residual": block.kernel_sizes_residual,
                "dropouts_residual": block.dropouts_residual,
            }
            for i, block in enumerate(blocks)
        },
        "output_block": output_block,
    }

    if trial_idx >= 0:
        log_path = Path(log_path).joinpath(str(trial_idx)).absolute().as_posix()
    else:
        log_path = Path(log_path).joinpath("default").absolute().as_posix()
    run_name = str(trial_idx)
    return utils.python(
        "--experiment_name",
        experiment_name,
        "--run_name",
        run_name,
        "--log_path",
        str(log_path),
        "--epochs",
        str(epochs),
        "--batch_size_train",
        str(batch_size_train),
        "--step_size",
        str(step_size),
        "--lr",
        str(lr),
        "--window_size",
        str(window_size),
        "--architecture",
        str(architecture),
        name="trainer",
        script="compute_model_ax.py",
    )

@hydra.main(version_base="1.3", config_path="conf", config_name="nas")
def main(cfg: DictConfig):
    # mkdtemp keeps the tracker directory alive for the whole run; a
    # TemporaryDirectory context manager would delete it as soon as the
    # `with` block ended, i.e. before the scheduler launches any trials.
    tmp_dir = tempfile.mkdtemp()
    ax_runner = TorchXRunner(
        tracker_base=tmp_dir,
        component=trainer,
        # NOTE: To launch this job on a cluster instead of locally you can
        # specify a different scheduler and adjust arguments appropriately.
        scheduler="local_cwd",
        component_const_params={
            "log_path": get_nested(cfg, "paths.log_dir", default="logs_nas"),
            "epochs": get_nested(cfg, "nas_experiment.epochs", default=1),
        },
        cfg={},
    )

    parameters = [
        FixedParameter(
            name="experiment_name",
            parameter_type=ParameterType.STRING,
            value=get_nested(cfg, "nas_experiment.name", default="torchx_imunet_nas")
            + f"_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
        ),
        ChoiceParameter(
            name="batch_size_train",
            values=get_nested(
                cfg,
                "search_space.hyperparams_choices.batch_size_train",
                default=[16, 32, 64, 128],
            ),
            parameter_type=ParameterType.INT,
            is_ordered=True,
            sort_values=True,
        ),
        RangeParameter(
            name="step_size",
            parameter_type=ParameterType.INT,
            lower=get_nested(
                cfg, "search_space.hyperparams_choices.step_size.lower", default=5
            ),
            upper=get_nested(
                cfg, "search_space.hyperparams_choices.step_size.upper", default=50
            ),
        ),
        RangeParameter(
            name="lr",
            parameter_type=ParameterType.FLOAT,
            lower=get_nested(
                cfg, "search_space.hyperparams_choices.lr.lower", default=0.00001
            ),
            upper=get_nested(
                cfg, "search_space.hyperparams_choices.lr.upper", default=0.001
            ),
        ),
        ChoiceParameter(
            name="window_size",
            values=get_nested(
                cfg,
                "search_space.hyperparams_choices.window_size",
                default=[100, 200, 300, 400],
            ),
            parameter_type=ParameterType.INT,
            is_ordered=True,
            sort_values=True,
        ),
        ChoiceParameter(
            name="activation_function",
            values=get_nested(
                cfg,
                "search_space.architecture_choices.activation_function",
                default=["relu", "elu", "swish"],
            ),
            parameter_type=ParameterType.STRING,
            is_ordered=False,
            sort_values=False,
        ),
        ChoiceParameter(
            name="output_block",
            values=get_nested(
                cfg,
                "search_space.architecture_choices.output_blocks",
                default=["globavgoutputmodule", "fcoutputmodule"],
            ),
            parameter_type=ParameterType.STRING,
            is_ordered=False,
            sort_values=False,
        ),
        # NOTE: SearchSpace rejects duplicate parameter names, so this
        # FixedParameter cannot coexist with the ChoiceParameter of the same
        # name above; use one or the other.
        # FixedParameter(
        #     name="output_block",
        #     parameter_type=ParameterType.STRING,
        #     value="fcoutputmodule",
        # ),
    ]

    # here we tackle the content of each hidden block, part of the residual group
    # we fix the dimension of the hidden blocks and we will choose the layers one
    # by one among the ones we have at our disposal
    nbr_hidden_blocks = get_nested(
        cfg, "search_space.architecture_choices.nbr_hidden_blocks", default=4
    )
    depth_hidden_block = get_nested(
        cfg, "search_space.architecture_choices.depth_hidden_block", default=10
    )
    depth_residuals = get_nested(
        cfg, "search_space.architecture_choices.depth_residuals", default=2
    )

    # the following loop will create the parameters that depend on the number of hidden blocks inside the residual group
    for i in range(nbr_hidden_blocks):

        # layers of the hidden block
        for j in range(depth_hidden_block):
            parameters.append(
                ChoiceParameter(
                    name=f"hidden_block_{i}_layer_{j}",
                    values=get_nested(
                        cfg,
                        "search_space.architecture_choices.hidden_blocks",
                        default=[
                            "activation_function",
                            "identityblock",
                            "conv1d",
                            "batchnorm1d",
                            "dropout",
                        ],
                    ),
                    parameter_type=ParameterType.STRING,
                    is_ordered=False,
                    sort_values=False,
                )
            )

            # stride
            parameters.append(
                RangeParameter(
                    name=f"stride_block_{i}_layer_{j}",
                    parameter_type=ParameterType.INT,
                    lower=get_nested(
                        cfg, "search_space.architecture_choices.stride.lower", default=1
                    ),
                    upper=get_nested(
                        cfg, "search_space.architecture_choices.stride.upper", default=2
                    ),
                )
            )

            # kernel size
            parameters.append(
                RangeParameter(
                    name=f"kernel_size_block_{i}_layer_{j}",
                    parameter_type=ParameterType.INT,
                    lower=get_nested(
                        cfg,
                        "search_space.architecture_choices.kernel_size.lower",
                        default=1,
                    ),
                    upper=get_nested(
                        cfg,
                        "search_space.architecture_choices.kernel_size.upper",
                        default=3,
                    ),
                )
            )

            # dropout rate
            parameters.append(
                RangeParameter(
                    name=f"dropout_block_{i}_layer_{j}",
                    parameter_type=ParameterType.FLOAT,
                    lower=get_nested(
                        cfg,
                        "search_space.architecture_choices.dropout.lower",
                        default=0.01,
                    ),
                    upper=get_nested(
                        cfg,
                        "search_space.architecture_choices.dropout.upper",
                        default=1,
                    ),
                )
            )

        # whether the residual is activated or not; this is per block, so it
        # lives outside the per-layer loop above
        parameters.append(
            ChoiceParameter(
                name=f"residual_activation_block_{i}",
                values=get_nested(
                    cfg,
                    "search_space.architecture_choices.residual_actvations",
                    default=[True, False],
                ),
                parameter_type=ParameterType.BOOL,
                is_ordered=False,
                sort_values=False,
            )
        )

        # layers of the residual
        for j in range(depth_residuals):
            parameters.append(
                ChoiceParameter(
                    name=f"residual_block_{i}_layer_{j}",
                    values=get_nested(
                        cfg,
                        "search_space.architecture_choices.residuals",
                        default=[
                            "activation_function",
                            "identityblock",
                            "conv1d",
                            "batchnorm1d",
                            "dropout",
                        ],
                    ),
                    parameter_type=ParameterType.STRING,
                    is_ordered=False,
                    sort_values=False,
                )
            )

            # stride_residual
            parameters.append(
                RangeParameter(
                    name=f"stride_residual_block_{i}_layer_{j}",
                    parameter_type=ParameterType.INT,
                    lower=get_nested(
                        cfg, "search_space.architecture_choices.stride.lower", default=1
                    ),
                    upper=get_nested(
                        cfg, "search_space.architecture_choices.stride.upper", default=2
                    ),
                )
            )

            # kernel_size_residual
            parameters.append(
                RangeParameter(
                    name=f"kernel_residual_size_block_{i}_layer_{j}",
                    parameter_type=ParameterType.INT,
                    lower=get_nested(
                        cfg,
                        "search_space.architecture_choices.kernel_size.lower",
                        default=1,
                    ),
                    upper=get_nested(
                        cfg,
                        "search_space.architecture_choices.kernel_size.upper",
                        default=3,
                    ),
                )
            )

            # dropout_rate_residual
            parameters.append(
                RangeParameter(
                    name=f"dropout_residual_block_{i}_layer_{j}",
                    parameter_type=ParameterType.FLOAT,
                    lower=get_nested(
                        cfg,
                        "search_space.architecture_choices.dropout.lower",
                        default=0.01,
                    ),
                    upper=get_nested(
                        cfg,
                        "search_space.architecture_choices.dropout.upper",
                        default=1,
                    ),
                )
            )

        # tensor's output dimension
        parameters.append(
            RangeParameter(
                name=f"tensor_out_dim_block_{i}",
                parameter_type=ParameterType.INT,
                lower=get_nested(
                    cfg,
                    "search_space.architecture_choices.tensor_output_dim.lower",
                    default=6,
                ),
                upper=get_nested(
                    cfg,
                    "search_space.architecture_choices.tensor_output_dim.upper",
                    default=512,
                ),
            )
        )

        # tensor's dimension expand ratio
        parameters.append(
            RangeParameter(
                name=f"tensor_expand_ratio_block_{i}",
                parameter_type=ParameterType.INT,
                lower=get_nested(
                    cfg,
                    "search_space.architecture_choices.tensor_expand_ratio.lower",
                    default=1,
                ),
                upper=get_nested(
                    cfg,
                    "search_space.architecture_choices.tensor_expand_ratio.upper",
                    default=3,
                ),
            )
        )

    parameter_constraints = []
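    # Each consecutive pair of blocks gets two order constraints (one chain on
    # tensor_out_dim, one on tensor_expand_ratio), i.e. 2*(nbr_hidden_blocks-1)
    # constraints in total.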
    for i in range(nbr_hidden_blocks - 1):
        parameter_constraints.append(
            OrderConstraint(
                lower_parameter=parameters[
                    find_index_by_name(parameters, f"tensor_out_dim_block_{i}")
                ],
                upper_parameter=parameters[
                    find_index_by_name(parameters, f"tensor_out_dim_block_{i+1}")
                ],
            )
        )

        parameter_constraints.append(
            OrderConstraint(
                lower_parameter=parameters[
                    find_index_by_name(parameters, f"tensor_expand_ratio_block_{i}")
                ],
                upper_parameter=parameters[
                    find_index_by_name(parameters, f"tensor_expand_ratio_block_{i+1}")
                ],
            )
        )

    search_space = SearchSpace(
        parameters=parameters,
        parameter_constraints=parameter_constraints,
    )

    log_path = get_nested(cfg, "paths.log_dir", default="logs_nas")
    val_loss = MyTensorboardMetric(
        ...  # metric definition elided
    )
    # named to match their use in the optimization config below
    model_num_params = MyTensorboardMetric(
        ...
    )
    ate = MyTensorboardMetric(
        ...
    )
    rte = MyTensorboardMetric(
        ...
    )

    opt_config = MultiObjectiveOptimizationConfig(
        objective=MultiObjective(
            objectives=[
                Objective(metric=val_loss, minimize=True),
                Objective(metric=model_num_params, minimize=True),
                Objective(metric=ate, minimize=True),
                Objective(metric=rte, minimize=True),
            ],
        ),
        outcome_constraints=[],
    )

    experiment = Experiment(
        name=get_nested(cfg, "nas_experiment.name", default="torchx_imunet_nas")
        + f"_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
        search_space=search_space,
        optimization_config=opt_config,
        runner=ax_runner,
    )

    gs = choose_generation_strategy(
        search_space=experiment.search_space,
        optimization_config=experiment.optimization_config,
        num_trials=get_nested(cfg, "nas_experiment.total_trials", default=1),
    )

    scheduler = Scheduler(
        experiment=experiment,
        generation_strategy=gs,
        options=SchedulerOptions(
            total_trials=get_nested(cfg, "nas_experiment.total_trials", default=1),
            max_pending_trials=4,
            tolerated_trial_failure_rate=0.9999,
        ),
    )

    scheduler.run_all_trials()

if __name__ == "__main__":
    main()
mgrange1998 commented 5 months ago

Take a look at the "rejection_sample" method in "model_utils.py", where this error is thrown: https://ax.dev/api/_modules/ax/models/model_utils.html

How it works is that points are generated with "gen_unconstrained", and those which don't meet the parameter constraints are filtered out. In your case, you add 2*(nbr_hidden_blocks-1) order constraints on tensor_out_dim_block_{i} and tensor_expand_ratio_block_{i}. So once you increase the number of blocks to 8, there are 14 parameter constraints which an unconstrained candidate needs to satisfy.
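
In rough pseudocode, the loop being described has this shape (a paraphrase for illustration, not the actual Ax source; see the link above for the real implementation):

def rejection_sample_sketch(gen_unconstrained, satisfies_constraints,
                            max_draws=10_000, num_needed=1):
    # Draw unconstrained candidates, keep only those satisfying every
    # parameter constraint, and give up once max_draws is exhausted.
    accepted = []
    for _ in range(max_draws):
        point = gen_unconstrained()
        if satisfies_constraints(point):
            accepted.append(point)
            if len(accepted) >= num_needed:
                return accepted
    raise RuntimeError(
        f"specified maximum draws ({max_draws}) exhausted, without finding "
        f"sufficiently many ({num_needed}) candidates"
    )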

Treating each constraint as an independent fifty-fifty event, the chance of a single unconstrained candidate meeting all 14 of these is (0.5)^14 ≈ 0.000061. So for 10,000 draws, the chance of none of them being a valid candidate is (1 - 0.000061)^10000 ≈ 0.54. This means that at 8 blocks, you'll fail to find a valid candidate more than half the time. (If anything this is optimistic: order constraints in a chain are not independent, and for n i.i.d. continuous draws the chance of coming out sorted is 1/n!, far smaller than (0.5)^(n-1).)

For 7 blocks, the chance of a candidate meeting all 12 constraints is (0.5)^12 ≈ 0.0002, and the chance of all 10,000 draws failing to meet the constraints is (1 - 0.0002)^10000 ≈ 0.087, which explains why you rarely ran into this issue with 7 or fewer blocks.

So your issue comes from the number of parameter constraints scaling with the number of blocks. Let me know if this makes sense and whether you have any other questions.
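
A few lines of Python reproduce these numbers under the same independence assumption (my own sanity check, not part of Ax):

def p_all_draws_fail(n_blocks: int, max_draws: int = 10_000) -> float:
    # Two order constraints per consecutive pair of blocks, each treated as
    # an independent fifty-fifty event.
    n_constraints = 2 * (n_blocks - 1)
    p_candidate_ok = 0.5 ** n_constraints
    return (1.0 - p_candidate_ok) ** max_draws

for n_blocks in (7, 8):
    print(n_blocks, round(p_all_draws_fail(n_blocks), 3))
# prints: 7 0.087 and 8 0.543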

Balandat commented 5 months ago

We do have an option to fall back to a different sampler for such highly constrained search spaces: https://www.internalfb.com/code/fbsource/[f4eb45809174569b78529c2bf29b56025610ecf9]/fbcode/ax/models/random/base.py?lines=63

This is not enabled by default (though it arguably should be), but you can manually construct a Sobol GenerationStep that does this by passing model_kwargs = {"fallback_to_sample_polytope": True} to the GenerationStep constructor. See https://github.com/facebook/Ax/issues/2373 for an example of this.
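
For instance, a manual generation strategy along these lines could replace the choose_generation_strategy call in the script above (a minimal sketch; the Sobol budget and the MOO step are assumptions to adapt to your setup):

from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

gs = GenerationStrategy(
    steps=[
        GenerationStep(
            model=Models.SOBOL,
            num_trials=8,  # quasi-random initialization budget (assumption)
            # fall back to polytope sampling when rejection sampling exhausts
            # its draws in a heavily constrained search space
            model_kwargs={"fallback_to_sample_polytope": True},
        ),
        GenerationStep(
            model=Models.MOO,  # multi-objective BO for the remaining trials
            num_trials=-1,     # -1 means no limit on this step
        ),
    ]
)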

ugobenazra commented 4 months ago

Hello,

Thank you very much for your clear explanations. I now have a better understanding of where my issue came from, and I'll find a way to deal with it.