facebook / Ax

Adaptive Experimentation Platform
https://ax.dev
MIT License

Hierarchical Search Spaces with Multiple Independent Search Spaces #2539

Open Abrikosoff opened 6 days ago

Abrikosoff commented 6 days ago

Hi Ax Team,

First of all, thanks for all your help with my (8 and counting) questions so far! I now have another one :( I currently have two (potential) use cases for hierarchical search spaces:

  1. A NAS application, where I have a parameter search space definition of the form:

    
def generate_parameters(num_conv_layers_to_use=(2, 3, 4, 5)):
    # num_conv_layers_to_use: candidate depths; its choice parameter below is
    # the root of the hierarchical search space.
    parameters = [
        {
            "name": "lr",
            "type": "range",
            "bounds": [1e-6, 0.4],
            "value_type": "float",
            "log_scale": True,
        },
        {
            "name": "momentum",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
    ]
    params = [
        {
            "name": "num_conv_layers_to_use",
            "type": "choice",
            "is_ordered": True,
            "values": list(num_conv_layers_to_use),
            "dependents": {
                num_layers: [
                    f"kernel_size_of_layer_{i + 1}_of_{num_layers}"
                    for i in range(num_layers)
                ]
                for num_layers in num_conv_layers_to_use
            },
        },
        *[
            {
                "name": f"kernel_size_of_layer_{i + 1}_of_{j}",
                "type": "choice",
                "is_ordered": True,
                "values": [1, 5, 10, 20, 50],
            }
            for j in num_conv_layers_to_use
            for i in range(j)
        ],
        *[
            {
                "name": f"activation_{i}_of_layer_num_{j}",
                "type": "choice",
                "values": ["ReLU", "Tanh", "LeakyReLU"],  # possible activation functions per layer
            }
            for j in num_conv_layers_to_use
            for i in range(j)
        ],
        *[
            {
                "name": f"dropout_{i}_of_layer_num_{j}",
                "type": "range",
                "bounds": [0.0, 1.0],  # dropout rate per layer
            }
            for j in num_conv_layers_to_use
            for i in range(j)
        ],
    ]

    parameters.extend(params)

    return parameters

ax_client.create_experiment(
    name="tune_cnn_on_mnist",
    parameters=generate_parameters(),
    objectives={"MSE": ObjectiveProperties(minimize=True)},
    choose_generation_strategy_kwargs={"use_saasbo": True},
)

where `num_conv_layers_to_use` has been set as the root node. Without the activation and dropout definitions this would have worked, but since an MLP should have those elements as well, they need to be included too. Doing so, however, raises the error
`NotImplementedError: Could not find the root parameter; found dependent parameters {'kernel_size_of_layer_2_of_3', 'kernel_size_of_layer_1_of_4', ....}. Having multiple independent parameters is not yet supported.`
which I take to mean that the search space must form a single complete tree and cannot consist of separate subspaces.

2. The other use case is an idea to do discrete multifidelity BO in the Service API (related to [this](https://github.com/facebook/Ax/issues/2475#issuecomment-2146658171) and [this](https://github.com/facebook/Ax/issues/2514#issue-2348256680)). This consists of defining the search space as follows:

{ { "name": "x1", "type": "range", "bounds": [0.0, 1.0], "value_type": "float", # Optional, defaults to inference from type of "bounds". "log_scale": False, # Optional, defaults to False. }, { "name": "x2", "type": "range", "bounds": [0.0, 1.0], }, { "name": "x3", "type": "range", "bounds": [0.0, 1.0], }, { "name": "fidelity_marker", "type": "choice", "values": ["low", "medium", "high", "max"], "dependents": {"low": ["low_fidelity"], "medium": ["medium_fidelity"], "high": ["high_fidelity"], "max": ["max_fidelity"]}, }, { "name": "low_fidelity", "type": "fixed", "value": 0.0, "is_fidelity": True, "target_value": 1.0,
}, { "name": "medium_fidelity", "type": "fixed", "value": 0.5, "is_fidelity": True, "target_value": 1.0,
}, { "name": "high_fidelity", "type": "fixed", "value": 0.75, "is_fidelity": True, "target_value": 1.0,
}, { "name": "max_fidelity", "type": "fixed", "value": 1.0, "is_fidelity": True, "target_value": 1.0, } ],


where `fidelity_marker` is used as a switch that selects which fidelity value applies. But this also throws the same error as above.

So my question boils down to: is there actually no support right now for multiple independent search spaces? And if that's the case, are there any workarounds for such a use case? It seems to me that this kind of scenario comes up much more often than the case where the full search space can be written as a single tree.
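For concreteness, the kind of workaround I was imagining for the second use case (completely untested, and I have no idea whether the Service API even accepts `dependents` on a fixed parameter) would be to introduce a single artificial root so that everything hangs off one node:

# Completely untested sketch: "dummy_root" is a made-up name whose only value
# lists every otherwise-independent top-level parameter as a dependent, so that
# the search space has exactly one root.
dummy_root = {
    "name": "dummy_root",
    "type": "fixed",
    "value": "root",
    "dependents": {"root": ["x1", "x2", "x3", "fidelity_marker"]},
}

# The x1/x2/x3 and fidelity_marker/*_fidelity definitions from above would stay
# unchanged and simply be listed after the artificial root:
# parameters = [dummy_root, x1_def, x2_def, x3_def, fidelity_marker_def, ...]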
danielcohenlive commented 5 days ago

Hi @Abrikosoff, thanks for your question. Are you sure you should be using a hierarchical search space? It seems like you shouldn't need a separate dropout per depth. If so, this setup could be simplified (rough sketch below). cc @esantorella
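To sketch what I mean (rough, untested, and with made-up parameter names): if dropout, lr, and momentum are shared across depths, the space can stay flat, with the per-layer kernel sizes and activations defined up to the maximum depth and simply ignored by your training code when `num_conv_layers` is smaller:

# Rough, untested sketch of a flat (non-hierarchical) search space.
# "num_conv_layers" and MAX_LAYERS are made-up names for illustration;
# bounds/values are taken from the snippet above.
MAX_LAYERS = 5

flat_parameters = [
    {"name": "lr", "type": "range", "bounds": [1e-6, 0.4], "log_scale": True},
    {"name": "momentum", "type": "range", "bounds": [0.0, 1.0]},
    {"name": "dropout", "type": "range", "bounds": [0.0, 1.0]},  # shared across layers
    {"name": "num_conv_layers", "type": "choice", "is_ordered": True, "values": [2, 3, 4, 5]},
    *[
        {
            "name": f"kernel_size_of_layer_{i + 1}",
            "type": "choice",
            "is_ordered": True,
            "values": [1, 5, 10, 20, 50],
        }
        for i in range(MAX_LAYERS)
    ],
    *[
        {
            "name": f"activation_of_layer_{i + 1}",
            "type": "choice",
            "values": ["ReLU", "Tanh", "LeakyReLU"],
        }
        for i in range(MAX_LAYERS)
    ],
]

The trade-off is that the unused per-layer parameters are still sampled when `num_conv_layers` is small, which is exactly what a hierarchical search space avoids.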

Abrikosoff commented 4 days ago

Hi Daniel, thanks for your remarks! I was able to construct the HSS for the NAS use case in the following way:

parameters = [
    {
        "name": "num_layers",
        "type": "choice",
        "values": ["two_layers", "three_layers", "four_layers", "five_layers",],  # Specify the range of num_layers
        "dependents": {
            "two_layers": ["2_layer_hidden_size_0", 
                            "2_layer_hidden_size_1", 
                            "2_layer_activation_0", 
                            "2_layer_activation_1", 
                            "2_layer_dropout_0", 
                            "2_layer_dropout_1", 
                            "2_layer_lr", 
                            "2_layer_momentum",],
            "three_layers": ["3_layer_hidden_size_0", 
                                "3_layer_hidden_size_1", 
                                "3_layer_hidden_size_2", 
                                "3_layer_activation_0", 
                                "3_layer_activation_1", 
                                "3_layer_activation_2", 
                                "3_layer_dropout_0", 
                                "3_layer_dropout_1", 
                                "3_layer_dropout_2",
                                "3_layer_lr", 
                                "3_layer_momentum",],
            "four_layers": ["4_layer_hidden_size_0", 
                            "4_layer_hidden_size_1", 
                            "4_layer_hidden_size_2", 
                            "4_layer_hidden_size_3", 
                            "4_layer_activation_0", 
                            "4_layer_activation_1", 
                            "4_layer_activation_2", 
                            "4_layer_activation_3", 
                            "4_layer_dropout_0", 
                            "4_layer_dropout_1", 
                            "4_layer_dropout_2", 
                            "4_layer_dropout_3",
                            "4_layer_lr", 
                            "4_layer_momentum",],
            "five_layers": ["5_layer_hidden_size_0", 
                            "5_layer_hidden_size_1", 
                            "5_layer_hidden_size_2", 
                            "5_layer_hidden_size_3", 
                            "5_layer_hidden_size_4", 
                            "5_layer_activation_0", 
                            "5_layer_activation_1", 
                            "5_layer_activation_2", 
                            "5_layer_activation_3", 
                            "5_layer_activation_4", 
                            "5_layer_dropout_0", 
                            "5_layer_dropout_1", 
                            "5_layer_dropout_2", 
                            "5_layer_dropout_3", 
                            "5_layer_dropout_4",
                            "5_layer_lr", 
                            "5_layer_momentum",],
        },
    },
    # ... per-branch hidden_size / activation / dropout / lr / momentum
    # parameter definitions, one for each name listed above ...
]

which I think keeps the spirit of the original question, but you are right: dropout could just be fixed; my original naive intention was to have a separate dropout for each hidden layer. I'll see if @esantorella has other comments about this question (especially about the discrete fidelity); if not, I'll close it. Again, thanks a lot!
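P.S. In case it's useful to anyone else: rather than writing every branch out by hand, the list above can also be generated programmatically. A rough, untested sketch (the hidden-size values are made up; everything else reuses the bounds/values from my first snippet):

# Untested sketch: build the root's dependents dict and the per-branch leaf
# parameters programmatically instead of spelling them out by hand.
DEPTHS = {"two_layers": 2, "three_layers": 3, "four_layers": 4, "five_layers": 5}

def hss_parameters():
    root = {
        "name": "num_layers",
        "type": "choice",
        "values": list(DEPTHS),
        "dependents": {
            label: (
                [f"{n}_layer_hidden_size_{i}" for i in range(n)]
                + [f"{n}_layer_activation_{i}" for i in range(n)]
                + [f"{n}_layer_dropout_{i}" for i in range(n)]
                + [f"{n}_layer_lr", f"{n}_layer_momentum"]
            )
            for label, n in DEPTHS.items()
        },
    }
    leaves = []
    for n in DEPTHS.values():
        for i in range(n):
            leaves += [
                {"name": f"{n}_layer_hidden_size_{i}", "type": "choice",
                 "is_ordered": True, "values": [16, 32, 64, 128]},  # made-up sizes
                {"name": f"{n}_layer_activation_{i}", "type": "choice",
                 "values": ["ReLU", "Tanh", "LeakyReLU"]},
                {"name": f"{n}_layer_dropout_{i}", "type": "range",
                 "bounds": [0.0, 1.0]},
            ]
        leaves += [
            {"name": f"{n}_layer_lr", "type": "range",
             "bounds": [1e-6, 0.4], "log_scale": True},
            {"name": f"{n}_layer_momentum", "type": "range", "bounds": [0.0, 1.0]},
        ]
    return [root] + leaves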