facebook / Ax

Adaptive Experimentation Platform
https://ax.dev
MIT License

Nested / referenced parameters in search space #1454

Closed · pezosanta closed this 1 year ago

pezosanta commented 1 year ago

Hi,

We are trying to use Ax for a Neural Architecture Search task where we want the hyperparameter tuning algorithm to suggest how many layers (of each layer type) the DNN should contain and what their kernel sizes should be. Currently, we are using the Hyperopt (TPE) library for this task, but we want to try other hyperparameter tuning tools as well.

In Hyperopt, we can build this "complicated" parameter space with nested parameters in the following way:

from hyperopt import hp

...

param_space = {
    "conv_layers": 
        hp.choice(
            "num_conv_layers",
            [
                {
                    "kernel_sizes": [
                        hp.choice(f"kernel_{num_conv_layers}_{i}", [1, 5, 10, 20, 50]) for i in range(num_conv_layers)
                    ],
                } for num_conv_layers in [5, 10, 15, 20, 25]
            ]
        )
    ...
} 

...

Here, the list [5, 1, 10, 20, 1] could be a sampled value of the conv_layers hyperparameter, which would mean that we have 5 consecutive conv layers in total (len(conv_layers)) with kernel sizes 5, 1, 10, 20, 1, respectively.
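
For reference, one way to see such samples is hyperopt.pyll.stochastic.sample, which draws one random configuration from a space (a sketch, assuming the placeholder ... entries above are removed):

from hyperopt.pyll.stochastic import sample

# Draw one random configuration from param_space defined above.
config = sample(param_space)
kernel_sizes = config["conv_layers"]["kernel_sizes"]
print(len(kernel_sizes), kernel_sizes)  # e.g. 5 (5, 1, 10, 20, 1)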

Any idea how to implement such a parameter space structure/logic in Ax? Based on the docs, I am not sure whether nesting parameters (just like above) or referencing other parameters in the same parameter space is possible at all. (E.g., param1 decides the number of layers, and after it is sampled, param1 kernel sizes are sampled as well.)

Another idea of mine (see below) is to define a flat parameter space that always contains kernel-size parameters for the maximum possible number of layers:

max_conv_layers_to_use = 25

param_space = [
    {
        "name": "num_conv_layers_to_use",
        "type": "choice",
        "values": [5, 10, 15, 20, max_conv_layers_to_use],
    },
    *[
        {
            "name": f"kernel_size_of_layer_{i + 1}",
            "type": "choice",
            "values": [1, 5, 10, 20, 50],
        }
        for i in range(max_conv_layers_to_use)
    ],
]


However, I don't know how confusing it would be for the Ax search that, this way, some sampled kernel sizes would most often not contribute to the reported metric. (The evaluation function would simply ignore them; see the sketch below.)
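
A minimal sketch of such an evaluation function (train_and_score is a hypothetical stand-in for our training routine):

def evaluate(parameterization: dict) -> float:
    num_layers = parameterization["num_conv_layers_to_use"]
    # Only the first num_layers kernel sizes are used; the rest are ignored.
    kernel_sizes = [
        parameterization[f"kernel_size_of_layer_{i + 1}"] for i in range(num_layers)
    ]
    return train_and_score(kernel_sizes)  # hypothetical training routine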

Note that we are using Ray Tune to run hyperparameter tuning at scale on our GPU cluster, so the solution (if any) should be compatible with it; a sketch of the wiring I have in mind follows.
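
A sketch of plugging an AxClient into Ray Tune (assuming Ray 2.x, where the searcher lives at ray.tune.search.ax.AxSearch, and ax_client was built with the search space above):

from ray import tune
from ray.tune.search.ax import AxSearch

def trainable(config):
    # Train with the sampled config and report the metric back to Tune.
    tune.report(mean_loss=evaluate(config))  # evaluate as sketched above

tune.run(
    trainable,
    search_alg=AxSearch(ax_client=ax_client),
    num_samples=50,
)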
mpolson64 commented 1 year ago

Hi Peter -- yes, we do support search spaces like this; you can see an example, as well as substantial discussion of the topic, in this issue: https://github.com/facebook/Ax/issues/140#issuecomment-1009015750. Optimizing hierarchical search spaces like the one you describe is an active area of research on our team, and we hope to provide more substantial support (along with a blog post and tutorial) in the future. Let me know if you have any questions.

pezosanta commented 1 year ago

Hi @mpolson64, I tried to address the hierarchical parameter space problem based on the issue/comments you referenced. What I have done so far is build the parameter space described in my post above in the following way:

...

self.max_conv_layers_to_use = 25
self.num_conv_layers_to_use = [5, 10, 15, 20, self.max_conv_layers_to_use]

self.param_space_1 = [
    {
        "name": "num_conv_layers_to_use",
        "type": "choice",
        "values": self.num_conv_layers_to_use,
        "dependents": {
            num_layers: [
                f"kernel_size_of_layer_{i+1}" for i in range(num_layers)
            ] for num_layers in self.num_conv_layers_to_use
        },
    },
]
self.param_space_2 = [
    {
        "name": f"kernel_size_of_layer_{i+1}",
        "type": "choice",
        "values": [1, 5, 10, 20, 50]
    } for i in range(self.max_conv_layers_to_use)
]
self.param_space = self.param_space_1 + self.param_space_2

...

This way, the following dictionary is generated for self.param_space_1 (note the dependents section):

[{'name': 'num_conv_layers_to_use',
  'type': 'choice',
  'values': [5, 10, 15, 20, 25],
  'dependents': {
     5: ['kernel_size_of_layer_1',
      'kernel_size_of_layer_2',
      'kernel_size_of_layer_3',
      'kernel_size_of_layer_4',
      'kernel_size_of_layer_5'],
     10: ['kernel_size_of_layer_1',
      'kernel_size_of_layer_2',
      'kernel_size_of_layer_3',
      'kernel_size_of_layer_4',
      'kernel_size_of_layer_5',
      'kernel_size_of_layer_6',
      'kernel_size_of_layer_7',
      'kernel_size_of_layer_8',
      'kernel_size_of_layer_9',
      'kernel_size_of_layer_10'],

    ...
    }
}]

And for self.param_space_2, the following list of dictionaries is generated:

[{'name': 'kernel_size_of_layer_1',
  'type': 'choice',
  'values': [1, 5, 10, 20, 50]},
 {'name': 'kernel_size_of_layer_2',
  'type': 'choice',
  'values': [1, 5, 10, 20, 50]},
 {'name': 'kernel_size_of_layer_3',
  'type': 'choice',
  'values': [1, 5, 10, 20, 50]},
 {'name': 'kernel_size_of_layer_4',
  'type': 'choice',
  'values': [1, 5, 10, 20, 50]},
 {'name': 'kernel_size_of_layer_5',
  'type': 'choice',
  'values': [1, 5, 10, 20, 50]},

  ...
]

When I try to use this param space, Ax throws the following error, indicating that the sets of parameters listed in dependents must be disjoint:

 File "/home/solecall/Desktop/balint/pytorch/virtualenv/lib/python3.8/site-packages/ax/core/search_space.py", line 1005, in _disjoint_union
    raise UserInputError(
ax.exceptions.core.UserInputError: Two subtrees in the search space contain the same parameters: {'kernel_size_of_layer_1'}.

If I take the example from the issue/comments you referenced and put the "learning_rate" parameter into the "XGBoost" dependents as well, the same error occurs:

self.param_space = [
    {
        "name": "model",
        "type": "choice",
        "values": ["Linear", "XGBoost"],
        "dependents": {
            "Linear": ["learning_rate", "l2_reg_weight"],
            "XGBoost": ["learning_rate", "num_boost_rounds"],  # this list contains "learning_rate" as well
        },
    },
    {
        "name": "learning_rate",
        "type": "range",
        "bounds": [0.001, 0.1],
        "log_scale": True,
    },
    {
        "name": "l2_reg_weight",
        "type": "range",
        "bounds": [0.00001, 0.001],
    },
    {
        "name": "num_boost_rounds",
        "type": "range",
        "bounds": [0, 15],
    },
]
  File "/home/solecall/Desktop/balint/pytorch/virtualenv/lib/python3.8/site-packages/ax/core/search_space.py", line 1005, in _disjoint_union
    raise UserInputError(
ax.exceptions.core.UserInputError: Two subtrees in the search space contain the same parameters: {'learning_rate'}.

Is this behaviour intended or is this just a bug? In either case, what solution would you recommend?


mpolson64 commented 1 year ago

Sorry it has taken me a while to respond, and that this process has been a bit counterintuitive. The error you're seeing is intended behavior, however: no parameter in a hierarchical search space can be a dependent of more than one parameter (i.e., your parameters should form a tree, not a general graph). This becomes an issue in your setup because kernel_size_of_layer_1 could be used in networks with any number of layers, yet the optimal kernel size for the first layer may be different in a network with 5 conv layers versus one with 10, 15, etc. For Ax, this means they should be modeled as different parameters. If instead we name the parameters kernel_size_of_layer_{i}_of_{j}, we get a perfect tree structure and Ax will model and generate the appropriate candidates. The following snippet will create the correct search space for the values you described above:

num_conv_layers_to_use = [5, 10, 15, 20, 25]

params = [
    {
        "name": "num_conv_layers_to_use",
        "type": "choice",
        "is_ordered": True, # NOTE: I've also added this flag -- ordered choice params are much easier to handle on the modeling side
        "values": num_conv_layers_to_use,
        "dependents": {
            num_layers: [f"kernel_size_of_layer_{i + 1}_of_{num_layers}" for i in range(num_layers)] for num_layers in num_conv_layers_to_use
        }
    },
    *[
        {
            "name": f"kernel_size_of_layer_{i + 1}_of_{j}",
            "type": "choice",
            "is_ordered": True, # NOTE: I've added this flag again
            "values": [1, 5, 10, 20, 50]
        } for j in num_conv_layers_to_use for i in range(j) 
    ]
]

Which yields:

[{'name': 'num_conv_layers_to_use',
  'type': 'choice',
  'is_ordered': True,
  'values': [5, 10, 15, 20, 25],
  'dependents': {5: ['kernel_size_of_layer_1_of_5',
    'kernel_size_of_layer_2_of_5',
    'kernel_size_of_layer_3_of_5',
    'kernel_size_of_layer_4_of_5',
    'kernel_size_of_layer_5_of_5'],
   10: ['kernel_size_of_layer_1_of_10',
    'kernel_size_of_layer_2_of_10',
    'kernel_size_of_layer_3_of_10',
    'kernel_size_of_layer_4_of_10',
    'kernel_size_of_layer_5_of_10',
    ...}}]

I will note that doing this blows up the dimensionality (1 + 5 + 10 + 15 + 20 + 25 = 76 parameters in this example!), so I recommend using a high-dimensional method like SAASBO, which will be slower to generate candidates but has massively better candidate quality in high-dimensional search spaces. You can initialize an AxClient with the SAASBO strategy like this:

# Note: create_experiment is an instance method, so instantiate the client first.
ax_client = AxClient()
ax_client.create_experiment(
    parameters=params,
    choose_generation_strategy_kwargs={"use_saasbo": True},
)
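
Consuming these parameterizations in your evaluation loop might then look something like this (a sketch; train_and_score stands in for your training routine, and each trial carries only the parameters of the active subtree):

parameters, trial_index = ax_client.get_next_trial()

num_layers = parameters["num_conv_layers_to_use"]
# Reassemble the kernel-size list from the active subtree's parameters.
kernel_sizes = [
    parameters[f"kernel_size_of_layer_{i + 1}_of_{num_layers}"]
    for i in range(num_layers)
]

ax_client.complete_trial(trial_index=trial_index, raw_data=train_and_score(kernel_sizes))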

Let me know if this was helpful and if you have any follow-up questions.

lena-kashtelyan commented 1 year ago

I believe this is resolved, so closing it. If you still need help with this, please reopen the issue, @pezosanta!