facebookresearch / hydra

Hydra is a framework for elegantly configuring complex applications
https://hydra.cc
MIT License
8.76k stars 629 forks source link

[Feature Request] Concatenate Multiple Config List elements #2027

Open monney opened 2 years ago

monney commented 2 years ago

🚀 Feature Request

Currently when loading multiple configs from a config group, a top-level list in the config files is not composed in a useful way.

Motivation

When you have a list of possible options that you want to construct, such as for example choosing multiple callbacks among a group of callbacks. Ideally you might want to construct the list by composing configs of list elements.

for example the following: config.yaml

defaults: 
- elements:  [foo, fizz]

with the elements directory having two files:

foo.yaml

- foo: bar

fizz.yaml

- fizz: buzz

This returns the composed config:

elements:
  - fizz: buzz

Pitch

I think a good thing to do here would instead to compose these configs as if they had literally been written in the same yaml file, which what is sort of what I would expect to happen if we load multiple configs from the same group.

In the proposed change, the returned config would instead be:

elements:
  - foo: bar
  - fizz: buzz

The current alternative is every subset of elements you could want, needs a different config, or you have to name each element, load the elements and then use interpolation to construct the list separately, this is quite clunky, and bloats the final config file.

Additional context

Hydra versions tested: 1.1.1 and main branch from github

I think this might have actually worked as proposed in a previous version of Hydra, because I have a few configs written with this structure which I remember using, but I can't seem to find which version had it working like this.

Jasha10 commented 2 years ago

Thanks for the feature request, @monney. See this comment for a workaround.

In the proposed change, the returned config would instead be:

I don't think we can implement your proposal as written because it would be a breaking change, but I hope that in the future we can introduce a new defaults list keyword (e.g. concatenate) to achieve what you suggest:

defaults: 
- concatenate elements:  [foo, fizz]

Edit: or some other mechanism for composing a list via defaults.

monney commented 2 years ago

@Jasha10 Thank you! That does seem like a reasonable proposal, however this feature appears like it produced an error until it was fixed in 1.1.1 (#1724). I wonder if this would still be considered a breaking change ? Specifically I'm asking when this bug was fixed, is it possible that this composition was not intended to occur like this in the first place and is thereby a bug?

Jasha10 commented 2 years ago

I believe that #1724 is actually a different issue. If my understanding is correct, that issue is referring to the case where the defaults list is used to load a yaml file that contains a top-level yaml sequence. Meanwhile, this issue is highlighting the case where a defaults list contains a key/value pair where the value is itself a list.

monney commented 2 years ago

@Jasha10 It is slightly different but, list behavior was broken by the same item which blocked the top level yaml sequence. I tried my minimal example on the older hydra versions and get the same error. You just couldn't load yaml files if the top most element was part of a list or sequence.

robogast commented 2 years ago

I would like to add my opinion to this issue:

I often have multiple configurations in the same folder, which are (usually) mutually exclusive, but not always, e.g.:

optim
  | - adam.yaml
  | - sgd.yaml

Where a single yaml is simply:

_target_: torch.optim.Adam

lr: 0.003
etc: ...

Where its makes sense for my root level config to select both elements, as a list:

defaults:
  - optim: [adam, sgd]

With the desired (equivalent) config being:

optim:
  - _target_: torch.optim.Adam
    lr: ...
  - _target_: torch.optim.SGD
    lr: ...

But this is hard to do using current default list semantics.

IMO using the [<config1>, <config2>] syntax for selecting multiple configurations is wrong and misleading. What you are actually doing is:

defaults:
  - optim: {adam, sgd}

(Also known as yaml set notation, which introduces some other particularities, but let's ignore these right now.)

I think the main issue is that dictionary and list syntax is being mixed by hydra: the defaults list is not a list, it is a dictionary; i.e. the output of:

defaults:
  - foo: bar

is not [{foo: bar}] but {foo: bar}. I do not immediately see why the defaults list should not be either a dictionary or list except for legacy reasons in the implementation, but foregoing this, I think at least hydra should stick to the appropriate yaml syntax.

Jasha10 commented 2 years ago

Thanks for the feedback, @robogast!

the defaults list is not a list, it is a dictionary ...

One important reason for having defaults be a sequence rather than a mapping is to preserve order. Yaml mappings are not guaranteed to preserve order, while yaml sequences are. Using a sequence allows for a guaranteed difference between defaults: [foo, bar] and defaults: [bar, foo].

You are also right about the legacy reasons.

Thanks for the tip about "yaml set notation". I didn't know about that.

>>> import yaml
>>> yaml.safe_load("{foo, bar}")
{'foo': None, 'bar': None}
>>> yaml.safe_load("{foo, bar: baz}")
{'foo': None, 'bar': 'baz'}
mariomeissner commented 2 years ago

I would also like to do this from the command line somehow!

For example: python train.py optim+=adam

To whatever the default optim.lr entries are, I want to add more (list concatenation), to achieve the same effect as described above by @robogast, but from the command line.

igoradamski2 commented 3 months ago

is anyone looking into this?