facebookresearch / hydra

Hydra is a framework for elegantly configuring complex applications
https://hydra.cc
MIT License
8.63k stars 621 forks source link

[Bug] Defaults list interpolation in an inner config failing to find keys from outer config #1899

Open calvinhirsch opened 2 years ago

calvinhirsch commented 2 years ago

🐛 Bug

Description

When doing defaults list interpolation in an inner config (nested folder) using an interpolation key from an the outermost config (base level config) it cannot find the interpolation key. However, this is not what's causing the error because I do this multiple times in my code and only in one place does this happen.

Checklist

To reproduce

Minimal Code/Config snippet to reproduce This code is not quite a minimal reproduction but I'm not sure how to reproduce it, so I've just stripped everything else out of my code. I've attached a version with only the parts required to cause the error (minimal.zip) and a version with slightly more to put it in better context (more.zip).

Key parts

Main config:

@dataclass
class Config:
    dataset: DatasetConf = MISSING
    model: ModelConf = MISSING

cs = ConfigStore.instance()
cs.store(group="dataset", name="folder", node=FolderDatasetConf)
cs.store(group="dataset", name="pytorch", node=PyTorchDatasetConf)
cs.store(group="model", name="base_model", node=ModelConf)
cs.store(group="model/arch", name="base_dcrn", node=DCRNArchConf)
cs.store(group="model/arch", name="base_vanilla", node=VanillaArchConf)
cs.store(name="base_conf", node=Config)

DCRNArchConf has a field _dcrn_dbo_dataset_imgsize: Any = MISSING which means DCRN architecture defaults based on dataset and image size. This is the file model/arch/dcrn.yaml:

defaults:
  - base_dcrn
  - _dcrn_dbo_dataset_imgsize: ${dataset}_${dataset/img_size}  # This is the line that causes the error

I use defaults list interpolation to specialize the defaults using dataset and dataset/img_size, which are in the parent config. Then, there is a directory model/arch/_dcrn_dbo_dataset_imgsize/ that contains all valid dataset+img_size combinations for that architecture such as celeba_64.yaml.

When I run python3 config.py +dataset=celeba +model/arch=dcrn, I get the error message. Commenting out the line I highlighted allows the program to run but (obviously) then does not use the defaults in _dcrn_dbo_dataset_imgsize.

Stack trace/error message

Error resolving interpolation '${dataset}_${dataset/img_size}', possible interpolation keys: model/arch

Expected Behavior

The interpolation would have access to dataset and dataset/img_size because interpolation paths are absolute and would fill those in. The value _dcrn_dbo_dataset_imgsize would be assigned to the specialized yaml file associated with the combination.

System information

Additional context

I use this method in many other places throughout my code and this is the only one that seems to fail. In more.zip, see conf/_dbo_dataset, mode/_dbo_dataset, and model/cond/arch/_acgan_dbo_modelarch. These seem like they should work the same but they all seem to work fine.

Files

minimal.zip more.zip

calvinhirsch commented 2 years ago

I have found a workaround that also seems to have weird behavior. I can put the following in the defaults list in my main config.yaml in order to get the config file from there:

  # temp workaround
  - model/arch: ${model/arch}
  - model/arch/_${model/arch}_dbo_dataset_imgsize/${dataset}_${dataset/img_size}

I would think that only the second line is required, but it doesn't seem to register that model/arch has schema base_dcrn (i.e. has not taken model/arch/dcrn.yaml into account) when I run with +model/arch=dcrn because I get error Key 'g_channels' not in 'ModelArchConf' (g_channels is in base_dcrn, which is DCRNArchConf which is a child of ModelArchConf). Adding the first line seems to have fixed that, but this doesn't seem like intended behavior.

Jasha10 commented 2 years ago

Hi @calvinhirsch, Thank you for opening this issue!

First let me note that Hydra implements a special type of interpolation for its defaults lists, different from the interpolation that is used in ordinary OmegaConf objects.

Regarding your workaround:

I believe that the weirdness in your second post is related to the sensitivity of Hydra's defaults lists to the order of the elements of the list.

Hydra's defaults tree is composed in a depth-first manner. ~My hunch is that, if you use the CLI override +model/arch=dcrn, then that override is appended at the end of the top-level defaults list, and so (in terms of depth-first defaults tree composition) the model/arch=drcn branch of the tree is created last.~ This means that interpolations appearing earlier in the defaults list should be unable to make reference to the model/arch branch of the defaults tree, lest they should run into the issue you described with Key 'g_channels' not in 'ModelArchConf'. I suspect that your proposed workaround is circumventing this order issue by forcing the model/arch branch of the tree to be created earlier. To investigate whether this is the case, I tried reversing the order of the defaults from your workaround:

  # temp workaround
  - model/arch/_${model/arch}_dbo_dataset_imgsize/${dataset}_${dataset/img_size}
  - model/arch: ${model/arch}

Indeed, if the order is reversed as above, the same error occurs as before: Key 'g_channels' not in 'ModelArchConf', supporting my theory that (if the model/arch: ${model/arch} line of the workaround is omitted) the overridden branch model/arch of the defaults tree has not yet been created at the time when the interpolation is to be resolved.

So, in summary: If my understanding is correct, there is undefined behavior when you interpolate to an element that only appears later in the defaults list. Perhaps this fact could be better documented, or (better yet) an error could be raised if one tries to interpolate to something that has not yet appeared in the defaults list.

Edit: Strikethrough one of the sentences above r.e. the order of the +model/arch=dcrn branch in the defaults tree composition; actual behavior may be more subtle that what was written above.

Regarding your original post:

If I recall correctly, this limitation (that defaults lists in configs belonging to a child group cannot interpolate to choices that are outside of the child group) has the benefit of keeping Hydra's internals simple. In theory, it should be possible to implement the extended behavior. The workaround is to put the interpolation higher up in the defaults tree, as you have done in your second post above :)

calvinhirsch commented 2 years ago

@Jasha10 Thank you for the detailed reply. I had a feeling this was undefined behavior because it seems to work most of the places I've done this. The only remaining issue is that if, in my example, the DCRN model relied on different fields than the vanilla model, the interpolation would have to be different and you couldn't have one line for all model/arch cases like I've done in my workaround. In fact, I have this scenario in other places in my project (and a scenario where two schemas for the same group have different subgroups, preventing a way of setting defaults for all variants purely from the base config.yaml). I suspect there is a way to do this that forces the ordering to be correct, so I will update if I figure it out.