erodola / DLAI-s2-2021

Teaching material for the course of Deep Learning and Applied AI, 2nd semester 2021, Sapienza University of Rome

Hydra/OmegaConf: how to make and reference nested configuration files? #22

Open umbertov opened 3 years ago

umbertov commented 3 years ago

Hi, I'm just getting started with Hydra and OmegaConf while populating nn-template with my code. This may seem a silly question, but I can't figure out how to make and use deeply nested configs, and specifically how to reference them using variable interpolation.

For example, I'd like to place model- or dataset-specific configuration in a separate file, such as conf/data/dataset_conf.yaml. I'm surely doing something wrong, because when I try to interpolate ${data.dataset_conf.some_key} I get an exception saying that dataset_conf is not a key in the data configuration node.

I also tried to make a file in conf/data/dataset_conf/default.yaml but got the same kind of errors.

So, to the best of my knowledge, there are only two sure ways to add my configuration to the project:

  1. Paste my configuration into one of the existing YAML files, conf/{data,model}/default.yaml, and accept slightly longer and less organized configuration files.
  2. Create a new directory and YAML file in conf, for example conf/dataset_specific/default.yaml, and add my configuration there. The disadvantage here is that the hierarchy of config files is lost: the dataset-specific options no longer sit under the data group of configurations.

I ended up with option 1, placing the relevant configuration in the same conf/data/default.yaml file under a new node dataset_conf. To be clear, the file now looks like this:

dataset_conf:
  param1: key1
  # etc.

datamodule:
  _target_: src.pl_data.datamodule.MyDataModule
  # etc.

But this gets messy very quickly, considering that I also plan on tracking my custom data augmentation pipeline in the config files (a dozen nodes to be instantiate()'d). So how does one move everything from dataset_conf to another file in the same conf/data/ folder (or a subfolder thereof), and how does one properly reference it from other files?

After some googling, I looked into Config Groups and Config Group Options, which have promising names, but they don't seem to fit exactly what I'm trying to achieve. Maybe I got them wrong?

Sorry for the long post and thanks for any answer

lucmos commented 3 years ago

Hi!

To make the explanation easier, I created a small repo with a minimal working example that gathers the few lines of code involved. Its main.py only instantiates the configuration, resolves the interpolations, and pretty-prints the result.

Let me check if I got what you want to accomplish:

The goal is to neatly organize alternative, independent configurations for some parts of your Hydra configuration, e.g. for different datasets or different models. Other parts of your configuration may depend on those.

I think there is a bit of personal preference involved, but I would go as follows.

Say that I have two different independent models: mlp and resnet. I would create two different .yaml files under conf/model/:

.
├── default.yaml
└── model
    ├── mlp.yaml
    └── resnet.yaml

For this example those files would contain only a name:

https://github.com/lucmos/hydra_example/blob/243b2bf695ac1adae5a2912028e2a39a069fe858/conf/model/mlp.yaml#L1

https://github.com/lucmos/hydra_example/blob/243b2bf695ac1adae5a2912028e2a39a069fe858/conf/model/resnet.yaml#L1

And a default selection in the main config:

https://github.com/lucmos/hydra_example/blob/243b2bf695ac1adae5a2912028e2a39a069fe858/conf/default.yaml#L1-L4

In other parts of the configuration you can dynamically resolve any variable (e.g. the name in this case). Note that the path of the variable contains the folder name, not the file name:

https://github.com/lucmos/hydra_example/blob/243b2bf695ac1adae5a2912028e2a39a069fe858/conf/other/default.yaml#L1

Once you have this structure set up, you can change the defaults from the main conf/default.yaml (e.g. from mlp to resnet) or directly from the command line:

❯ python main.py model=mlp
data:
  dataset:
    name: mnist
model:
  name: my_awesome_mlp
other:
  an_interpolated_var: my_awesome_mlp-mnist

❯ python main.py model=resnet
data:
  dataset:
    name: mnist
model:
  name: my_awesome_resnet
other:
  an_interpolated_var: my_awesome_resnet-mnist

Now, if in the future you want to configure yet another model, you can just add a new file under model, e.g. model/my_new_model.yaml, without modifying the other models' configs in any way.

You can also directly modify the various default.yaml files (it may be faster while developing) or create other folders for new hierarchies (at the same level, or deeper).

I hope this answers your question. Let me know if you have any other doubts!

lucmos commented 3 years ago

For example, I'd like to place model- or dataset-specific configuration in a separate file, such as conf/data/dataset_conf.yaml. I'm surely doing something wrong, because when I try to interpolate ${data.dataset_conf.some_key}

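I think you should try ${data.some_key}, selecting the file either with the default data: dataset_conf or with the parameter data=dataset_conf when you launch the script. As noted above, the interpolation path contains the folder name (data), not the file name (dataset_conf).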
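To show why the shorter path is the one that resolves, here is a toy model in plain Python. It is not Hydra's or OmegaConf's actual implementation, only a sketch of the placement rule described above. The helpers lookup and resolve and the value some_value are hypothetical names made up for the illustration.

```python
import re

# Hypothetical content of conf/data/dataset_conf.yaml, written as a dict.
dataset_conf_file = {"some_key": "some_value"}

# Toy composition: selecting `data: dataset_conf` places the file's content
# directly under the group key `data`. The file name does not become a key.
composed = {"data": dataset_conf_file}


def lookup(cfg, dotted_path):
    """Walk a dotted path through nested dicts; raises KeyError if a key is missing."""
    node = cfg
    for part in dotted_path.split("."):
        node = node[part]
    return node


def resolve(cfg, text):
    """Replace every ${a.b.c} in `text` with the value found at that path."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: str(lookup(cfg, m.group(1))), text)


# The group name alone is enough to reach the key.
works = resolve(composed, "${data.some_key}")

# Including the file name fails, because `dataset_conf` is not a key under `data`.
try:
    resolve(composed, "${data.dataset_conf.some_key}")
    failed = False
except KeyError:
    failed = True

print(works, failed)
```

The failing case corresponds to the exception reported in the question: after composition there is simply no dataset_conf node under data.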