🚀 Feature Request

It would be ideal to reuse the constructor parameters to automatically derive a skeleton of the configuration of a given module.

Motivation

I'm following this template to set up a research project. It's based on Pytorch-Lightning and Hydra. It combines several modules each of which has a dedicated configuration file. In particular, the PyTorch-Lightning Trainer has a dedicated configuration file. The issue is that when running the main application, it seems to me that the only way for me to specify additional parameters for the trainer is to use the + notation. For instance, to specify the parameter limit_train_batches of the Trainer constructor, I have to do:

python run.py +trainer.limit_train_batches=1

This is because the default configuration file for the Trainer has only a fixed set of parameters. However, when using the _target_ notation, I believe it would be beneficial to resolve them automatically so people don't have to copy over all the attributes of a given class constructor.

Pitch

Describe the solution you'd like When using _target_ instantiation, the user can write a skeleton configuration that specifies some very common (non-default) parameters. All the others are directly derived from the constructor automatically. Again, imagine the Trainer case, I would like to automatically allow users to change any parameter of the Trainer class without having to remember when to use '+' or not.

Describe alternatives you've considered One hacky solution seems to be to just use '+' notation whenever you get an error. This is error-prone and prevents the user from spotting unused parameters because, from my understanding, the parameters are just appended without being actually validated. In this sense, the constructor acts as a template for the default class.

The alternative here seems to be to just copy-paste all the parameters of the Trainer constructor in a default configuration file and then let the user change their values using the usual Hydra CLI notation.

Hi @aleSuglia,

Thanks for sharing this idea. Syncing Hydra configs with target class args is a known pain-point for Hydra users.

First let me mention a few currently-working approaches to smoothing the rough edges

On the yaml side, it is possible to add the missing class arguments (with their default values). This is labor intensive.
On the python side, it is possible to create structured config objects representing the full interface of the object being instantiated (again using default values for the kwarg fields). This is also labor intensive.
R.e. creating structured configs, there are a few libraries that attempt to ease the creation of structured configs from arbitrary class interfaces. Notable examples include:
- configen: provides a script to generate python code that defines structured config dataclasses
- hydra-torch: provides plug-and-play structured configs for many of pytorch's APIs
- hydra-zen: creates structured configs at runtime

Here is a bit of context surrounding the `_target_` keyword

Currently Hydra's processes of config composition (i.e. creating the cfg object that is passed to your app) and config instantiation (e.g. calling hydra.utils.instantiate on a config object) are independent, and they happen one-after-another.

During the composition phase, the string _target_ has no special meaning
During the instantiation phase, _target_ is treated as a keyword whose value is used to look up a python callable.

You're proposal is to give _target_ some special meaning during the composition phase too, so that configs having a _target_ key will behave specially with respect to the arguments of the targeted callable.

My gut feeling

There are a few drawbacks to the proposed change. The most significant one is that the proposed behavior would require hydra to import the target during the config composition process. For example, if your yaml file contains the target pytorch_lightning.Trainer, then Hydra would need to import the pytorch_lightning module in order to see what arguments appear in the Trainer class. Currently, this importing happens when you call instantiate, but not during config composition. Downsides of importing the targeted module/callable sooner include:

importing all targeted modules will probably result in slower config composition
importing the module may have side effects. Currently calling instantiate lets users control when the side-effects happen (and which targets get imported).
if the imported module has e.g. a syntax error or some other exception, the Hydra config composition process would crash.

My gut feeling is that these drawbacks are serious, and that the proposed enhancement does not solve issues related to syncing yaml configs with python code in the most general way possible (though I'm not sure how to solve it generally).

facebookresearch / hydra

[Feature Request] Derive structured configuration from class constructor #1901

🚀 Feature Request

Motivation

Pitch

First let me mention a few currently-working approaches to smoothing the rough edges

Here is a bit of context surrounding the `_target_` keyword

My gut feeling

facebookresearch / hydra

[Feature Request] Derive structured configuration from class constructor #1901

🚀 Feature Request

Motivation

Pitch

First let me mention a few currently-working approaches to smoothing the rough edges

Here is a bit of context surrounding the _target_ keyword

My gut feeling

Here is a bit of context surrounding the `_target_` keyword