facebookresearch / hydra

Hydra is a framework for elegantly configuring complex applications
https://hydra.cc
MIT License

[Feature Request] Appending to a structured dict from a config file #2471

Open iamhatesz opened 1 year ago

iamhatesz commented 1 year ago

🚀 Feature Request

I have a `graylog` entry in my config like this:

```python
from hydra_zen import builds, hydrated_dataclass

@hydrated_dataclass(target=dict)
class GraylogConfigEvaluationExtrasConf:
    run_id: str = "${run.id}"
    evaluation_url: str = "${getattr:${resolved_spec},url}"

GraylogConfigConf = builds(
    GraylogConfig,  # the application's own class, defined elsewhere
    host="${oc.env:GRAYLOG_HOST}",
    port="${oc.env:GRAYLOG_PORT}",
    extras=GraylogConfigEvaluationExtrasConf,
)
```

Now I want to add a key to the `graylog.extras` dict from my config file:

```yaml
graylog:
  extras:
    context: evaluation
```

This fails with:

```
E       Key 'context' not in 'GraylogConfigEvaluationExtrasConf'
E           full_key: graylog.extras.context
E           object_type=GraylogConfigEvaluationExtrasConf
```

However, it works with a CLI override:

```
... +graylog.extras.context=evaluation
```

Motivation

Is your feature request related to a problem? Please describe.

Maybe I am simply missing some existing functionality?

Pitch

Describe the solution you'd like

A mechanism to append to a dict from within a config file.

Describe alternatives you've considered

I am using CLI overrides as a workaround, but that breaks my rule of keeping all the configuration a given job needs in its config file.

Are you willing to open a pull request? (See CONTRIBUTING)

Sure.

Additional context

N/A

Jasha10 commented 1 year ago

Hi @iamhatesz,

Structured configs (such as `GraylogConfigEvaluationExtrasConf`) provide a degree of type safety by enforcing a schema. The schema is determined by the fields of the backing dataclass or attrs class. You're seeing that error message because the `GraylogConfigEvaluationExtrasConf` dataclass has no `context` field.

Two ways to resolve the issue are (1) add a context field to GraylogConfigEvaluationExtrasConf, or (2) replace the dataclass GraylogConfigEvaluationExtrasConf with an untyped dictionary.

Here's approach 1:

```python
from typing import Any

from hydra_zen import builds, hydrated_dataclass

@hydrated_dataclass(target=dict)
class GraylogConfigEvaluationExtrasConf:
    context: Any  # added context field
    run_id: str = "${run.id}"
    evaluation_url: str = "${getattr:${resolved_spec},url}"

GraylogConfigConf = builds(
    GraylogConfig,
    host="${oc.env:GRAYLOG_HOST}",
    port="${oc.env:GRAYLOG_PORT}",
    extras=GraylogConfigEvaluationExtrasConf,
)
```

Here's approach 2:

```python
from hydra_zen import builds

graylog_config_evaluation_extras_conf = {  # using untyped dict
    "run_id": "${run.id}",
    "evaluation_url": "${getattr:${resolved_spec},url}",
}

GraylogConfigConf = builds(
    GraylogConfig,
    host="${oc.env:GRAYLOG_HOST}",
    port="${oc.env:GRAYLOG_PORT}",
    extras=graylog_config_evaluation_extras_conf,
)
```

iamhatesz commented 1 year ago

@Jasha10 thanks for the tip. I was hoping it would be possible to achieve both: enforce the structure (`run_id` and `evaluation_url` must be present) while allowing extra fields (like `context`) at runtime.

As I understand it, I need to use an untyped dict for that. However, isn't it a bug that it is possible to override this with the CLI, but not in a YAML file?

Jasha10 commented 1 year ago

> However, isn't it a bug that it is possible to override this with the CLI, but not in a YAML file?

Writing `graylog: {extras: {context: evaluation}}` in a YAML file and then merging that file is roughly equivalent to writing `graylog.extras.context=evaluation` at the command line (without a plus sign). The plus sign in `+graylog.extras.context=evaluation` means "I know what I'm doing; make the override happen even if it violates the structured config schema". If you want the schema to be enforced, don't use the plus sign.


> As I understand it, I need to use an untyped dict for that.

I have a very ugly workaround that uses the _args_ keyword supported by instantiate to pass a list-of-lists to builtins.dict.

Workaround details:

```python
from dataclasses import field
from typing import Any, List

from hydra.utils import instantiate
from hydra_zen import builds, hydrated_dataclass
from omegaconf import OmegaConf


class GraylogConfig:
    def __init__(self, host: str, port: str, extras: Any) -> None:
        print(f"In GraylogConfig.__init__: got {extras=}")


@hydrated_dataclass(target=dict)
class GraylogConfigEvaluationExtrasConf:
    _args_: List[List[Any]] = field(
        default_factory=list
    )  # pass a list of args to `builtins.dict`
    run_id: str = "abc"
    evaluation_url: str = "my.url"


GraylogConfigConf = builds(
    GraylogConfig,
    host="123.45.67.890",
    port="9999",
    extras=GraylogConfigEvaluationExtrasConf,
)

print("HERE IS GraylogConfigConf:")
print(OmegaConf.to_yaml(GraylogConfigConf))
print()

other_settings = OmegaConf.create(
    """
graylog:
  extras:
    evaluation_url: a_different.url
    _args_:
      - - [context, evaluation]
        - [another_key, another_value]
"""
)

cfg = OmegaConf.merge({"graylog": GraylogConfigConf}, other_settings)
print("HERE IS THE MERGED CONFIG:")
print(OmegaConf.to_yaml(cfg))
instantiate(cfg)
```

```yaml
$ python tmp.py
HERE IS GraylogConfigConf:
_target_: __main__.GraylogConfig
host: 123.45.67.890
port: '9999'
extras:
  _args_: []
  run_id: abc
  evaluation_url: my.url
  _target_: builtins.dict

HERE IS THE MERGED CONFIG:
graylog:
  _target_: __main__.GraylogConfig
  host: 123.45.67.890
  port: '9999'
  extras:
    _args_:
    - - - context
        - evaluation
      - - another_key
        - another_value
    run_id: abc
    evaluation_url: a_different.url
    _target_: builtins.dict

In GraylogConfig.__init__: got extras={'context': 'evaluation', 'another_key': 'another_value', 'run_id': 'abc', 'evaluation_url': 'a_different.url'}
```

In the above example, calling `instantiate` on `cfg.graylog.extras` is equivalent to the following call to `builtins.dict`:

```python
>>> dict([["context", "evaluation"], ["another_key", "another_value"]], run_id="abc", evaluation_url="a_different.url")
{'context': 'evaluation', 'another_key': 'another_value', 'run_id': 'abc', 'evaluation_url': 'a_different.url'}
```
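The triple nesting under `_args_` follows from how the workaround maps onto a `builtins.dict` call: `_args_` is the list of positional arguments, and `dict` accepts at most one positional argument (here, a list of key/value pairs). This plain-Python sketch, which does not use Hydra at all, reproduces that call and shows that removing one level of nesting fails.

```python
# Positional arguments, as they would appear under `_args_` in the config.
# The outer list holds the arguments; its single element is the list of
# [key, value] pairs that builtins.dict takes as its one positional argument.
args = [[["context", "evaluation"], ["another_key", "another_value"]]]

# The remaining fields of the extras config are passed as keyword arguments.
kwargs = {"run_id": "abc", "evaluation_url": "a_different.url"}

extras = dict(*args, **kwargs)
print(extras)

# Dropping one level of nesting passes two positional arguments,
# which builtins.dict rejects with a TypeError.
flattened_error = None
try:
    dict(*args[0], **kwargs)
except TypeError as err:
    flattened_error = err
print(flattened_error)
```

Positional pairs are inserted before the keyword arguments, which is why `context` and `another_key` appear first in the resulting dict.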

There's a deprecated (and currently undocumented) feature that may allow you to get typed behavior for run_id/evaluation_url and untyped behavior for other keys: inheriting from typing.Dict.

Here are some details about this (deprecated) feature:

```python
import typing
from dataclasses import dataclass
from typing import Any

@dataclass
class GraylogConfigEvaluationExtrasConf(typing.Dict[Any, Any]):
    _target_: str = "builtins.dict"
    run_id: str = "${run.id}"
    evaluation_url: str = "${getattr:${resolved_spec},url}"
```

The above means the `_target_`, `run_id`, and `evaluation_url` keys will be required (and will be typed as `str`), and other keys (with key type `Any` and value type `Any`) can be added using dictionary syntax:

```python
instance = GraylogConfigEvaluationExtrasConf()
instance["extra-key"] = "another-value"
```

You can use e.g. `typing.Dict[str, int]` to require that additional keys and values have specific types. This feature was deprecated due to its complexity and maintenance overhead. There was some documentation of it in [the OmegaConf v2.0 structured config docs](https://omegaconf.readthedocs.io/en/2.0_branch/structured_config.html).