facebookresearch / hydra

Hydra is a framework for elegantly configuring complex applications
https://hydra.cc
MIT License
8.74k stars 628 forks source link

[Bug] Hydra resolver does not work in compose_api #2017

Open dolfinus opened 2 years ago

dolfinus commented 2 years ago

🐛 Bug

Description

hydra:key substitution does not work in interactive Python session and Jupyter notebook. hydra.key does, but it is not recommended (according to https://github.com/facebookresearch/hydra/issues/1786#issuecomment-914734857).

Checklist

To reproduce

Minimal Code/Config snippet to reproduce

conf/config.yaml

defaults:
  - env: dev

conf/env/prod.yaml

spark:
    appName: ${hydra:job.name}

conf/env/dev.yaml

spark:
    appName: ${hydra.job.name}

run.py

from hydra import initialize, compose

with initialize(config_path="conf"):
    conf = compose(config_name="config.yaml", return_hydra_config=True)
    print(conf.env.spark.appName)
    # prints current file name, e.g. "run"

with initialize(config_path="conf"):
    conf = compose(config_name="config.yaml", return_hydra_config=True, overrides=["env=prod"])
    print(conf.env.spark.appName)
    # raises exception

Stack trace/error message

$ python run.py
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 357, in __getattr__
    self._format_and_raise(key=key, value=None, cause=e)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/base.py", line 196, in _format_and_raise
    type_override=type_override,
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/_utils.py", line 821, in format_and_raise
    _raise(ex, cause)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/_utils.py", line 719, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set end OC_CAUSE=1 for full backtrace
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__
    return self._get_impl(key=key, default_value=_DEFAULT_MARKER_)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 446, in _get_impl
    key=key, value=node, default_value=default_value
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 69, in _resolve_with_default
    throw_on_resolution_failure=True,
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/base.py", line 622, in _maybe_resolve_interpolation
    memo=memo if memo is not None else set(),
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/base.py", line 483, in _resolve_interpolation_from_parse_tree
    parse_tree=parse_tree, node=value, key=key, parent=parent, memo=memo
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/base.py", line 669, in resolve_parse_tree
    ).with_traceback(sys.exc_info()[2])
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/base.py", line 662, in resolve_parse_tree
    return visitor.visit(parse_tree)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/antlr4/tree/Tree.py", line 34, in visit
    return tree.accept(self)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/grammar/gen/OmegaConfGrammarParser.py", line 205, in accept
    return visitor.visitConfigValue(self)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/grammar_visitor.py", line 101, in visitConfigValue
    return self.visit(ctx.getChild(0))
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/antlr4/tree/Tree.py", line 34, in visit
    return tree.accept(self)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/grammar/gen/OmegaConfGrammarParser.py", line 339, in accept
    return visitor.visitText(self)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/grammar_visitor.py", line 298, in visitText
    return self.visitInterpolation(c)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/grammar_visitor.py", line 125, in visitInterpolation
    return self.visit(ctx.getChild(0))
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/antlr4/tree/Tree.py", line 34, in visit
    return tree.accept(self)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/grammar/gen/OmegaConfGrammarParser.py", line 1030, in accept
    return visitor.visitInterpolationResolver(self)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/grammar_visitor.py", line 182, in visitInterpolationResolver
    args_str=tuple(args_str),
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/base.py", line 653, in resolver_interpolation_callback
    inter_args_str=args_str,
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/base.py", line 596, in _evaluate_custom_resolver
    inter_args_str,
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/omegaconf/omegaconf.py", line 435, in resolver_wrapper
    ret = resolver(*args, **kwargs)
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/hydra/core/utils.py", line 215, in <lambda>
    lambda path: OmegaConf.select(cast(DictConfig, HydraConfig.get()), path),
  File "/home/maxim/Repo/ds-demo-project/venv/lib/python3.7/site-packages/hydra/core/hydra_config.py", line 31, in get
    raise ValueError("HydraConfig was not set")
omegaconf.errors.InterpolationResolutionError: ValueError raised while resolving interpolation: HydraConfig was not set
    full_key: env.spark.appName
    object_type=dict

Expected Behavior

Substitutionshydra: are not causing an exception inside a context created by compose function with return_hydra_config=True. HydraConfig instance should be initialized.

System information

Additional context

Same behavior has been observed several times:

716

https://github.com/facebookresearch/hydra/issues/1786#issuecomment-913110496

Jasha10 commented 2 years ago

Thanks for the report, @dolfinus. I can reproduce the behavior from the example you provided.

The issue here is that the hydra: resolver (${hydra:...}) depends on global state, namely the state of HydraConfig. When using @hydra.main, using the hydra: resolver results in a query to this global state.

Meanwhile, the compose API is a functional API; after the call to compose has returned, the global HydraConfig state is not retained. This is why we are getting the "HydraConfig was not set" ValueError in the traceback above.

I'm not sure what the best way to get around this is... Is your usecase limited to looking up hydra.job.name? Or is there some other hydra setting that you wanted to query?

dolfinus commented 2 years ago

Is your usecase limited to looking up hydra.job.name?

No, it's actually any substitution supported by Hydra, e.g. hydra:runtime.cwd, hydra:job.name, etc.

I'm using Hydra to pass configuration to ETL pipelines for different environments. Firsly, developer is writing ETL pipeline in Jupyter notebook or IDE using config with default env=dev. Secondly, CI is testing this pipeline with overriding env=test. Thirdly, CD is pushing ETL pipeline to staging server allowing user to test it on some real data, env=staging. And only after that pipeline is deployed to production environment with overriding env=prod.

The main goal here is to use just the same source code (maybe with copy-pasting this code from Jupyter/IDE to script file) with minimal changes. So the plan is to have function in Jupyter:

from hydra import initialize, compose

def my_func(conf, source, target):
   ...

with initialize(config_path="conf"):
    conf = compose(config_name="config.yaml", return_hydra_config=True)

    source = Source(...)
    target = Target(...)
    runner = Runner(...)

    runner.execute(my_func, conf, source, target)

and then commit it to main.py:


import hydra

def my_func(conf, source, target):
   ...

@hydra.main(config_path="conf")
def main(conf):    
    source = Source(...)
    target = Target(...)
    runner = Runner(...)

    runner.execute(my_func, conf, source, target)

Using global config is not the best idea, because there could be different configurations used by tests which may conflict.

Jasha10 commented 2 years ago

I see. Thanks for the information. This is an edge case where the compose API falls short of the @hydra.main API. Brainstorming APIs that could bridge this gap, I think the easiest thing to implement would be allowing @hydra.main to accept a return_hydra_config=True parameter (like the compose API does).

If @hydra.main accepted a return_hydra_config=True parameter, would this solve your problem @dolfinus?

dolfinus commented 2 years ago

implement would be allowing @hydra.main to accept a return_hydra_config=True parameter

Do you mean hydra.initialize instead of hydra.main?

Jasha10 commented 2 years ago

No, I was talking about hydra.main. Please disregard my previous comment.

The root of the problem is that the hydra: resolver relies on global state (i.e. the state of the HydraConfig singleton). When using the compose API, the hydra: resolver fails because the global state (HydraConfig) is not set up.

It would be nice to have an API that has the best of both worlds from @hydra.main and compose:

Again brainstorming about possible APIs, it would be nice if compose could function as a context manager:

# a possible API:
with initialize(...):
    with compose(..., context=True) as conf:
        # ${hydra:} resolver works and HydraConf is available in this context
    # ${hydra:} resolver and HydraConf no longer available

Thanks for the link to #716. The context is useful. For now I will put this issue on the wishlist.

Jasha10 commented 2 years ago

To reproduce

... conf/env/prod.yaml

spark:
    appName: ${hydra:job.name}

conf/env/dev.yaml

spark:
    appName: ${hydra.job.name}

As you probably know, using ${hydra.job.name} works when using the compose API, but it is not recommended when using @hydra.main. Omry has previously endorsed this approach here. This is one of the inconsistencies between @hydra.main and compose.

jzazo commented 2 years ago

Regarding the suggested API two messages above:

# a possible API:
with initialize(...):
    with compose(..., context=True) as conf:
        # ${hydra:} resolver works and HydraConf is available in this context
    # ${hydra:} resolver and HydraConf no longer available

If the resolver does not work outside the compose context, how can we access that attribute anywhere else? Can it be resolved inplace and substituted?

Another idea I think would make the hydra.main and the compose functionality more similar is to add an option in the initialize or compose methods to keep HydraConfig initialized. That way we can always have the resolve functionality, and there is no need to set return_hydra_config=True, as it would be accessible anyways (including ${hydra:choices.GROUP} resolutions). Would this be possible?

Jasha10 commented 2 years ago

If the resolver does not work outside the compose context, how can we access that attribute anywhere else? Can it be resolved inplace and substituted?

Yes, you could call OmegaConf.resolve while inside the compose context, which would enable you to safely use it after exiting from that context. Alternatively, you could ensure that the application logic using conf is contained within the context of the the call to compose. This is similar to how, with @hydra.main, the application logic that makes use of the config is contained within the user's decorated function.

Another idea ... to keep HydraConfig initialized.

This feels like an antipattern to me because this would increase the amount of persistent global state that results from running a Hydra application. Using @hydra.main also does not result in HydraConfig being persistent; HydraConfig is only available within the context of the function that is decorated by @hydra.main.

jzazo commented 2 years ago

Regarding the persistence, if initialized out of a with context, it would be persistent, I imagine. But your proposal seems the better approach, thanks for clarifying. Your proposed API looks good to me and my application could work in a contained context.

A bit unrelated, isn't the configstore's singleton an anti-pattern that persists globally, though?

Jasha10 commented 2 years ago

Yes indeed, the ConfigStore is persistent global state.

For Hydra's test suite, we have a hydra_restore_singletons fixture that is used to save and then later restore the global state so that the tests can be run independently.

odelalleau commented 2 years ago

Just a note that I also ran into this problem. FYI my context:

I would expect HydraConfig to be available within my with initialize(...) context (possibly with not all fields that are found when using @hydra.main(...), but at least if we have the config we can manipulate it for the purpose of the test)

Edit: my current workaround is the following (sketch):

with initialize(....):
    cfg = compose(
        ...,
        return_hydra_config=True,
    )
    HydraConfig.instance().set_config(cfg)
odelalleau commented 2 years ago

Quick follow-up on my previous comment, now that I better understand what's going on:

I think it'd be possible to provide a "compose" context manager that could be used like this:

with initialize(...):
    with compose_context(config_name=..., overrides=..., return_hydra_config=...) as cfg:
        # At this point `cfg` is the same config you would have obtained by calling `cfg = compose(...)`
        # with the only difference being that `HydraConfig.instance().cfg` is set (and is reset to its
        # original value when exiting this context).
        ...

It's actually pretty easy to implement this context outside of Hydra, but providing it would make it easier for users.

It'd be also possible to combine everything in a single context by having initialize() take for instance optional compose_args arguments and returning the config, but I'm not sure it's necessarily better (would probably lose a bit of flexibility).

Jasha10 commented 2 years ago

Thanks for looking into this @odelalleau! I'll try to put together an implementation for compose_context.

I'm thinking that a usage pattern involving OmegaConf.resolve could even make the cfg object behave nicely after the context has ended:

with initialize(...):
    with compose_context(...) as cfg:
        OmegaConf.resolve(cfg)  # makes cfg usable outside of the context
...
application_logic(cfg.foo)
jbergq commented 1 year ago

Any updates on this?

omry commented 1 year ago

@Jasha10, Utilizing a context is a nice idea, but I think you should not abuse compose(). How about adding a new context for it?

with compose_with_global_hydra_config(...):
   ...

Alternatively, you can just document in the Compose API page how to regain access to the Hydra config (like the example from @odelalleau).

maxoppelt commented 1 year ago

Any update on this?

jasonkhadka commented 1 year ago

Any progress here?

Thanks for looking into this @odelalleau! I'll try to put together an implementation for compose_context.

I'm thinking that a usage pattern involving OmegaConf.resolve could even make the cfg object behave nicely after the context has ended:

with initialize(...):
    with compose_context(...) as cfg:
        OmegaConf.resolve(cfg)  # makes cfg usable outside of the context
...
application_logic(cfg.foo)
adaruna3 commented 1 year ago

Any updates?

omry commented 1 year ago

There is at least one actual workaround in the comments.