Open dolfinus opened 2 years ago
Thanks for the report, @dolfinus. I can reproduce the behavior from the example you provided.
The issue here is that the hydra:
resolver (${hydra:...}
) depends on global state, namely the state of HydraConfig.
When using @hydra.main
, using the hydra:
resolver results in a query to this global state.
Meanwhile, the compose API is a functional API; after the call to compose
has returned, the global HydraConfig state is not retained. This is why we are getting the "HydraConfig was not set" ValueError in the traceback above.
I'm not sure what the best way to get around this is...
Is your usecase limited to looking up hydra.job.name
? Or is there some other hydra setting that you wanted to query?
Is your usecase limited to looking up hydra.job.name?
No, it's actually any substitution supported by Hydra, e.g. hydra:runtime.cwd
, hydra:job.name
, etc.
I'm using Hydra to pass configuration to ETL pipelines for different environments.
Firsly, developer is writing ETL pipeline in Jupyter notebook or IDE using config with default env=dev
.
Secondly, CI is testing this pipeline with overriding env=test
.
Thirdly, CD is pushing ETL pipeline to staging server allowing user to test it on some real data, env=staging
.
And only after that pipeline is deployed to production environment with overriding env=prod
.
The main goal here is to use just the same source code (maybe with copy-pasting this code from Jupyter/IDE to script file) with minimal changes. So the plan is to have function in Jupyter:
from hydra import initialize, compose
def my_func(conf, source, target):
...
with initialize(config_path="conf"):
conf = compose(config_name="config.yaml", return_hydra_config=True)
source = Source(...)
target = Target(...)
runner = Runner(...)
runner.execute(my_func, conf, source, target)
and then commit it to main.py
:
import hydra
def my_func(conf, source, target):
...
@hydra.main(config_path="conf")
def main(conf):
source = Source(...)
target = Target(...)
runner = Runner(...)
runner.execute(my_func, conf, source, target)
Using global config is not the best idea, because there could be different configurations used by tests which may conflict.
I see. Thanks for the information.
This is an edge case where the compose
API falls short of the @hydra.main
API.
Brainstorming APIs that could bridge this gap, I think the easiest thing to implement would be allowing @hydra.main
to accept a return_hydra_config=True
parameter (like the compose API does).
If @hydra.main
accepted a return_hydra_config=True
parameter, would this solve your problem @dolfinus?
implement would be allowing @hydra.main to accept a return_hydra_config=True parameter
Do you mean hydra.initialize
instead of hydra.main
?
No, I was talking about hydra.main
. Please disregard my previous comment.
The root of the problem is that the hydra:
resolver relies on global state (i.e. the state of the HydraConfig
singleton). When using the compose
API, the hydra:
resolver fails because the global state (HydraConfig
) is not set up.
It would be nice to have an API that has the best of both worlds from @hydra.main
and compose
:
@hydra.main
allows for use of the the hydra:
resolver and access to HydraConfig
(within the context of the function decorated by @hydra.main
). However, hydra.main
does not easily allow control over the overrides via python code.compose
allows passing overrides
and its use is consistent with the functional programming paradigm, but access to HydraConfig
is not available.Again brainstorming about possible APIs, it would be nice if compose
could function as a context manager:
# a possible API:
with initialize(...):
with compose(..., context=True) as conf:
# ${hydra:} resolver works and HydraConf is available in this context
# ${hydra:} resolver and HydraConf no longer available
Thanks for the link to #716. The context is useful. For now I will put this issue on the wishlist.
To reproduce
...
conf/env/prod.yaml
spark: appName: ${hydra:job.name}
conf/env/dev.yaml
spark: appName: ${hydra.job.name}
As you probably know, using ${hydra.job.name}
works when using the compose
API, but it is not recommended when using @hydra.main
. Omry has previously endorsed this approach here.
This is one of the inconsistencies between @hydra.main
and compose
.
Regarding the suggested API two messages above:
# a possible API:
with initialize(...):
with compose(..., context=True) as conf:
# ${hydra:} resolver works and HydraConf is available in this context
# ${hydra:} resolver and HydraConf no longer available
If the resolver does not work outside the compose context, how can we access that attribute anywhere else? Can it be resolved inplace and substituted?
Another idea I think would make the hydra.main and the compose functionality more similar is to add an option in the initialize or compose methods to keep HydraConfig
initialized. That way we can always have the resolve functionality, and there is no need to set return_hydra_config=True
, as it would be accessible anyways (including ${hydra:choices.GROUP}
resolutions). Would this be possible?
If the resolver does not work outside the compose context, how can we access that attribute anywhere else? Can it be resolved inplace and substituted?
Yes, you could call OmegaConf.resolve
while inside the compose context, which would enable you to safely use it after exiting from that context.
Alternatively, you could ensure that the application logic using conf
is contained within the context of the the call to compose
. This is similar to how, with @hydra.main
, the application logic that makes use of the config is contained within the user's decorated function.
Another idea ... to keep HydraConfig initialized.
This feels like an antipattern to me because this would increase the amount of persistent global state that results from running a Hydra application. Using @hydra.main
also does not result in HydraConfig
being persistent; HydraConfig
is only available within the context of the function that is decorated by @hydra.main
.
Regarding the persistence, if initialized out of a with
context, it would be persistent, I imagine. But your proposal seems the better approach, thanks for clarifying. Your proposed API looks good to me and my application could work in a contained context.
A bit unrelated, isn't the configstore's singleton an anti-pattern that persists globally, though?
Yes indeed, the ConfigStore is persistent global state.
For Hydra's test suite, we have a hydra_restore_singletons
fixture that is used to save and then later restore the global state so that the tests can be run independently.
Just a note that I also ran into this problem. FYI my context:
${hydra:runtime.choices.xyz}
to access which config group was used (and this is failing with HydraConfig was not set
)I would expect HydraConfig
to be available within my with initialize(...)
context (possibly with not all fields that are found when using @hydra.main(...)
, but at least if we have the config we can manipulate it for the purpose of the test)
Edit: my current workaround is the following (sketch):
with initialize(....):
cfg = compose(
...,
return_hydra_config=True,
)
HydraConfig.instance().set_config(cfg)
Quick follow-up on my previous comment, now that I better understand what's going on:
I think it'd be possible to provide a "compose" context manager that could be used like this:
with initialize(...):
with compose_context(config_name=..., overrides=..., return_hydra_config=...) as cfg:
# At this point `cfg` is the same config you would have obtained by calling `cfg = compose(...)`
# with the only difference being that `HydraConfig.instance().cfg` is set (and is reset to its
# original value when exiting this context).
...
It's actually pretty easy to implement this context outside of Hydra, but providing it would make it easier for users.
It'd be also possible to combine everything in a single context by having initialize()
take for instance optional compose_args
arguments and returning the config, but I'm not sure it's necessarily better (would probably lose a bit of flexibility).
Thanks for looking into this @odelalleau!
I'll try to put together an implementation for compose_context
.
I'm thinking that a usage pattern involving OmegaConf.resolve
could even make the cfg
object behave nicely after the context has ended:
with initialize(...):
with compose_context(...) as cfg:
OmegaConf.resolve(cfg) # makes cfg usable outside of the context
...
application_logic(cfg.foo)
Any updates on this?
@Jasha10, Utilizing a context is a nice idea, but I think you should not abuse compose(). How about adding a new context for it?
with compose_with_global_hydra_config(...):
...
Alternatively, you can just document in the Compose API page how to regain access to the Hydra config (like the example from @odelalleau).
Any update on this?
Any progress here?
Thanks for looking into this @odelalleau! I'll try to put together an implementation for
compose_context
.I'm thinking that a usage pattern involving
OmegaConf.resolve
could even make thecfg
object behave nicely after the context has ended:with initialize(...): with compose_context(...) as cfg: OmegaConf.resolve(cfg) # makes cfg usable outside of the context ... application_logic(cfg.foo)
Any updates?
There is at least one actual workaround in the comments.
🐛 Bug
Description
hydra:key
substitution does not work in interactive Python session and Jupyter notebook.hydra.key
does, but it is not recommended (according to https://github.com/facebookresearch/hydra/issues/1786#issuecomment-914734857).Checklist
To reproduce
Minimal Code/Config snippet to reproduce
conf/config.yaml
conf/env/prod.yaml
conf/env/dev.yaml
run.py
Stack trace/error message
Expected Behavior
Substitutions
hydra:
are not causing an exception inside a context created bycompose
function withreturn_hydra_config=True
.HydraConfig
instance should be initialized.System information
1.1.0
from pypi and2a9f181
commit from main branchAdditional context
Same behavior has been observed several times:
716
https://github.com/facebookresearch/hydra/issues/1786#issuecomment-913110496