Closed dc3ea9f closed 1 year ago
Thank you for opening this.
This is indeed a very important feature and was on the TODO list since last year.
I believe this was postponed because we need to wait pyyaml
for anchor
and alias
implementation.
I'll double check their progress this week.
Aha! This is a feature of pyyaml
rather than omegaconf
, through omegaconf
implemented it.
Thank you for your information.
You can close this issue since it should be solved by pyyaml
.
I double checked the progress.
Now, pyyaml
has full support with alias
and anchor
So, the following config is working:
test.yaml:
a: &a 1
b: *a
>>> import chanfig
>>> chanfig.load('test.yaml')
NestedDict(
('a'): 1
('b'): 1
)
However, the ${}
is not described in yaml
standard and so will not be supported by pyyaml
.
Please let me know if anchor
and alias
facility your needs.
Thank you for your efforts.
While I think the anchor
and alias
are weaker than the value interpolation. I think the implementation of value interpolation is essential, it can help us build more flexible configurations.
No worries, they are indeed very different. anchor
and alias
are great for longer duplications, while ${}
is great for Variables.
In the next release, we will add a format
function to implement this behavior.
Nice job! Looking forward to your updates.
================
Mohan Zhou Ph.D. Student, Harbin Institute of Technology https://www.mhzhou.com/
From: Zhiyuan Chen @.> Sent: Friday, July 21, 2023 4:34:09 PM To: ZhiyuanChen/CHANfiG @.> Cc: Mohan Zhou @.>; Author @.> Subject: Re: [ZhiyuanChen/CHANfiG] Feature Request: variable interpolation in yaml file (Issue #14)
No worries, they are indeed very different. anchor and alias are great for longer duplications, while ${} is great for Variables. In the next release, we will add a format function to implement this behavior.
— Reply to this email directly, view it on GitHubhttps://github.com/ZhiyuanChen/CHANfiG/issues/14#issuecomment-1645215984, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AHWHJ7VUUZ7MERVMCYRULBLXRI5IDANCNFSM6AAAAAA2HMX52U. You are receiving this because you authored the thread.Message ID: @.***>
Variable interpolation is available at the master branch. Please let me know if you like it.
If everything works fine, we will be releasing v0.0.88.
Thank you for your efforts.
However, I found the interpolation only takes effect when the value is ${}
, as this line does. It is just like anchor/alias in yaml, and cannot handle things like "http://${server.host}:${server.port}".
You can test with this script:
from chanfig import Config
from rich import print
import io
yaml = """server:
host: localhost
port: 80
client:
url: http://${server.host}:${server.port}/
server_port: ${server.port}
description: Client of ${.url}
"""
if __name__ == '__main__':
yaml = io.StringIO(yaml)
config = Config.from_yaml(yaml)
print(config)
print(config.interpolate())
The result is:
Config(<class 'chanfig.config.Config'>,
('server'): Config(<class 'chanfig.config.Config'>,
('host'): 'localhost'
('port'): 80
)
('client'): Config(<class 'chanfig.config.Config'>,
('url'): 'http://${server.host}:${server.port}/'
('server_port'): '${server.port}'
('description'): 'Client of ${.url}'
)
)
Config(<class 'chanfig.config.Config'>,
('server'): Config(<class 'chanfig.config.Config'>,
('host'): 'localhost'
('port'): 80
)
('client'): Config(<class 'chanfig.config.Config'>,
('url'): 'http://${server.host}:${server.port}/' <--- ERROR, expected 'http://localhost:80/'
('server_port'): 80 <--- Fine
('description'): 'Client of ${.url}' <--- ERROR, expected 'Client of http://localhost:80/'
)
)
We can find only client.server_port
is interpolated, and client.url
and client.description
are not as expected.
Given the example before, we may need to find all value that contains ${}
, rather than equal to ${}
. In your implementation, the variable interpolation's functionality is strongly limited (why I don't just use anchor
and alias
?). Given that values need to be interpolated, a parse tree or recursively implementation may be involved to resolve the true value. When implementing this, please pay attention to allowing escape characters, e.g. \${}
won't be interpolated.
Besides, I am also thinking about how to maximize the performance of variance interpolation. A possible solution is we can predefine some interpolates
and the user can select which to use. And we need to expose the interpolate
function to Config.from_yaml
.
E.g.:
from chanfig import Config
from chanfig.interpolators import (
DefaultInterpolator,
EnvInterpolator,
LambdaInterpolator,
LambdaRegexInterpolator,
)
config = Config.from_yaml(yaml) # would use DefaultInterpolator
config = Config.from_yaml(
yaml, interpolate=[DefaultInterpolator, EnvInterpolator]
) # would handle default and ${Env.xxx} with environment variables
config = Config.from_yaml(
yaml,
interpolate=[
DefaultInterpolator,
EnvInterpolator,
LambdaInterpolator(cond, mapping),
],
) # would handle default, env and user customed interpolator
By the way, I think when people use the ${}
in yaml, they mostly expect variable interpolation rather than the raw value. So the interpolate can be set to default behavior.
Another thing, I can only install the repo with pip install git+https://github.com/ZhiyuanChen/CHANfiG
. When I tried git clone https://github.com/ZhiyuanChen/CHANfiG && cd CHANfiG
, and then python setup.py develop
, it fails with ModuleNotFoundError: No module named 'chanfig._version'
. I am not familiar with package distribution, can you fix this?
One more thing, when use Config.from_yaml()
to load a non-existent file, it would raise the error:
Traceback (most recent call last):
File "/Users/xxx/anaconda3/envs/dev/lib/python3.11/site-packages/chanfig/flat_dict.py", line 1174, in open
file = open(file, *args, encoding=encoding, **kwargs) # type: ignore # noqa: SIM115
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'config.yaml'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/xxx/Documents/tmp/test.py", line 7, in <module>
config = Config.from_yaml('config.yaml')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/xxx/anaconda3/envs/dev/lib/python3.11/site-packages/chanfig/flat_dict.py", line 1090, in from_yaml
with cls.open(file) as fp: # pylint: disable=C0103
File "/Users/xxx/anaconda3/envs/dev/lib/python3.11/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/Users/xxx/anaconda3/envs/dev/lib/python3.11/site-packages/chanfig/flat_dict.py", line 1177, in open
file.close() # type: ignore
^^^^^^^^^^
AttributeError: 'str' object has no attribute 'close'
The second exception is not expected, we can modify here with:
try:
file = open(file, *args, encoding=encoding, **kwargs) # type: ignore # noqa: SIM115
yield file # type: ignore
except:
raise
else:
file.close() # type: ignore
Thank you for your efforts.
However, I found the interpolation only takes effect when the value is
${}
, as this line does. It is just like anchor/alias in yaml, and cannot handle things like "http://${server.host}:${server.port}".You can test with this script:
from chanfig import Config from rich import print import io yaml = """server: host: localhost port: 80 client: url: http://${server.host}:${server.port}/ server_port: ${server.port} description: Client of ${.url} """ if __name__ == '__main__': yaml = io.StringIO(yaml) config = Config.from_yaml(yaml) print(config) print(config.interpolate())
The result is:
Config(<class 'chanfig.config.Config'>, ('server'): Config(<class 'chanfig.config.Config'>, ('host'): 'localhost' ('port'): 80 ) ('client'): Config(<class 'chanfig.config.Config'>, ('url'): 'http://${server.host}:${server.port}/' ('server_port'): '${server.port}' ('description'): 'Client of ${.url}' ) ) Config(<class 'chanfig.config.Config'>, ('server'): Config(<class 'chanfig.config.Config'>, ('host'): 'localhost' ('port'): 80 ) ('client'): Config(<class 'chanfig.config.Config'>, ('url'): 'http://${server.host}:${server.port}/' <--- ERROR, expected 'http://localhost:80/' ('server_port'): 80 <--- Fine ('description'): 'Client of ${.url}' <--- ERROR, expected 'Client of http://localhost:80/' ) )
We can find only
client.server_port
is interpolated, andclient.url
andclient.description
are not as expected.Given the example before, we may need to find all value that contains
${}
, rather than equal to${}
. In your implementation, the variable interpolation's functionality is strongly limited (why I don't just useanchor
andalias
?). Given that values need to be interpolated, a parse tree or recursively implementation may be involved to resolve the true value. When implementing this, please pay attention to allowing escape characters, e.g.\${}
won't be interpolated.Besides, I am also thinking about how to maximize the performance of variance interpolation. A possible solution is we can predefine some
interpolates
and the user can select which to use. And we need to expose theinterpolate
function toConfig.from_yaml
.E.g.:
from chanfig import Config from chanfig.interpolators import ( DefaultInterpolator, EnvInterpolator, LambdaInterpolator, LambdaRegexInterpolator, ) config = Config.from_yaml(yaml) # would use DefaultInterpolator config = Config.from_yaml( yaml, interpolate=[DefaultInterpolator, EnvInterpolator] ) # would handle default and ${Env.xxx} with environment variables config = Config.from_yaml( yaml, interpolate=[ DefaultInterpolator, EnvInterpolator, LambdaInterpolator(cond, mapping), ], ) # would handle default, env and user customed interpolator
By the way, I think when people use the
${}
in yaml, they mostly expect variable interpolation rather than the raw value. So the interpolate can be set to default behavior.
Thank you for your feedback! I'll work on it later next week.
Another thing, I can only install the repo with
pip install git+https://github.com/ZhiyuanChen/CHANfiG
. When I triedgit clone https://github.com/ZhiyuanChen/CHANfiG && cd CHANfiG
, and thenpython setup.py develop
, it fails withModuleNotFoundError: No module named 'chanfig._version'
. I am not familiar with package distribution, can you fix this?
Have you tried to install with pip install -e .
? Running setup.py
is deprecated and may resulted in unexpected behavior.
One more thing, when use
Config.from_yaml()
to load a non-existent file, it would raise the error:Traceback (most recent call last): File "/Users/xxx/anaconda3/envs/dev/lib/python3.11/site-packages/chanfig/flat_dict.py", line 1174, in open file = open(file, *args, encoding=encoding, **kwargs) # type: ignore # noqa: SIM115 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ FileNotFoundError: [Errno 2] No such file or directory: 'config.yaml' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/xxx/Documents/tmp/test.py", line 7, in <module> config = Config.from_yaml('config.yaml') ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/xxx/anaconda3/envs/dev/lib/python3.11/site-packages/chanfig/flat_dict.py", line 1090, in from_yaml with cls.open(file) as fp: # pylint: disable=C0103 File "/Users/xxx/anaconda3/envs/dev/lib/python3.11/contextlib.py", line 137, in __enter__ return next(self.gen) ^^^^^^^^^^^^^^ File "/Users/xxx/anaconda3/envs/dev/lib/python3.11/site-packages/chanfig/flat_dict.py", line 1177, in open file.close() # type: ignore ^^^^^^^^^^ AttributeError: 'str' object has no attribute 'close'
The second exception is not expected, we can modify here with:
try: file = open(file, *args, encoding=encoding, **kwargs) # type: ignore # noqa: SIM115 yield file # type: ignore except: raise else: file.close() # type: ignore
Thanks for reporting this. The finally is meant to catch any possible error when opening file and ensure it's closed, to avoid memory leaks (In some cases, the FileNotFoundError
may be catched). So it must remains in the finally clause. Although the exception in attempting to close the file should be suppressed. I have just pushed a fix for this.
Running pip install -e .
fix it, thanks!
Would the memory leakage occur even if the file is closed?
from contextlib import contextmanager
class X:
def __init__():
pass
@classmethod
@contextmanager
def open(self, filename):
try:
f = open(filename, 'r')
yield f
except:
raise
else:
f.close()
with X.open("test.yaml") as f:
print(f.read())
print(f.closed)
Output:
server:
host: localhost
port: 80
client:
url: http://${server.host}:${server.port}/
server_port: ${server.port}
description: Client of ${.url}
True
Would the memory leakage occur even if the file is closed?
No, but if open successfully opened a file handle, but also raise an exception, it will goes to except: raise
clause. And if this Exception is handled elsewhere, the file would remain opens, leading to memory leakage.
However, I found the interpolation only takes effect when the value is ${}, as this line does. It is just like anchor/alias in yaml, and cannot handle things like "http://${server.host}:${server.port}".
Current implementation (at develop branch) has support this feature.
Although there are some drawbacks, since Variable
computes eagerly, url=http://${server.host}:${server.port}
will not be able to save to a Variable
object, meaning that if the value of server.host
or server.port
changes, url
will not change accordingly.
Besides, the interpolation is not raising error with circular interpolation.
Also, I noticed that you are creating an IO to load string.
if __name__ == '__main__':
yaml = io.StringIO(yaml)
config = Config.from_yaml(yaml)
print(config)
print(config.interpolate())
Did you know you can do it by calling from_yamls
?
if __name__ == '__main__':
config = Config.from_yamls(yaml)
print(config)
print(config.interpolate())
Besides, the interpolation is not raising error with circular interpolation.
Just fixed this.
No, but if open successfully opened a file handle, but also raise an exception, it will goes to
except: raise
clause. And if this Exception is handled elsewhere, the file would remain opens, leading to memory leakage.
OK, though i think suppress all exception is a little dangerous, I can accept it for now.
Current implementation (at develop branch) has support this feature.
Great! Thank you.
Although there are some drawbacks, since Variable computes eagerly, url=http://${server.host}:${server.port} will not be able to save to a Variable object, meaning that if the value of server.host or server.port changes, url will not change accordingly.
It would be better if we could solve this issue. Can we compute the interpolation lazily when we need it?
Did you know you can do it by calling from_yamls?
Thank you for your advice. It's really better.
By the way, would you plan to implement move the interpolate
to default behavior or implement pre-defined interpolators, e.g. ${ENV.local_rank}
or ${import ./model.yaml}
?
Got errors when using interpolate
with parse
.
OK, though i think suppress all exception is a little dangerous, I can accept it for now.
Don't worries, it will only supress all exception when closing the file. It will still arise Exception
in opening a file.
It would be better if we could solve this issue. Can we compute the interpolation lazily when we need it?
I have thought about it. I don't think it's a good idea to support lazy interpolation. There are two main reasons:
dict member access is extremely fast in chanfig
, at least twice faster in FlatDict
and ~10 times faster compare to yacs
, ml_collections
and addict
.
Variable interpolation, on the other hand, is very very complex. We use regex to identify all the placeholders (wait, this is a better name that interpolatee, which is not even a word), this takes O(k) time. Then, we identify if it points to it self, this also takes O(k) time. Next, we check if there exists circular reference, which takes O(n log n) time. They will have significatn impact on the performance of chanfig
(and chanfig
was designed for extreme use case, where constant terms matter).
Lazy interpolation will have "master member" and "slave member". Change the value of "mater member" will also change the value of "slave member", but not the other way. Although I don't care about PC, I mean, I'm a Chineses, my ancestors did nothing to black friends. But this is against my philosophy. Every member should be equal.
To address this issue, in the later version of chanfig
, we will introduce a new update on Variable (or maybe a brand new object). This Object will be more similar to FieldReference
which supports variable level lazy evaluation.
By the way, would you plan to implement move the interpolate to default behavior or implement pre-defined interpolators, e.g. ${ENV.local_rank} or ${import ./model.yaml}?
Hmmmm, I think I'll go the first way, as pre-defined interpolator is comparably less intuitive.
For the first one, I'll add another parameter to allow read environment variable more easily.
Currently, interpolate()
accepts an interpolator
parameter. So, you could call interpolate(chanfig.load("model.yaml"))
. Or, could you provide more example usage?
Got errors when using
interpolate
withparse
.
Just fixed this!
sorry for the late response, I have just tested with the newest develop
branch and found an issue: elements in the list of dict won't be interpolated.
To address this issue, in the later version of chanfig, we will introduce a new update on Variable (or maybe a brand new object). This Object will be more similar to FieldReference which supports variable level lazy evaluation.
I think it is OK. Thank you for your efforts.
Currently, interpolate() accepts an interpolator parameter. So, you could call interpolate(chanfig.load("model.yaml")). Or, could you provide more example usage?
But we cannot write "load model.yaml" in yaml config and load it correctly right? When I am writing configurations for my experiments, I faced this issue, since a model's parameters are large (refer to transformers
or diffusers
) and relatively fixed, and dataset's parameters can be reused across many experiments, I decided to separate these configuration files. But without grammar like ${import xx.yaml}$
, I turned out to write a python config, as shown in #16 .
sorry for the late response, I have just tested with the newest
develop
branch and found an issue: elements in the list of dict won't be interpolated.
Copy, will work on it later, probably next week.
But we cannot write "load model.yaml" in yaml config and load it correctly right? When I am writing configurations for my experiments, I faced this issue, since a model's parameters are large (refer to
transformers
ordiffusers
) and relatively fixed, and dataset's parameters can be reused across many experiments, I decided to separate these configuration files. But without grammar like${import xx.yaml}$
, I turned out to write a python config, as shown in #16 .
Ahhh, I understand your needs.
For combining multiple yaml file, this is a very solid and sound needs. We will be tracking this in a new issue
sorry for the late response, I have just tested with the newest
develop
branch and found an issue: elements in the list of dict won't be interpolated.
Could you provide a minimum working example?
I have the following yaml, but it is not parsable by pyyaml.
a: 1
b: ${a}
c: [${a}, ${b}]
Yes, I tried #16 and use it as the current implementation. While compare to yaml config, python config are too long...
When trying to create the minimum working example, I use the latest dev branch 0.0.88.dev22+g6e303f9
, and got an error TypeError: 'Config' object is not callable
when calling interpolate
, does the interface changed?
Yes, I tried #16 and use it as the current implementation. While compare to yaml config, python config are too long...
I'm working on an api like dataclass
, which should improve it a bit.
When trying to create the minimum working example, I use the latest dev branch
0.0.88.dev22+g6e303f9
, and got an errorTypeError: 'Config' object is not callable
when callinginterpolate
, does the interface changed?
Could you provide a minimum example for this? I don't think there was anything change in interpolate
.
Our server's power supply was flooded, here's a possible example:
data_dir: /home/data
# val_data_dir: /home/data_eval
# data
data_args:
data_type: "any"
data_dict:
imagenet:
data_dirs:
- ${data_dir}/X-A
- ${data_dir}/X-B
- ${data_dir}/X-C
This is much harder than it appears to be. Could you try the latest develop branch and see if it works as you expected?
Hi, I just found there exists an exciting and very useful feature "variable interpolation" provided by
OmegaConf
.Detailed documents can be found here. And here is a comparison:
In short, it can interpolate values that are annotated with
${}
in lazy mode. I think this reference is usable when creating configurations from yaml, just likeVariable
inchanfig
.How do you like it?
Best!