Open JamieJamesJamie opened 4 years ago
Hi, Thanks for the suggestion. This is on my radar, but definitely not trivial.
- Parse comments in .yaml files / structured configs
A potential quick workaround could be to parse comments directly above a config attribute as help text.
YAML parsers strip comments. This is not a quick workaround: it would require writing a new YAML parser, which is at best a multi-month effort (I know because I tried).
- Define a @dataclass helper function
I don't like this one for the reasons you stated.
- Using @dataclass metadata parameter
This is getting close. I actually already have an open issue in OmegaConf to consider adding support for something similar based on PEP 593 (and in fact something more powerful than just docstrings).
This will have to be implemented in OmegaConf before Hydra can support it.
Just chiming in with support for this feature. Help messages & a self-documenting CLI are my favorite features of click & argparse, so the lack of them is a source of hesitation when considering Hydra for projects (along with the inability to use Hydra and click/argparse together in the same program, #409, which would be a reasonable compromise in many situations).
I agree with @SigmaX that help messages seem to be the only feature I can think of where Hydra/OmegaConf are lacking in comparison to argparse. Other than that and refactoring efforts, it would be hard for developers to justify not switching!
Another suggestion (similar to suggestions 2 and 3 in some ways):
(The docstring format I used is based on variant 3 from this stackoverflow answer on how to write docstrings for dataclasses, but you could also use variant 2.)
from dataclasses import dataclass


@dataclass
class MySQLConfig:
    """Configuration for MySQL.

    Docstring in Google style, but PEP style or numpy style is of course
    also possible.

    Attributes:
        driver: Driver to use for the database.
        user: Username for the database. The documentation for this runs
            over multiple lines.
        password: Password for the database.
    """

    driver: str = "mysql"
    user: str = "omry"
    password: str = "secret"
(Users can use a tool like darglint to automatically check whether docstring and implementation are in sync.)
Advantages:
- MySQLConfig.__doc__ (you'd need to parse the docstring itself, but that seems pretty easy; e.g., use docstring_parser)
Disadvantages:
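To make the docstring idea concrete, here is a minimal sketch that pulls per-field help out of a Google-style "Attributes:" section using only the standard library. The function name attribute_help is made up for illustration, and the parsing is deliberately naive; a real implementation would more likely rely on a library such as docstring_parser.

```python
import inspect
from dataclasses import dataclass, fields


@dataclass
class MySQLConfig:
    """Configuration for MySQL.

    Attributes:
        driver: Driver to use for the database.
        user: Username for the database. The documentation for this runs
            over multiple lines.
        password: Password for the database.
    """

    driver: str = "mysql"
    user: str = "omry"
    password: str = "secret"


def attribute_help(cls) -> dict:
    """Map each dataclass field name to its text in the 'Attributes:' section.

    Hypothetical helper: assumes Google-style docstrings only.
    """
    names = {f.name for f in fields(cls)}
    help_by_field = {}
    current = None
    in_section = False
    # inspect.getdoc() strips the common indentation of the docstring body
    for line in (inspect.getdoc(cls) or "").splitlines():
        stripped = line.strip()
        if stripped == "Attributes:":
            in_section = True
            continue
        if not in_section or not stripped:
            continue
        if not line.startswith(" "):
            break  # a non-indented line ends the section
        head, sep, rest = stripped.partition(":")
        if sep and head in names:
            current = head
            help_by_field[current] = rest.strip()
        elif current is not None:
            # continuation line of the previous attribute
            help_by_field[current] += " " + stripped
    return help_by_field


HELP = attribute_help(MySQLConfig)
```

The resulting mapping could then be merged into whatever help output is generated for the config.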
simple-parsing - https://github.com/lebrice/SimpleParsing (an argparse library using dataclass) supports help well.
It uses both docstring and field comments like this:
@dataclass
class Foo:
    """This is some help for the whole of Foo"""

    blah: str  # This is some more help for the blah
    someOption: bool = False  # This turns the option on
Parsing help from comments in the dataclass would probably be the easiest option. It could be done as suggested above, with the help comment on the same line as the parameter, or the comment could sit on a separate line above the parameter and start with something like # (help): to avoid parsing other unrelated comments.
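A rough sketch of both comment styles, using the standard tokenize module on the source text of a class. The "# (help):" marker comes from the suggestion above; the function name and the SOURCE sample are invented for illustration, and this is not how SimpleParsing itself is implemented.

```python
import io
import keyword
import tokenize

HELP_MARKER = "# (help):"  # hypothetical marker, taken from the suggestion above

SOURCE = '''
@dataclass
class Foo:
    """This is some help for the whole of Foo"""
    # (help): This turns the option on
    some_option: bool = False
    blah: str = ""  # This is some more help for the blah
    # an unrelated comment that should be ignored
    count: int = 0
'''


def help_from_comments(source: str) -> dict:
    """Map annotated field names to help text found in comments."""
    help_by_field = {}
    pending = None    # text of a "# (help):" line waiting for its field
    inline = None     # trailing comment on the current line
    line_tokens = []  # non-comment tokens seen on the current line
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            if line_tokens:
                inline = tok.string.lstrip("#").strip()
            elif tok.string.startswith(HELP_MARKER):
                pending = tok.string[len(HELP_MARKER):].strip()
        elif tok.type in (tokenize.NEWLINE, tokenize.NL):
            if line_tokens:
                first = line_tokens[0]
                # "name: ..." at the start of a line is treated as a field
                is_field = (
                    len(line_tokens) >= 2
                    and first.type == tokenize.NAME
                    and not keyword.iskeyword(first.string)
                    and line_tokens[1].string == ":"
                )
                text = inline or pending
                if is_field and text:
                    help_by_field[first.string] = text
                pending = None
            line_tokens, inline = [], None
        elif tok.type not in (tokenize.INDENT, tokenize.DEDENT):
            line_tokens.append(tok)
    return help_by_field


FIELD_HELP = help_from_comments(SOURCE)
```

In practice the source text would presumably come from inspect.getsource() on the config class, which only works when the source file is available at runtime.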
I don't know whether to open a separate thread/issue for this just yet, but I wanted to bring it to everyone's attention. It would be great to support outputting the CLI help strings for each parameter / CLI argument inside the configuration YAML files as well, as comments. My little config/CLI library here supports this by using the ruamel.yaml backend, but it is cumbersome in other ways / lacking other features when compared to hydra. It would be nice to see those features in hydra so I don't have to replicate hydra devs' efforts by upgrading my own library.
Here's what I'm talking about:
Those same help strings are available in the manual, when the program is run with --help on the CLI.
The whole idea here is to provide the users of the config YAML files with a reference for the parameters, which can be especially handy for running highly-complex models.
@Algomorph, I considered ruamel.yaml at some point exactly because it preserves the comments, but I found it pretty terrible to use and I remember it having additional issues. I think it's reasonable to depend on dataclasses to provide this functionality, as SimpleParsing does (which, if I remember correctly, is also a hack, but not too terrible).
A couple more alternatives I can think of. Not saying they are necessarily better, just posting for the sake of discussion. Since I do not use structured configs, I would like something that works just with the YAML, so:
This can take many forms, but in short it would be some special syntax in the config key that signifies that the contained value is a docstring for some option. For example, option? for the docstring of option. With the example posted above:
db?: Group to decide what type of database to use
db:
  driver?: Driver to use for the database
  driver: mysql
  user?: Username for the database
  user: omry
  pass?: Password for the database
  pass: secret
You could even document the whole package with ?: <top-level doc>. Alternative syntaxes could be help(option), option.__doc__, etc. The disadvantage is that you have to repeat the field name (though Hydra could warn if a docstring for a mistyped option name is detected). Also, it might be awkward or more difficult to use in defaults.
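As a sketch of how such keys could be separated from real values, the function below walks an already-loaded mapping (the kind of dict a YAML loader would presumably return) and splits it into the actual config and a table of docstrings keyed by dotted path. The name split_docs and the dotted-path convention are assumptions for illustration, not anything Hydra provides.

```python
def split_docs(node: dict, prefix: str = ""):
    """Separate "key?" docstring entries from real config values.

    Returns (config, docs), where docs maps dotted paths to help text.
    A bare "?" key documents the enclosing node itself.
    """
    config, docs = {}, {}
    for key, value in node.items():
        if key.endswith("?"):
            name = key[:-1]
            path = ".".join(part for part in (prefix, name) if part)
            docs[path] = value
        elif isinstance(value, dict):
            path = ".".join(part for part in (prefix, key) if part)
            config[key], nested_docs = split_docs(value, path)
            docs.update(nested_docs)
        else:
            config[key] = value
    return config, docs


raw = {
    "db?": "Group to decide what type of database to use",
    "db": {
        "driver?": "Driver to use for the database",
        "driver": "mysql",
        "user?": "Username for the database",
        "user": "omry",
        "pass?": "Password for the database",
        "pass": "secret",
    },
}

config, docs = split_docs(raw)
```

A check for mistyped names, as mentioned above, could compare the keys of docs against the paths that actually exist in config.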
This would simply be a tool in the library to generate a separate document (maybe another YAML) with all the available configuration keys for the developer to add the documentation strings. It could be a single file for the whole app configuration (which may be tricky considering how dynamic Hydra configurations can be) or multiple files for different packages. So, for the config.yaml above, you would get, for example, a config.doc.yaml like:
db:
driver:
user:
pass:
Which you would fill like:
db: Group to decide what type of database to use
driver: Driver to use for the database
user: Username for the database
pass: Password for the database
Then, if you add some new configuration item to db, the documentation generator should be able to keep the existing documentation and add new empty keys as needed:
db: Group to decide what type of database to use
driver: Driver to use for the database
user: Username for the database
pass: Password for the database
new_item:
Hydra would also need to implement the ability to read these documentation files and incorporate their contents into the help template.
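The regeneration step could look roughly like the sketch below. It assumes the documentation file is loaded as a flat mapping from dotted key paths to help strings, which is only one possible layout; the function names are hypothetical.

```python
def flatten_keys(config: dict, prefix: str = "") -> list:
    """List every key path in a nested config, parents before children."""
    paths = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        paths.append(path)
        if isinstance(value, dict):
            paths.extend(flatten_keys(value, path))
    return paths


def update_docs(config: dict, existing_docs: dict) -> dict:
    """Keep existing help strings and add empty entries for new keys."""
    return {path: existing_docs.get(path) for path in flatten_keys(config)}


config = {"db": {"driver": "mysql", "user": "omry", "pass": "secret", "new_item": 1}}
existing_docs = {
    "db": "Group to decide what type of database to use",
    "db.driver": "Driver to use for the database",
    "db.user": "Username for the database",
    "db.pass": "Password for the database",
}

updated_docs = update_docs(config, existing_docs)
```

Note that this version silently drops documentation for keys that no longer exist in the config; a real tool might want to warn instead.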
I think this utility would work very well. In the interim, in case it is helpful to others, I have a solution I implemented where I have a custom resolver in OmegaConf that uses inspect to pull the main() function docstring for the script being run and adds that to the documentation for a Hydra app. It doesn't pull documentation from the yaml files (though I think that could be configured as well), but it might be helpful in the interim.
Source project with this capability: https://github.com/mmcdermott/MEDS_transforms/tree/main
Custom resolvers code: https://github.com/mmcdermott/MEDS_transforms/blob/main/src/MEDS_transforms/utils.py#L111C1-L135C1
Hydra help template: https://github.com/mmcdermott/MEDS_transforms/blob/main/src/MEDS_transforms/configs/pipeline.yaml#L51
More generally, I think having resolvers to pull some of these things (function or structured config docstrings, yaml comments, anything) would be a great intermediate solution here.
Also, if the resolvers I have currently implemented in the example above would be useful to other folks, I'd be happy to pull them out into a separate installable package so they can be used more easily.
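For readers who want the general shape of the resolver idea without following the links, here is a simplified sketch. It is not the code from the linked project; the function name and the dotted-path argument are assumptions for illustration.

```python
import importlib
import inspect


def function_docstring(dotted_path: str) -> str:
    """Return the cleaned docstring of the function at 'package.module.func'."""
    module_name, _, func_name = dotted_path.rpartition(".")
    func = getattr(importlib.import_module(module_name), func_name)
    return inspect.getdoc(func) or ""


# Demonstrated on a standard-library function so the sketch runs anywhere
doc = function_docstring("json.dumps")
```

With OmegaConf installed, something like OmegaConf.register_new_resolver("docstring", function_docstring) should make this usable from hydra.help.template as ${docstring:my_pkg.my_app.main}, where the resolver name and module path are placeholders.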
Related, for my note about resolvers: https://github.com/facebookresearch/hydra/issues/2944
Another interim solution in case one wants to keep the help messages within the yaml files and uses sphinx to build documentation anyway would be autoyaml.
Disadvantages:
- --help message (documentation pages may be linked though)
Advantages:
- key_help entries
Here is what it looks like:
@godaup I think that would also be great, but I do also care about things showing up in the --help message so that CLI interfaces can be descriptive and usable (especially in contexts where the package can be installed, but the server on which it is used has no network access, as is common in medical settings or settings with other sensitive data, and therefore online documentation can't be as reliably used). Perhaps we should start two issues, one for how to document hydra applications in sphinx or mkdocs outputs, and this one for CLI help messages (though of course ideally both would use similar underlying technology or shared help messages).
🚀 Feature Request
It would be nice to be able to add help messages (similar to the argparse library) to command line arguments defined in .yaml configs / structured configs.
These help messages should be available to view conveniently when typing python myapp.py --help into the command line so that the program's user doesn't have to look through the config files and / or the rest of the code to figure out what some command line arguments do.
Motivation
Is your feature request related to a problem? Please describe.
It would be nice if users of programs written with hydra could see helpful messages for command line arguments.
It's already available as a feature in Python's argparse library by passing a help parameter containing a help message to parser.add_argument() as demonstrated here.
Adding the ability to add help messages to command line arguments in hydra may help to improve a user's understanding of what a program does.
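For reference, the argparse behaviour being described looks like this. The argument names are borrowed from the Hydra tutorial's database example and are only illustrative.

```python
import argparse

parser = argparse.ArgumentParser(prog="myapp.py", description="Example application")
# Each argument carries its own help message, shown by `python myapp.py --help`
parser.add_argument("--driver", default="mysql", help="Driver to use for the database")
parser.add_argument("--user", default="omry", help="Username for the database")

# format_help() returns the same text that --help would print
help_text = parser.format_help()
print(help_text)
```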
Pitch
Describe the solution you'd like
The solution should:
Allow a developer using Hydra to define a help message for each command line argument and each group defined in .yaml files or structured configs.
Allow an end user of a program written with Hydra to see what command line arguments can be passed as well as a description associated with each of them.
The solution could also optionally have the following features:
It would also be nice if the defaults for each attribute were optionally shown to the user in the help message. An argparse equivalent is shown here.
It would also be nice if the type of each attribute were optionally shown in the help message (if known, such as when using structured configs). An argparse equivalent is shown here.
Describe alternatives you've considered
I have a few suggestions on how this could be implemented (1-4 below) in order of increasing plausibility. Suggestions 1-2 have possible solutions for both .yaml files and structured configs. Suggestions 3-4 have better suggestions for structured configs.
I feel like suggestion 4 might be the best for structured configs. I'm not sure if help messages could be defined better in .yaml files.
1. How it could be done currently without any changes
At the moment, I understand that help messages for hydra can be configured as per this tutorial (repeated below):
e.g. It's possible to add help messages manually by overriding hydra.help.template as needed. This could be done by manually typing the desired result of python my_app.py --help. But this seems counterproductive (e.g. what if something is updated in db.mysql in the tutorial's example? It's easy to forget to update any manual help messages in hydra.help.template!).
Instead, perhaps help messages could be defined at the same time and place as the associated config attributes. So, here's my next alternative suggestion:
2. Parse comments in .yaml files / structured configs
As inspiration for this alternative example, I happened to come across the next Hydra version's doc site. I found a section on the "Nevergrad Sweeper Plugin" example here.
The comments in the example (kind of) felt like I could be reading the help text that would appear for this config when using python my_app.py --help. So, a potential quick workaround could be to parse comments directly above a config attribute as help text.
Example of using this for a config.yaml file (adapted from the Hydra tutorial website):
Example of using this for structured configs (adapted from the Hydra tutorial website):
When a user types python myapp.py --help, I'd expect the following help message to appear (adapted from the Hydra tutorial website):
(Sorry the indentation is all weird on the help text - this seems to be a "feature" of GitHub)
This method, although easy to implement, feels very "hacky". Especially for the structured configs as it's just Python comments that I've used in this alternative suggestion.
3. Define a @dataclass helper function
Perhaps, in the case of structured configs at least, a helper function could be used to define help text. e.g.:
This could work if you define that help() should be a method defined by structured configs to add help messages.
However, this has the disadvantage that if the structured config class is large, then it would get hard to check if every attribute that you want to have a help message has a help message in helpDict.
This should output the same help message as in the previous suggestion.
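A minimal sketch of what such a helper could look like, together with a check that exposes the drawback just mentioned. The help() method and the undocumented_fields() checker are hypothetical names, not part of Hydra or OmegaConf.

```python
from dataclasses import dataclass, fields


@dataclass
class MySQLConfig:
    driver: str = "mysql"
    user: str = "omry"
    password: str = "secret"

    @staticmethod
    def help() -> dict:
        # Help text lives apart from the fields, so it can drift out of sync;
        # "password" is deliberately left out here to show that
        return {
            "driver": "Driver to use for the database",
            "user": "Username for the database",
        }


def undocumented_fields(cls) -> list:
    """Return the names of fields that have no entry in cls.help()."""
    documented = cls.help()
    return [f.name for f in fields(cls) if f.name not in documented]


missing = undocumented_fields(MySQLConfig)
```

Here missing contains "password", which is exactly the kind of omission that becomes hard to spot by eye in a large config class.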
4. Using @dataclass's metadata parameter
The dataclasses.field() function has a metadata parameter. The definition provided by the Python docs is:
This seems like the best solution to implement (for structured configs at least) for the following reasons:
- metadata is read-only
- Each attribute's help message is defined at the location the attribute is defined, which makes it easy to check if a help message has been written for a given attribute.
- "metadata is provided as a third-party extension mechanism" - This is exactly the use case!
- It's fewer lines of code compared to my other structured config examples (see below)
An example of how this could be used is to suggest to a Hydra developer to use the keyword "help" as a metadata dictionary key to a help message for each attribute that the developer wants to assign a help message.
Again, this should output the same help message as in the previous suggestion.
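A minimal sketch of this convention using only the standard dataclasses module. The "help" key is the convention proposed here, and render_help() is a made-up function showing how a framework could read it; neither is an existing Hydra or OmegaConf feature.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MySQLConfig:
    driver: str = field(
        default="mysql", metadata={"help": "Driver to use for the database"}
    )
    user: str = field(
        default="omry", metadata={"help": "Username for the database"}
    )
    password: str = field(
        default="secret", metadata={"help": "Password for the database"}
    )


def render_help(cls) -> str:
    """Build one help line per field from its type, default and metadata."""
    lines = []
    for f in fields(cls):
        type_name = getattr(f.type, "__name__", str(f.type))
        help_text = f.metadata.get("help", "")
        lines.append(f"{f.name} ({type_name}, default={f.default!r}): {help_text}")
    return "\n".join(lines)


help_text = render_help(MySQLConfig)
print(help_text)
```

Because the field type and default are available from the same Field object, the optional "show defaults" and "show types" features fall out of this approach almost for free.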
I'm sorry that this feature request is so long, I just kept thinking of alternative ways the solution could possibly be implemented.
That being said, of course, none of these suggested implementations are necessarily the best - these are just suggestions! 😄
If something's not clear please let me know!
Are you willing to open a pull request? (See CONTRIBUTING)
No, sorry. 😖