facebookresearch / hydra

Hydra is a framework for elegantly configuring complex applications
https://hydra.cc
MIT License

Allow adding help messages to command line arguments #633

Open JamieJamesJamie opened 4 years ago

JamieJamesJamie commented 4 years ago

🚀 Feature Request

It would be nice to be able to add help messages (similar to the argparse library) to command line arguments defined in .yaml configs / structured configs.

These help messages should be easy to view by running python myapp.py --help, so that the program's users don't have to look through the config files and/or the rest of the code to figure out what each command line argument does.

Motivation

Is your feature request related to a problem? Please describe.

It would be nice if users of programs written with hydra could see helpful messages for command line arguments.

This is already available in Python's argparse library: you pass a help message through the help parameter of parser.add_argument(), as demonstrated here.
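For reference, this is roughly what the argparse feature looks like. The program name and options are illustrative, chosen to mirror the database example used later in this issue.

```python
import argparse

# Each option carries its own help string, which argparse shows in its --help output.
parser = argparse.ArgumentParser(prog="my_app", description="Connect to a database.")
parser.add_argument("--driver", default="mysql", help="Driver to use for the database")
parser.add_argument("--user", default="omry", help="Username for the database")
parser.add_argument("--password", default="secret", help="Password for the database")

# format_help() returns the same text that python my_app.py --help would print.
print(parser.format_help())
```

The help text lives right next to the option it describes, which is the property this request asks Hydra to provide.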

Adding the ability to add help messages to command line arguments in hydra may help to improve a user's understanding of what a program does.

Pitch

Describe the solution you'd like

The solution should:

The solution could also optionally have the following features:

Describe alternatives you've considered

I have a few suggestions for how this could be implemented (1-4 below), in order of increasing plausibility. Suggestions 1-2 have possible solutions for both .yaml files and structured configs, while suggestions 3-4 are better suited to structured configs.

I feel like suggestion 4 might be the best for structured configs. I'm not sure if help messages could be defined better in .yaml files.

1. How it could be done currently without any changes

As I understand it, Hydra's help output can currently be configured as described in this tutorial (repeated below):

hydra:
  help:
    # App name, override to match the name your app is known by
    app_name: ${hydra.job.name}

    # Help header, customize to describe your app to your users
    header: |
      ${hydra.help.app_name} is powered by Hydra.

    footer: |
      Powered by Hydra (https://hydra.cc)
      Use --hydra-help to view Hydra specific help

    # Basic Hydra flags:
    #   $FLAGS_HELP
    #
    # Config groups, choose one of:
    #   $APP_CONFIG_GROUPS: All config groups that does not start with hydra/.
    #   $HYDRA_CONFIG_GROUPS: All the Hydra config groups (starts with hydra/)
    #
    # Configuration generated with overrides:
    #   $CONFIG : Generated config
    #
    template: |
      ${hydra.help.header}
      == Configuration groups ==
      Compose your configuration from those groups (group=option)

      $APP_CONFIG_GROUPS

      == Config ==
      Override anything in the config (foo.bar=value)

      $CONFIG

      ${hydra.help.footer}

For example, it's possible to add help messages manually by overriding hydra.help.template as needed, typing out the desired output of python my_app.py --help by hand. But this seems counterproductive: what if something in db.mysql from the tutorial's example is updated? It's easy to forget to update any manual help messages in hydra.help.template!

Instead, perhaps help messages could be defined at the same time and place as the associated config attributes. So, here's my next alternative suggestion:

2. Parse comments in .yaml files / structured configs

As inspiration for this alternative example, I happened to come across the next Hydra version's doc site. I found a section on the "Nevergrad Sweeper Plugin" example here.

The comments in the example (kind of) felt like I could be reading the help text that would appear for this config when using python my_app.py --help. So, a potential quick workaround could be to parse comments directly above a config attribute as help text.

Example of using this for a config.yaml file (adapted from the Hydra tutorial website):

# Group to decide what type of database to use
db:
  # Driver to use for the database
  driver: mysql
  # Username for the database
  user: omry
  # Password for the database
  pass: secret

Example of using this for structured configs (adapted from the Hydra tutorial website):

from dataclasses import dataclass

from omegaconf import MISSING

@dataclass
class MySQLConfig:
    # Driver to use for the database
    driver: str = "mysql"
    # Username for the database
    user: str = "omry"
    # Password for the database
    password: str = "secret"  # named "password" because "pass" is a Python keyword

@dataclass
class Config:
    # Group to decide what type of database to use
    db: MySQLConfig = MISSING

When a user types python myapp.py --help, I'd expect the following help message to appear (adapted from the Hydra tutorial website):

my_app is powered by Hydra.

== Configuration groups ==
Compose your configuration from those groups (group=option)

db: mysql - Group to decide what type of database to use

== Config ==
Override anything in the config (foo.bar=value)

db:                 - Group to decide what type of database to use
  driver: mysql     - Driver to use for the database
  user: omry        - Username for the database
  pass: secret      - Password for the database

Powered by Hydra (https://hydra.cc)
Use --hydra-help to view Hydra specific help

This method, although easy to implement, feels very "hacky". Especially for the structured configs as it's just Python comments that I've used in this alternative suggestion.
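To make the "parse the comments above an attribute" idea concrete for structured configs, here is a minimal sketch using only the standard library. Python's ast module drops comments, so the sketch pairs tokenize (which keeps them) with ast (which finds the annotated fields). The function name comments_above_fields is hypothetical, not a Hydra API, and the sketch assumes the class source is available as a string.

```python
import ast
import io
import tokenize


def comments_above_fields(source: str, class_name: str) -> dict:
    """Map each annotated field of class_name to the comment lines directly above it."""
    # tokenize keeps comments (ast discards them), so record them by line number.
    comments = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only comment-only lines count; trailing comments after code are skipped.
        if tok.type == tokenize.COMMENT and tok.line.lstrip().startswith("#"):
            comments[tok.start[0]] = tok.string.lstrip("#").strip()

    help_text = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for stmt in node.body:
                if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                    # Walk upwards over consecutive comment-only lines.
                    block, line_no = [], stmt.lineno - 1
                    while line_no in comments:
                        block.insert(0, comments[line_no])
                        line_no -= 1
                    if block:
                        help_text[stmt.target.id] = " ".join(block)
    return help_text


SOURCE = '''
@dataclass
class MySQLConfig:
    # Driver to use for the database
    driver: str = "mysql"
    # Username for the database
    user: str = "omry"
    # Password for the database
    password: str = "secret"
'''

print(comments_above_fields(SOURCE, "MySQLConfig"))
```

In a real app the source could come from inspect.getsource(MySQLConfig), assuming the class is defined in a file on disk. This only helps structured configs; as noted below, YAML loaders discard comments before any such step could run.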

3. Define a @dataclass helper function

Perhaps, in the case of structured configs at least, a helper function could be used to define help text. e.g.:

from dataclasses import dataclass
from typing import Dict

from omegaconf import MISSING

@dataclass
class MySQLConfig:
    driver: str = "mysql"
    user: str = "omry"
    password: str = "secret"  # "pass" is a Python keyword

    def help(self) -> Dict[str, str]:
        helpDict = {
            "driver": "Driver to use for the database",
            "user": "Username for the database",
            "password": "Password for the database",
        }
        return helpDict

@dataclass
class Config:
    db: MySQLConfig = MISSING

    def help(self) -> Dict[str, str]:
        helpDict = {"db": "Group to decide what type of database to use"}
        return helpDict

This could work if Hydra specified that structured configs may define a help() method that returns their help messages.

However, this has the disadvantage that if the structured config class is large, then it would get hard to check if every attribute that you want to have a help message has a help message in helpDict.

This should output the same help message as in the previous suggestion.
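The disadvantage above could be partly mitigated with a consistency check. A minimal sketch, assuming a help() method that returns a dict as proposed; undocumented_fields is a hypothetical helper, not part of Hydra:

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class MySQLConfig:
    driver: str = "mysql"
    user: str = "omry"
    password: str = "secret"  # "pass" is a Python keyword

    def help(self) -> Dict[str, str]:
        # "password" is deliberately missing here.
        return {
            "driver": "Driver to use for the database",
            "user": "Username for the database",
        }


def undocumented_fields(cfg) -> List[str]:
    """Return the names of dataclass fields that have no entry in cfg.help()."""
    documented = cfg.help()
    return [f.name for f in fields(cfg) if f.name not in documented]


print(undocumented_fields(MySQLConfig()))  # ['password']
```

This is still one more thing the developer has to remember to run, which is part of why the separate-dict approach feels clumsy.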

4. Using @dataclass's metadata parameter

The dataclasses.field() function has a metadata parameter. The definition provided by the Python docs is:

metadata: This can be a mapping or None. None is treated as an empty dict. This value is wrapped in MappingProxyType() to make it read-only, and exposed on the Field object. It is not used at all by Data Classes, and is provided as a third-party extension mechanism. Multiple third-parties can each have their own key, to use as a namespace in the metadata.

This seems like the best solution to implement (for structured configs at least) for the following reasons:

For example, Hydra could ask developers to use "help" as the metadata key for the help message of each attribute they want to document.

from dataclasses import dataclass, field

from omegaconf import MISSING

@dataclass
class MySQLConfig:
    driver: str = field(default="mysql", metadata={"help": "Driver to use for the database"})
    user: str = field(default="omry", metadata={"help": "Username for the database"})
    password: str = field(default="secret", metadata={"help": "Password for the database"})

@dataclass
class Config:
    db: MySQLConfig = field(default=MISSING, metadata={"help": "Group to decide what type of database to use"})

Again, this should output the same help message as in the previous suggestion.
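Since metadata is a plain mapping exposed on each Field object, reading it back needs only the standard library. The sketch below renders lines similar to the "== Config ==" section of the expected output. It rests on a few assumptions: help_lines is a hypothetical helper, the MISSING default is replaced by a default_factory so the sketch runs without OmegaConf, and nothing here goes through Hydra's actual help rendering.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import List


@dataclass
class MySQLConfig:
    driver: str = field(default="mysql", metadata={"help": "Driver to use for the database"})
    user: str = field(default="omry", metadata={"help": "Username for the database"})
    password: str = field(default="secret", metadata={"help": "Password for the database"})


@dataclass
class Config:
    # default_factory stands in for omegaconf.MISSING so the sketch runs on its own.
    db: MySQLConfig = field(
        default_factory=MySQLConfig,
        metadata={"help": "Group to decide what type of database to use"},
    )


def help_lines(cfg, indent: int = 0) -> List[str]:
    """Render 'key: value - help' lines, recursing into nested dataclass instances."""
    lines = []
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        nested = is_dataclass(value)
        pad = "  " * indent
        label = f"{pad}{f.name}:" if nested else f"{pad}{f.name}: {value}"
        doc = f.metadata.get("help")  # metadata is a read-only mapping
        lines.append(f"{label:<20}- {doc}" if doc else label)
        if nested:
            lines.extend(help_lines(value, indent + 1))
    return lines


print("\n".join(help_lines(Config())))
```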

I'm sorry that this feature request is so long, I just kept thinking of alternative ways the solution could possibly be implemented.

That being said, of course, none of these suggested implementations are necessarily the best - these are just suggestions! 😄

If something's not clear please let me know!

Are you willing to open a pull request? (See CONTRIBUTING)

No, sorry. 😖


omry commented 4 years ago

Hi, Thanks for the suggestion. This is on my radar, but definitely not trivial.

  2. Parse comments in .yaml files / structured configs: "A potential quick workaround could be to parse comments directly above a config attribute as help text."

YAML parsers strip comments. This is not a quick workaround: it would require writing a new YAML parser, which is at best a multi-month effort (I know that because I tried).

  3. Define a @dataclass helper function

I don't like this one for the reasons you stated.

  4. Using @dataclass metadata parameter

This is getting close. I actually already have an open issue in OmegaConf to consider adding support for something similar based on PEP 593 (and in fact something more powerful than just doc strings).

This will have to be implemented in OmegaConf before Hydra can support it.
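For context, PEP 593 adds typing.Annotated, which attaches arbitrary extra objects to a type hint. Here is a minimal sketch of how a help string could travel with a field's type, assuming Python 3.9 or later. The Help marker class and annotated_help function are hypothetical; this is not the design of the OmegaConf issue mentioned above.

```python
from dataclasses import dataclass, fields
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Help:
    """Hypothetical marker carrying a help string (not an OmegaConf or Hydra API)."""
    text: str


@dataclass
class MySQLConfig:
    driver: Annotated[str, Help("Driver to use for the database")] = "mysql"
    user: Annotated[str, Help("Username for the database")] = "omry"
    password: Annotated[str, Help("Password for the database")] = "secret"


def annotated_help(cls) -> dict:
    """Collect Help markers from Annotated[...] field types."""
    # include_extras=True keeps the Annotated wrapper instead of reducing it to plain str.
    hints = get_type_hints(cls, include_extras=True)
    result = {}
    for f in fields(cls):
        for extra in getattr(hints[f.name], "__metadata__", ()):
            if isinstance(extra, Help):
                result[f.name] = extra.text
    return result


print(annotated_help(MySQLConfig))
# {'driver': 'Driver to use for the database', 'user': 'Username for the database', 'password': 'Password for the database'}
```

Unlike field(metadata=...), this keeps the default value in its usual place, and the marker objects could carry more than a doc string.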

SigmaX commented 3 years ago

Just chiming in with support for this feature. Help messages and a self-documenting CLI are my favorite features of click and argparse, so their absence is a source of hesitation when considering Hydra for projects (along with the inability to use Hydra and click/argparse together in the same program, #409, which would be a reasonable compromise in many situations).

addisonklinke commented 3 years ago

I agree with @SigmaX that help messages seem to be the only feature I can think of where Hydra/OmegaConf are lacking in comparison to argparse. Other than that and refactoring efforts, it would be hard for developers to justify not switching!

tmke8 commented 3 years ago

Another suggestion (similar to suggestions 2 and 3 in some ways):

5. Using the regular docstring in the structured config class

(The docstring format I used is based on variant 3 from this Stack Overflow answer on how to write docstrings for dataclasses, but you could also use variant 2.)

from dataclasses import dataclass

@dataclass
class MySQLConfig:
    """Configuration for MySQL.

    Docstring in Google style, but PEP style or numpy style is of course
    also possible.

    Attributes:
        driver: Driver to use for the database.
        user: Username for the database. The documentation for this runs
            over multiple lines.
        password: Password for the database.
    """
    driver: str = "mysql"
    user: str = "omry"
    password: str = "secret"

(Users can use a tool like darglint to automatically check whether docstring and implementation are in sync.)
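A minimal sketch of pulling per-attribute help out of such a docstring. It is a hand-rolled parser that only understands the simple Google-style "Attributes:" layout shown above (entries indented four spaces as "name: text", with deeper-indented continuation lines); attribute_docs is a hypothetical name.

```python
import inspect
import re
from dataclasses import dataclass


@dataclass
class MySQLConfig:
    """Configuration for MySQL.

    Attributes:
        driver: Driver to use for the database.
        user: Username for the database. The documentation for this runs
            over multiple lines.
        password: Password for the database.
    """
    driver: str = "mysql"
    user: str = "omry"
    password: str = "secret"


ENTRY = re.compile(r"^ {4}(\w+): (.*)$")  # "    name: text" inside the section


def attribute_docs(cls) -> dict:
    """Parse a Google-style 'Attributes:' section into {name: description}."""
    doc = inspect.getdoc(cls) or ""  # getdoc() strips the common indentation
    docs, current, in_section = {}, None, False
    for line in doc.splitlines():
        if line.strip() == "Attributes:":
            in_section = True
        elif in_section:
            if line and not line.startswith(" "):
                break  # an unindented line starts the next section
            match = ENTRY.match(line)
            if match:
                current = match.group(1)
                docs[current] = match.group(2).strip()
            elif current and line.strip():
                docs[current] += " " + line.strip()  # continuation line
    return docs


print(attribute_docs(MySQLConfig))
```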

Advantages:

Disadvantages:

oliver-batchelor commented 3 years ago

simple-parsing - https://github.com/lebrice/SimpleParsing (an argparse library using dataclass) supports help well.

It uses both docstring and field comments like this:

from dataclasses import dataclass

@dataclass
class Foo:
    """ This is some help for the whole of Foo"""
    blah: str  # This is some more help for the blah
    someOption: bool = False  # This turns the option on

bonlime commented 3 years ago

Parsing help from comments in the dataclass would probably be the easiest option. It could be done as suggested above, with the help comment written on the same line as the parameter, or on a separate line above the parameter that starts with a marker such as # (help): so that unrelated comments aren't parsed.
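A minimal sketch of the # (help): marker idea, handling both placements: a trailing comment on the field's own line, or a comment-only line directly above it. Ordinary comments are ignored. The names are hypothetical, the class source is assumed to be available as a string, the class docstring is not read, and annotated assignments are matched anywhere in the source rather than only inside dataclasses.

```python
import ast
import io
import tokenize

HELP_PREFIX = "# (help):"


def marked_help(source: str) -> dict:
    """Collect help for annotated assignments from '# (help):' comments only."""
    above, trailing = {}, {}  # line number -> help text
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and tok.string.startswith(HELP_PREFIX):
            text = tok.string[len(HELP_PREFIX):].strip()
            # A comment-only line documents the next line; otherwise it documents its own line.
            target = above if tok.line.lstrip().startswith("#") else trailing
            target[tok.start[0]] = text

    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            # Prefer a trailing marker, then one on the line directly above.
            text = trailing.get(node.lineno) or above.get(node.lineno - 1)
            if text:
                result[node.target.id] = text
    return result


SOURCE = '''
@dataclass
class Foo:
    """This is some help for the whole of Foo"""
    # (help): This is some more help for the blah
    blah: str
    someOption: bool = False  # (help): This turns the option on
    verbose: bool = False  # an ordinary comment, ignored
'''

print(marked_help(SOURCE))
```

Keeping "above" and "trailing" markers apart matters: otherwise a trailing marker on one field would also be picked up as the "line above" help for the next field.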

Algomorph commented 2 years ago

I don't know whether to open a separate thread/issue for this just yet, but I wanted to bring it to everyone's attention. It would be great to also support writing the CLI help strings for each parameter / CLI argument into the configuration YAML files as comments. My little config/CLI library here supports this by using the ruamel.yaml backend, but it is cumbersome in other ways and lacks other features compared to Hydra. It would be nice to see these features in Hydra so I don't have to duplicate the Hydra devs' efforts by upgrading my own library.

Here's what I'm talking about: [image]

Those same help strings are available in the manual, when the program is run with --help on the CLI.

The whole idea here is to provide the users of the config YAML files with a reference for the parameters, which can be especially handy for running highly-complex models.

omry commented 1 year ago

@Algomorph, I considered ruamel.yaml at some point precisely because it preserves comments, but I found it pretty terrible to use, and I remember it having additional issues. I think it's reasonable to rely on dataclasses to provide this functionality, the way SimpleParsing does (which, if I remember correctly, is also a hack, but not too terrible).

javidcf commented 1 year ago

A couple more alternatives I can think of. Not saying they are necessarily better, just posting for the sake of discussion. Since I do not use structured configs, I would like something that works just with the YAML, so:

Special key syntax for help

This can take many forms, but in short it would be some special syntax in the config key that signifies that the contained value is a docstring for some option. For example, option? for the docstring of option. With the example posted above:

db?: Group to decide what type of database to use
db:
  driver?: Driver to use for the database
  driver: mysql
  user?: Username for the database
  user: omry
  pass?: Password for the database
  pass: secret

You could even document the whole package with ?: <top-level doc>. Alternative syntaxes could be help(option), option.__doc__, etc. The disadvantage is that you have to repeat the field name (though Hydra could warn if it detects a docstring for a mistyped option name). It might also be awkward or more difficult to use in defaults.
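A minimal sketch of how the option? convention could be separated from the actual values once the YAML has been loaded into nested dicts. split_docs is hypothetical. It also implements the suggested warning for docs that point at a non-existent option, and treats a bare ? key as the doc for the enclosing node.

```python
def split_docs(node: dict, prefix: str = ""):
    """Split 'key?' doc entries out of a loaded config mapping.

    Returns (values, docs, problems): the config without doc keys, docs keyed
    by dotted path, and warnings for docs whose option does not exist.
    """
    values, docs, problems = {}, {}, []
    for key, value in node.items():
        if isinstance(key, str) and key.endswith("?"):
            target = key[:-1]
            if target == "":
                docs[prefix.rstrip(".")] = value  # a bare '?' documents the enclosing node
            elif target in node:
                docs[prefix + target] = value
            else:
                problems.append(f"doc for unknown option '{prefix}{target}'")
        elif isinstance(value, dict):
            values[key], sub_docs, sub_problems = split_docs(value, f"{prefix}{key}.")
            docs.update(sub_docs)
            problems.extend(sub_problems)
        else:
            values[key] = value
    return values, docs, problems


# The example above as a YAML loader would return it, with a typo in one doc key.
config = {
    "db?": "Group to decide what type of database to use",
    "db": {
        "driver?": "Driver to use for the database",
        "driver": "mysql",
        "user?": "Username for the database",
        "user": "omry",
        "pas?": "Password for the database",
        "pass": "secret",
    },
}

values, docs, problems = split_docs(config)
print(values)
print(docs)
print(problems)
```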

Generate documentation template

This would simply be a tool in the library to generate a separate document (maybe another YAML) with all the available configuration keys for the developer to add the documentation strings. It could be a single file for the whole app configuration (which may be tricky considering how dynamic Hydra configurations can be) or multiple files for different packages. So, for the config.yaml above, you would get, for example, a config.doc.yaml like:

db:
  driver:
  user:
  pass:

Which you would fill like:

db: Group to decide what type of database to use
  driver: Driver to use for the database
  user: Username for the database
  pass: Password for the database

Then, if you add some new configuration item to db, the documentation generator should be able to keep the existing documentation and add new empty keys as needed:

db: Group to decide what type of database to use
  driver: Driver to use for the database
  user: Username for the database
  pass: Password for the database
  new_item:

Hydra would also need to implement the ability to read these documentation files and incorporate their contents into the help template.
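A minimal sketch of the template-update step. One caveat: in YAML a key cannot hold both a scalar (the doc string) and nested keys, so "db: Group to decide..." followed by indented children would not parse as written. This sketch therefore assumes the docs are stored as a flat mapping from dotted paths (db.driver, ...) to strings; both helper names are hypothetical.

```python
def config_paths(node: dict, prefix: str = "") -> list:
    """List every key path in a nested config, parents before children."""
    paths = []
    for key, value in node.items():
        path = f"{prefix}{key}"
        paths.append(path)
        if isinstance(value, dict):
            paths.extend(config_paths(value, path + "."))
    return paths


def update_doc_template(config: dict, existing_docs: dict) -> dict:
    """Keep existing docs, add empty (None) entries for new keys, drop docs for removed keys."""
    return {path: existing_docs.get(path) for path in config_paths(config)}


config = {"db": {"driver": "mysql", "user": "omry", "pass": "secret", "new_item": 1}}
docs = {
    "db": "Group to decide what type of database to use",
    "db.driver": "Driver to use for the database",
    "db.user": "Username for the database",
    "db.pass": "Password for the database",
}

for path, doc in update_doc_template(config, docs).items():
    print(f"{path}: {doc or ''}")
```

Silently dropping docs for removed keys is a design choice; a real tool might instead report them so the developer notices renamed options.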

mmcdermott commented 1 month ago

I think this utility would work very well. In the interim, in case it is helpful to others, I have a solution I implemented where I have a custom resolver in OmegaConf that uses inspect to pull the main() function docstring for the script being run and adds that to the documentation for a Hydra app. It doesn't pull documentation from the yaml files (though I think that could be configured as well), but it might be helpful in the interim.

Source project with this capability: https://github.com/mmcdermott/MEDS_transforms/tree/main
Custom resolvers code: https://github.com/mmcdermott/MEDS_transforms/blob/main/src/MEDS_transforms/utils.py#L111C1-L135C1

Hydra help template: https://github.com/mmcdermott/MEDS_transforms/blob/main/src/MEDS_transforms/configs/pipeline.yaml#L51

More generally, I think having resolvers to pull some of these things (function or structured config docstrings, yaml comments, anything) would be a great intermediate solution here.

Also, if the resolvers I have currently implemented in the example above would be useful to other folks, I'd be happy to pull them out into a separate installable package so they can be used more easily.
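A rough, stdlib-only sketch of the kind of helper such a resolver could wrap. This is not the MEDS_transforms implementation (which inspects the running script's main() function); it instead looks up an object from a "module.attribute" string, and the name docstring is hypothetical.

```python
import importlib
import inspect


def docstring(target: str) -> str:
    """Return the cleaned docstring of an object named like 'package.module.attr'."""
    module_name, _, attr_name = target.rpartition(".")
    obj = getattr(importlib.import_module(module_name), attr_name)
    # getdoc() dedents the docstring and returns None when there isn't one.
    return inspect.getdoc(obj) or ""


print(docstring("json.dumps").splitlines()[0])
```

With OmegaConf 2.1 or later, a function like this could be registered at import time, before the Hydra app runs, via OmegaConf.register_new_resolver("docstring", docstring), and then referenced from hydra.help.template as ${docstring:'my_app.main'}, where my_app.main is a placeholder for your own entry point.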

mmcdermott commented 4 weeks ago

Related, for my note about resolvers: https://github.com/facebookresearch/hydra/issues/2944

godaup commented 3 weeks ago

Another interim solution in case one wants to keep the help messages within the yaml files and uses sphinx to build documentation anyway would be autoyaml.

Disadvantages:

Advantages:

Here is what it looks like: [screenshots: YAML source and generated documentation]

mmcdermott commented 3 weeks ago

@godaup I think that would also be great, but I do also care about things showing up in the --help message so that CLI interfaces can be descriptive and usable (especially in contexts where the package can be installed, but the server on which it is used has no network access as is common in medical settings or settings with other sensitive data, and therefore online documentation can't be as reliably used). Perhaps we should start two issues, one for how to document hydra applications in sphinx or mkdocs outputs, and this one for CLI help messages (though of course ideally both would use similar underlying technology or shared help messages).