anntzer / defopt

Effortless argument parser
https://pypi.org/project/defopt/
MIT License
213 stars 11 forks source link

Consume command line arguments in 2 calls - help text has only one call's parameters #119

Closed mcarans closed 1 year ago

mcarans commented 2 years ago

I previously wrote about consuming command line arguments in 2 function calls (https://github.com/anntzer/defopt/issues/107) and you kindly added functionality for that.

I have noticed one issue. When a wrong command line parameter is added and the help text is displayed, it only shows the parameters from the function given in defopt.run. It needs to show the combination of the parameters from defopt.bind_known and defopt.run eg. from projectmainfn and _create_configuration in the example below.

func, argv = defopt.bind_known(projectmainfn, cli_options="all")
site_url = defopt.run(_create_configuration, argv=argv, cli_options="all")
func()

How could this be achieved?

anntzer commented 2 years ago

I understand the problem (I think), but this seems rather tricky to fix (even if you override the formatting functions at https://docs.python.org/3/library/argparse.html#printing-help, you'd need to basically reimplement all the logic yourself, which seems quite a mouthful to do?). Or can you think of anything?

mcarans commented 2 years ago

Could bind_known return information on the arguments it has managed to bind (say parsed_args or even help_string) which could be passed to run to add to the help output?

Alternatively, could run accept an additional parameter, something like bind_known=projectmainfn and then produce the combined help output?

anntzer commented 2 years ago

In your suggested APIs, how do you propose to implement the help string generation?

mcarans commented 2 years ago

I created a new function based on the 2 functions programmatically using makefun. The new function has the signature and docstring of the 2 functions and is used purely to check the binding. The only thing is I had to use the internal function defopt._parse_docstring:

    create_config_sig = signature(_create_configuration)
    create_config_params = list(create_config_sig.parameters.values())
    main_sig = signature(projectmainfn)
    main_params = list(main_sig.parameters.values())
    main_params.extend(create_config_params)
    main_sig = main_sig.replace(parameters=main_params)

    create_config_doc = getdoc(_create_configuration)
    parsed_main_doc = defopt._parse_docstring(getdoc(projectmainfn))
    main_doc = [f"{parsed_main_doc.first_line}\n\nArgs:"]
    for param_name, param_info in parsed_main_doc.params.items():
        main_doc.append(
            f"\n    {param_name} ({param_info.type}): {param_info.text}"
        )
    args_index = create_config_doc.index("Args:")
    args_doc = create_config_doc[args_index + 5 :]
    main_doc.append(args_doc)
    main_doc = "".join(main_doc)

    @with_signature(main_sig, func_name=projectmainfn.__name__)
    def gen_func(*args, **kwargs):
        """docstring"""
        return args, kwargs

    gen_func.__doc__ = main_doc

    argv = sys.argv[1:]
    for key in kwargs:
        name = f"--{key.replace('_', '-')}"
        if name not in argv:
            argv.append(name)
            argv.append(kwargs[key])
    defopt.bind(gen_func, argv=argv, cli_options="all")

    func, argv = defopt.bind_known(projectmainfn, argv=argv, cli_options="all")
    site_url = defopt.run(_create_configuration, argv=argv, cli_options="all")
    func()

The above also allows addition of arguments passed to the function as **kwargs to those from the command line sys.argv.

anntzer commented 2 years ago

I'm not really sure I want to implement something that complicated directly into defopt (e.g. I don't want to have to decide how to combine the first_lines of the two functions' docstrings, or all kinds of similar design questions), but probably a good idea would be to also expose the parsed function docstring in the enhanced callable returned by defopt.signature (e.g. as a .doc property on a Signature subclass, as I did for Parameter); this should allow you to do what you did but without going through private API. Thoughts?

mcarans commented 2 years ago

Do you mean that these lines

main_sig = signature(projectmainfn)
parsed_main_doc = defopt._parse_docstring(getdoc(projectmainfn))

would become something like the lines below?

main_sig = defopt.signature(projectmainfn)
parsed_main_doc = main_sig.doc

If so, I think that would be helpful.

anntzer commented 2 years ago

Yes. I probably won't be able to work on this in the coming few days, but will keep that on my todo.

mcarans commented 2 years ago

It would be useful to also be able to call _parse_docstring directly (eg. as parse_docstring) although I understand if you'd prefer to only make the dosctring available through the signature call.

anntzer commented 1 year ago

I implemented and pushed defopt.signature(func).doc as discussed above; let me know if that's good enough for now.

mcarans commented 1 year ago

Thanks for this. I had a look at the code and it is the _parse_docstring line that I need to be able to call directly as I use the .first_line and .params it returns. Would it be possible to make _parse_docstring public?

Here is what the current code looks like - it has been through some changes to fix various problems with the solution I had originally posted above:

    parsed_main_doc = defopt._parse_docstring(getdoc(projectmainfn))
    main_doc = [f"{parsed_main_doc.first_line}\n\nArgs:"]
    no_main_params = len(parsed_main_doc.params)
    for param_name, param_info in parsed_main_doc.params.items():
        main_doc.append(
            f"\n    {param_name} ({param_info.type}): {param_info.text}"
        )
    create_config_doc = getdoc(Configuration.create)
    kwargs_index = create_config_doc.index("**kwargs")
    kwargs_index = create_config_doc.index("\n", kwargs_index)
    args_doc = create_config_doc[kwargs_index:]
    main_doc.append(args_doc)
    main_doc = "".join(main_doc)

    main_sig = defopt.signature(projectmainfn)
    param_names = list()
    for param in main_sig.parameters.values():
        param_names.append(str(param))

    parsed_main_doc = defopt._parse_docstring(main_doc)
    main_doc = [f"{parsed_main_doc.first_line}\n\nArgs:"]
    count = 0
    for param_name, param_info in parsed_main_doc.params.items():
        param_type = param_info.type
        if param_type == "dict":
            continue
        if count < no_main_params:
            count += 1
        else:
            if param_type == "str":
                param_type = "Optional[str]"
                default = None
            elif param_type == "bool":
                default = False
            else:
                raise ValueError(
                    "Configuration.create has new parameter with unknown type!"
                )
            param_names.append(f"{param_name}: {param_type} = {default}")
        main_doc.append(
            f"\n    {param_name} ({param_type}): {param_info.text}"
        )
    main_doc = "".join(main_doc)

    projectmainname = projectmainfn.__name__
    main_sig = f"{projectmainname}({','.join(param_names)})"

    argv = sys.argv[1:]
    for key in kwargs:
        name = f"--{key.replace('_', '-')}"
        if name not in argv:
            argv.append(name)
            argv.append(kwargs[key])

    @with_signature(main_sig)
    def gen_func(*args, **kwargs):
        """docstring"""
        logger.info(f"> Using HDX Python API Library {__version__}")

    gen_func.__doc__ = main_doc

    configuration_create = defopt.bind(gen_func, argv=argv, cli_options="all")
    main_func, _ = defopt.bind_known(
        projectmainfn, argv=argv, cli_options="all"
    )
...
    main_func()
anntzer commented 1 year ago

I am a bit reluctant to turn first_line into public API because I think I would actually rather get rid of it even internally and simply replace it by text.split("\n\n")[0], which should be equivalent.

As for accessing params, doesn't signature(...).params provide what you need? You can get e.g. signature(...).params[...].doc and signature(...).params[...].annotation, the only difference being that the latter is a realy python type object rather than a str, but in your use cases this should work just as well? (I didn't actually check by adjusting your entire code to what I propose, to be fair.)

mcarans commented 1 year ago

The issue is that I need to parse each possible kwarg from a docstring that looks like this:

class Configuration(UserDict):
    """Configuration for HDX

    Args:
        **kwargs: See below
        user_agent (str): User agent string. HDXPythonLibrary/X.X.X- is prefixed. Must be supplied if remoteckan is not.
        user_agent_config_yaml (str): Path to YAML user agent configuration. Ignored if user_agent supplied. Defaults to ~/.useragent.yml.
...
    def __init__(self, **kwargs: Any) -> None:
        super().__init__()

From the above I need to parse user_agent and user_agent_config_yaml (and the long list of possible arguments that are allowed as kwargs that follows). What I do is add everything below **kwargs as docstring parameters after my other functions docstring parameters, then use _parse_docstring to parse the resulting combined docstring. This wouldn't work with signature because the **kwargs are treated as a single parameter and signature.doc ignores all the stuff below the **kwargs line.

I'm fine with losing first_line but can't think of a way to avoid using _parse_docstring (which is a really useful function) to parse my constructed docstring to get the parameters.

anntzer commented 1 year ago

OK, I see, this seems reasonable. (Sorry for being difficult, but I try to be conservative with adding new APIs.)

One thing I am not so happy with in _parse_docstring is the custom (_Doc) return type, which I don't really want to document. However I think I could actually get rid of it and just return another defopt.Signature instance (corresponding to whatever is documented). In fact this suggests expanding defopt.signature() to support also taking a docstring (instead of a callable) as argument: when a callable without docstring is passed, it computes the signature from the callable; when a standalone docstring is passed, it computes the signature from the docstring; when a callable with a docstring is passed, it computes both, which have to be compatible.

Thoughts?

mcarans commented 1 year ago

That sounds like a good idea. For each parameter, I need to be able to get the parameter name, text description and type. For the parameter type, I need to be able to to determine if it is a "dict", "str" or "bool" for the time being.

anntzer commented 1 year ago

OK, can you check how HEAD looks to you? Note the API changes noted in the changelog.

mcarans commented 1 year ago

That's great! It's a nice addition to defopt. It looks like it is working. The code now looks like this:

    parsed_main_doc = defopt.signature(getdoc(projectmainfn))
    main_doc = [f"{parsed_main_doc.doc}Args:"]
    params = parsed_main_doc.parameters
    no_main_params = len(params)
    for param_name, param_info in params.items():
        main_doc.append(
            f"\n    {param_name} ({param_info.annotation}): {param_info.doc}"
        )
    create_config_doc = getdoc(Configuration.create)
    kwargs_index = create_config_doc.index("**kwargs")
    kwargs_index = create_config_doc.index("\n", kwargs_index)
    args_doc = create_config_doc[kwargs_index:]
    main_doc.append(args_doc)
    main_doc = "".join(main_doc)

    main_sig = defopt.signature(projectmainfn)
    param_names = list()
    for param in main_sig.parameters.values():
        param_names.append(str(param))

    parsed_main_doc = defopt.signature(main_doc)
    main_doc = [f"{parsed_main_doc.doc}Args:"]
    count = 0
    for param_name, param_info in parsed_main_doc.parameters.items():
        param_type = param_info.annotation
        if param_type == "dict":
            continue
        if count < no_main_params:
            count += 1
        else:
            if param_type == "str":
                param_type = "Optional[str]"
                default = None
            elif param_type == "bool":
                default = False
            else:
                raise ValueError(
                    "Configuration.create has new parameter with unknown type!"
                )
            param_names.append(f"{param_name}: {param_type} = {default}")
        main_doc.append(
            f"\n    {param_name} ({param_type}): {param_info.doc}"
        )
    main_doc = "".join(main_doc)

    projectmainname = projectmainfn.__name__
    main_sig = f"{projectmainname}({','.join(param_names)})"

    argv = sys.argv[1:]
    for key in kwargs:
        name = f"--{key.replace('_', '-')}"
        if name not in argv:
            argv.append(name)
            argv.append(kwargs[key])

    @with_signature(main_sig)
    def gen_func(*args, **kwargs):
        """docstring"""
        site_url = Configuration._create(*args, **kwargs)
        logger.info("--------------------------------------------------")
        logger.info(f"> Using HDX Python API Library {__version__}")
        logger.info(f"> HDX Site: {site_url}")

    gen_func.__doc__ = main_doc

    configuration_create = defopt.bind(gen_func, argv=argv, cli_options="all")
    main_func, _ = defopt.bind_known(
        projectmainfn, argv=argv, cli_options="all"
    )
...
    main_func()
anntzer commented 1 year ago

OK, let's close this then. Thanks for the API suggestions and discussion.