epage / clapng

A full featured, fast Command Line Argument Parser for Rust
docs.rs/clap
Apache License 2.0
0 stars 0 forks source link

Optional option-argument parsing is incorrect #241

Open epage opened 2 years ago

epage commented 2 years ago

Issue by mkayaalp Monday Nov 15, 2021 at 19:57 GMT Originally opened as https://github.com/clap-rs/clap/issues/3030


Please complete the following tasks

Rust Version

rustc 1.56.0 (09c42c458 2021-10-18)

Clap Version

master

Minimal reproducible code

use clap::Parser;
#[derive(Parser, Debug)]
struct Opt {
    #[clap(short,long)]
    foo: Option<Option<String>>,
    input: String,
}

fn main() {
    let opt = Opt::parse();
    println!("{:?}", opt);
}

Steps to reproduce the bug with the above code

cargo run -- -f input

or

cargo run -- --foo input

Actual Behaviour

error: The following required arguments were not provided:
    <INPUT>

USAGE:
    clap-test --foo <FOO> <INPUT>

For more information try --help

Expected Behaviour

Opt { foo: Some(None), input: "input" }

Additional Context

There needs to be some special handling of options with optional arguments.

Optional option-arguments are bad

First of all, users really need to be discouraged (if we ever add some lint-like feature, e.g. #[clap(enable-warnings)).

POSIX says:

Option-arguments should not be optional.

Solaris says:

Option-arguments cannot be optional.

Parsing

I know some people have a preference for -f OptArg over -fOptArg, but if the option-argument is optional, then only the latter is correct. If there is a space, it means the optional argument is not there.

From POSIX.1-2017 - Ch. 12: Utility Conventions:

a conforming application shall place any option-argument for that option directly adjacent to the option in the same argument string, without intervening <blank> characters

If the utility receives an argument containing only the option, it shall behave as specified in its description for an omitted option-argument; it shall not treat the next argument (if any) as the option-argument for that option.

Same thing with the long option, which is a GNU extension.

From GNU C Library Reference Manual - Ch. 25.1.1: Program Argument Syntax Conventions:

To specify an argument for a long option, write --name=value. This syntax enables a long option to accept an argument that is itself optional.

See this SO question about how getopt handles this: getopt does not parse optional arguments to parameters.

In clap terms, these are saying we should only allow

In contrast, clap supports

Debug Output

[            clap::build::app]  App::_do_parse
[            clap::build::app]  App::_build
[            clap::build::app]  App::_propagate:clap-test
[            clap::build::app]  App::_check_help_and_version
[            clap::build::app]  App::_check_help_and_version: Removing generated version
[            clap::build::app]  App::_propagate_global_args:clap-test
[            clap::build::app]  App::_derive_display_order:clap-test
[clap::build::app::debug_asserts]   App::_debug_asserts
[clap::build::arg::debug_asserts]   Arg::_debug_asserts:help
[clap::build::arg::debug_asserts]   Arg::_debug_asserts:foo
[clap::build::arg::debug_asserts]   Arg::_debug_asserts:input
[         clap::parse::parser]  Parser::get_matches_with
[         clap::parse::parser]  Parser::_build
[         clap::parse::parser]  Parser::_verify_positionals
[         clap::parse::parser]  Parser::get_matches_with: Begin parsing 'RawOsStr("-f")' ([45, 102])
[         clap::parse::parser]  Parser::get_matches_with: Positional counter...1
[         clap::parse::parser]  Parser::get_matches_with: Low index multiples...false
[         clap::parse::parser]  Parser::possible_subcommand: arg=RawOsStr("-f")
[         clap::parse::parser]  Parser::get_matches_with: sc=None
[         clap::parse::parser]  Parser::parse_short_arg: short_arg=RawOsStr("f")
[         clap::parse::parser]  Parser::parse_short_arg:iter:f
[         clap::parse::parser]  Parser::parse_short_arg:iter:f: cur_idx:=1
[         clap::parse::parser]  Parser::parse_short_arg:iter:f: Found valid opt or flag
[         clap::parse::parser]  Parser::parse_short_arg:iter:f: val=RawOsStr("") (bytes), val=[] (ascii), short_arg=RawOsStr("f")
[         clap::parse::parser]  Parser::parse_opt; opt=foo, val=None
[         clap::parse::parser]  Parser::parse_opt; opt.settings=ArgFlags(TAKES_VAL)
[         clap::parse::parser]  Parser::parse_opt; Checking for val...
[         clap::parse::parser]  Parser::parse_opt: More arg vals required...
[    clap::parse::arg_matcher]  ArgMatcher::inc_occurrence_of: arg=foo
[            clap::build::app]  App::groups_for_arg: id=foo
[            clap::build::app]  App::groups_for_arg: id=foo
[         clap::parse::parser]  Parser::get_matches_with: After parse_short_arg Opt(foo)
[         clap::parse::parser]  Parser::get_matches_with: Begin parsing 'RawOsStr("input")' ([105, 110, 112, 117, 116])
[         clap::parse::parser]  Parser::get_matches_with: Positional counter...1
[         clap::parse::parser]  Parser::get_matches_with: Low index multiples...false
[         clap::parse::parser]  Parser::add_val_to_arg; arg=foo, val=RawOsStr("input")
[         clap::parse::parser]  Parser::add_val_to_arg; trailing_values=false, DontDelimTrailingVals=false
[         clap::parse::parser]  Parser::add_single_val_to_arg: adding val..."input"
[         clap::parse::parser]  Parser::add_single_val_to_arg: cur_idx:=2
[            clap::build::app]  App::groups_for_arg: id=foo
[    clap::parse::arg_matcher]  ArgMatcher::needs_more_vals: o=foo
[    clap::parse::arg_matcher]  ArgMatcher::needs_more_vals: max_vals...1
[         clap::parse::parser]  Parser::remove_overrides
[         clap::parse::parser]  Parser::remove_overrides:iter:foo
[      clap::parse::validator]  Validator::validate
[         clap::parse::parser]  Parser::add_env: Checking arg `--help`
[         clap::parse::parser]  Parser::add_env: Skipping existing arg `--foo <FOO>`
[         clap::parse::parser]  Parser::add_env: Checking arg `<INPUT>`
[         clap::parse::parser]  Parser::add_defaults
[         clap::parse::parser]  Parser::add_defaults:iter:foo:
[         clap::parse::parser]  Parser::add_value: doesn't have conditional defaults
[         clap::parse::parser]  Parser::add_value:iter:foo: doesn't have default vals
[         clap::parse::parser]  Parser::add_value:iter:foo: doesn't have default missing vals
[         clap::parse::parser]  Parser::add_defaults:iter:input:
[         clap::parse::parser]  Parser::add_value: doesn't have conditional defaults
[         clap::parse::parser]  Parser::add_value:iter:input: doesn't have default vals
[         clap::parse::parser]  Parser::add_value:iter:input: doesn't have default missing vals
[      clap::parse::validator]  Validator::validate_conflicts
[      clap::parse::validator]  Validator::validate_exclusive
[      clap::parse::validator]  Validator::validate_exclusive:iter:foo
[      clap::parse::validator]  Validator::gather_conflicts
[      clap::parse::validator]  Validator::gather_conflicts:iter: id=foo
[            clap::build::app]  App::groups_for_arg: id=foo
[      clap::parse::validator]  Validator::validate_required: required=ChildGraph([Child { id: input, children: [] }])
[      clap::parse::validator]  Validator::gather_requirements
[      clap::parse::validator]  Validator::gather_requirements:iter:foo
[      clap::parse::validator]  Validator::validate_required:iter:aog=input
[      clap::parse::validator]  Validator::validate_required:iter: This is an arg
[      clap::parse::validator]  Validator::is_missing_required_ok: input
[      clap::parse::validator]  Validator::validate_arg_conflicts: a="input"
[      clap::parse::validator]  Validator::missing_required_error; incl=[]
[      clap::parse::validator]  Validator::missing_required_error: reqs=ChildGraph([Child { id: input, children: [] }])
[         clap::output::usage]  Usage::get_required_usage_from: incls=[], matcher=true, incl_last=true
[         clap::output::usage]  Usage::get_required_usage_from: unrolled_reqs={input}
[         clap::output::usage]  Usage::get_required_usage_from:iter:input
[         clap::output::usage]  Usage::get_required_usage_from: ret_val=["<INPUT>"]
[      clap::parse::validator]  Validator::missing_required_error: req_args=[
    "<INPUT>",
]
[         clap::output::usage]  Usage::create_usage_with_title
[         clap::output::usage]  Usage::create_usage_no_title
[         clap::output::usage]  Usage::create_smart_usage
[         clap::output::usage]  Usage::get_required_usage_from: incls=[foo], matcher=false, incl_last=true
[         clap::output::usage]  Usage::get_required_usage_from: unrolled_reqs={input}
[         clap::output::usage]  Usage::get_required_usage_from:iter:foo
[         clap::output::usage]  Usage::get_required_usage_from:iter:input
[         clap::output::usage]  Usage::get_required_usage_from: ret_val=["--foo <FOO>", "<INPUT>"]
[            clap::build::app]  App::color: Color setting...
[            clap::build::app]  Auto
[           clap::output::fmt]  is_a_tty: stderr=true
epage commented 2 years ago

Comment by epage Monday Nov 15, 2021 at 20:05 GMT


Would the summary be that we for optional values, we should be encouraging requires_delimiter (granted, I've not checked what that does in the short flag case or if that deviates from these standards)?

epage commented 2 years ago

Comment by mkayaalp Monday Nov 15, 2021 at 20:36 GMT


That changes the parsing for multiple_values, right? Couldn't test that with clap_derive but, the following gives the same error (I tried to adapt the cargo expand output to use the builder pattern):

  let opt = App::new("clap-test")
    .arg(Arg::new("foo")
        .short('f')
        .long("foo")
        .takes_value(true)
        .value_name("FOO")
        .min_values(0)
        .max_values(1)
        .multiple_values(false)
        .use_delimiter(true)
        .require_delimiter(true))
    .arg(Arg::new("input")
        .takes_value(true)
        .value_name("INPUT")
        .required(true)).get_matches();

The POSIX guideline says:

When multiple option-arguments are specified to follow a single option, they should be presented as a single argument, using <comma> characters within that argument or <blank> characters within that argument to separate them.

So, -O a,b,c or -O "a b c"

Solaris guideline says:

Groups of option-arguments following an option must either be separated by commas or separated by tab or space character and quoted (-o xxx,z,yy or -o"xxx z yy").

I don't know if the lack of space between option and quote in -o"xxx z yy" is a typo, but if the option-arguments were optional, then that would be correct (instead of -o "xxx z yy").

epage commented 2 years ago

Comment by mkayaalp Monday Nov 15, 2021 at 20:48 GMT


This could be a separate issue, but the synopsis for optional option-argument should be:

USAGE:
    clap-test [--foo[=FOO]] <INPUT>

or

USAGE:
    clap-test [-f[FOO]] <INPUT>
epage commented 2 years ago

Comment by epage Monday Nov 15, 2021 at 20:50 GMT


Sorry, I mean to recommend requires_equals. requires_delimiter is for the other case you mentioned.

epage commented 2 years ago

Comment by epage Monday Nov 15, 2021 at 20:50 GMT


(too many generic names,. keep referring to the wrong one. #3026 talks about cleaning this up)

epage commented 2 years ago

Comment by mkayaalp Monday Nov 15, 2021 at 20:58 GMT


Sorry, I mean to recommend requires_equals

Yes. That works for long, but it also requires it for short: -f works (no arg) but -fval does not work anymore. It requires -f=val.

epage commented 2 years ago

Comment by epage Monday Nov 15, 2021 at 21:07 GMT


Yeah, I was worried it didn't do the right thing in that case.

epage commented 2 years ago

Comment by epage Monday Nov 15, 2021 at 21:11 GMT


I tried to summarize this at the bottom of your "Parsing" section. Mind double checking it?

epage commented 2 years ago

Comment by epage Monday Nov 15, 2021 at 21:22 GMT


Random observations:

1 atm the color flag in my CLIs is --color=<WHEN> when ideally it would be --color[=<WHEN>]. One of the things that has held me back from this is I've been unsure how users would react to the inconsistency of some args requiring = while others do not, depending on whether they have flag/option duality or just an option. For the user,. when thinking about it as an option, they are likely to think of it like any other option.

2 For the most part, users are constructing the optional-optional value, we are just providing the building blocks. In following that pattern, we'd need to be able to allow users to specify the policy in a way that isn't confusing. So how do we communicate this? One thought is to have an enum listing out delimited (=), arg (`), or concatenate (-svalue) and the user can provide a set of these to a long and short function. This requires the user to build it up how they want. We'd want to document what the recommendations would be. We could haveclap_derivedo "the right thing" forOption<Option>`.

My main concern about this is the overall complexity, not just in code but in API design and user code. Is there a simpler way?

epage commented 2 years ago

Comment by mkayaalp Monday Nov 15, 2021 at 21:26 GMT


I tried to summarize this at the bottom of your "Parsing" section. Mind double checking it?

That's correct. Although I don't know if I would say clap supports -s and -s value when it interprets the operand (positional argument) to be the option-argument and gives an error.

epage commented 2 years ago

Comment by mkayaalp Monday Nov 15, 2021 at 21:43 GMT


atm the color flag in my CLIs is --color=<WHEN> when ideally it would be --color[=<WHEN>]

I would say --color[=WHEN].

For the user, when thinking about it as an option, they are likely to think of it like any other option.

But --name=value IS the correct form. Re-quoting GNU:

To specify an argument for a long option, write --name=value.

Allowing a space is the parser being lenient.

Same thing with -s value being the norm for short options and -svalue being allowed for leniency. Except for optional option-arguments. Then, it's the other way around. Solaris did have a good reason to outlaw those, I think.

epage commented 2 years ago

Comment by epage Monday Nov 15, 2021 at 21:51 GMT


Allowing a space is the parser being lenient.

While GNU is saying one thing, we still have to deal with user expectations. As a user, I never use = and I expect many others do the same. It doesn't matter what standard I reference, the user will still be frustrated at the inconsistency.

epage commented 2 years ago

Comment by mkayaalp Monday Nov 15, 2021 at 21:52 GMT


we'd need to be able to allow users to specify the policy in a way that isn't confusing.

We could say "short" option follows the POSIX guidelines, and "long" option follows the GNU guidelines. Then have a custom option escape hatch that changes the prefix, separator, delimiter, with allow/disallow/require flags for each.

epage commented 2 years ago

Comment by mkayaalp Monday Nov 15, 2021 at 22:00 GMT


we still have to deal with user expectations.

I agree, I am not saying we make the parser strict (although, it would be a really good addition to AppSettings).

As a user, I am expecting --foo input to parse it as option foo without argument and input as the operand. And I would also expect --foo optarg input to parse as option foo with optarg as the argument and input as the operand (which could very well fail, as it is easily ambiguous: if input was optional or variadic).

But I would prefer to show [--foo[=OPTARG]] in the usage as the canonical form.

epage commented 2 years ago

Comment by mkayaalp Monday Nov 15, 2021 at 22:28 GMT


For the most part, users are constructing the optional-optional value, we are just providing the building blocks.

Well, we do have a short/long/operand distinction. If we had simpler building blocks, short/long options could be "non-positional optional operands with labels" and short and long could only differ in their label length and prefix (-/--) and alias each other. We would need to have a combining/stacking feature for short options to use.

Going the other way, maybe we need another distinction for optional option-arguments: short_optional_arg and long_optional_arg. We could make sure the type was suitable for clap_derive: Option<Option<T>> or Option<Vec<T>>.