clap-rs / clap

A full featured, fast Command Line Argument Parser for Rust
docs.rs/clap
Apache License 2.0
13.72k stars 1.02k forks source link

Optional option-argument parsing is incorrect #3030

Open mkayaalp opened 2 years ago

mkayaalp commented 2 years ago

Please complete the following tasks

Rust Version

rustc 1.56.0 (09c42c458 2021-10-18)

Clap Version

master

Minimal reproducible code

use clap::Parser;
#[derive(Parser, Debug)]
struct Opt {
    #[clap(short,long)]
    foo: Option<Option<String>>,
    input: String,
}

fn main() {
    let opt = Opt::parse();
    println!("{:?}", opt);
}

Steps to reproduce the bug with the above code

cargo run -- -f input

or

cargo run -- --foo input

Actual Behaviour

error: The following required arguments were not provided:
    <INPUT>

USAGE:
    clap-test --foo <FOO> <INPUT>

For more information try --help

Expected Behaviour

Opt { foo: Some(None), input: "input" }

Additional Context

There needs to be some special handling of options with optional arguments.

Optional option-arguments are bad

First of all, users really need to be discouraged (if we ever add some lint-like feature, e.g. #[clap(enable-warnings)).

POSIX says:

Option-arguments should not be optional.

Solaris says:

Option-arguments cannot be optional.

Parsing

I know some people have a preference for -f OptArg over -fOptArg, but if the option-argument is optional, then only the latter is correct. If there is a space, it means the optional argument is not there.

From POSIX.1-2017 - Ch. 12: Utility Conventions:

a conforming application shall place any option-argument for that option directly adjacent to the option in the same argument string, without intervening <blank> characters

If the utility receives an argument containing only the option, it shall behave as specified in its description for an omitted option-argument; it shall not treat the next argument (if any) as the option-argument for that option.

Same thing with the long option, which is a GNU extension.

From GNU C Library Reference Manual - Ch. 25.1.1: Program Argument Syntax Conventions:

To specify an argument for a long option, write --name=value. This syntax enables a long option to accept an argument that is itself optional.

See this SO question about how getopt handles this: getopt does not parse optional arguments to parameters.

In clap terms, these are saying we should only allow

In contrast, clap supports

Debug Output

[            clap::build::app]  App::_do_parse
[            clap::build::app]  App::_build
[            clap::build::app]  App::_propagate:clap-test
[            clap::build::app]  App::_check_help_and_version
[            clap::build::app]  App::_check_help_and_version: Removing generated version
[            clap::build::app]  App::_propagate_global_args:clap-test
[            clap::build::app]  App::_derive_display_order:clap-test
[clap::build::app::debug_asserts]   App::_debug_asserts
[clap::build::arg::debug_asserts]   Arg::_debug_asserts:help
[clap::build::arg::debug_asserts]   Arg::_debug_asserts:foo
[clap::build::arg::debug_asserts]   Arg::_debug_asserts:input
[         clap::parse::parser]  Parser::get_matches_with
[         clap::parse::parser]  Parser::_build
[         clap::parse::parser]  Parser::_verify_positionals
[         clap::parse::parser]  Parser::get_matches_with: Begin parsing 'RawOsStr("-f")' ([45, 102])
[         clap::parse::parser]  Parser::get_matches_with: Positional counter...1
[         clap::parse::parser]  Parser::get_matches_with: Low index multiples...false
[         clap::parse::parser]  Parser::possible_subcommand: arg=RawOsStr("-f")
[         clap::parse::parser]  Parser::get_matches_with: sc=None
[         clap::parse::parser]  Parser::parse_short_arg: short_arg=RawOsStr("f")
[         clap::parse::parser]  Parser::parse_short_arg:iter:f
[         clap::parse::parser]  Parser::parse_short_arg:iter:f: cur_idx:=1
[         clap::parse::parser]  Parser::parse_short_arg:iter:f: Found valid opt or flag
[         clap::parse::parser]  Parser::parse_short_arg:iter:f: val=RawOsStr("") (bytes), val=[] (ascii), short_arg=RawOsStr("f")
[         clap::parse::parser]  Parser::parse_opt; opt=foo, val=None
[         clap::parse::parser]  Parser::parse_opt; opt.settings=ArgFlags(TAKES_VAL)
[         clap::parse::parser]  Parser::parse_opt; Checking for val...
[         clap::parse::parser]  Parser::parse_opt: More arg vals required...
[    clap::parse::arg_matcher]  ArgMatcher::inc_occurrence_of: arg=foo
[            clap::build::app]  App::groups_for_arg: id=foo
[            clap::build::app]  App::groups_for_arg: id=foo
[         clap::parse::parser]  Parser::get_matches_with: After parse_short_arg Opt(foo)
[         clap::parse::parser]  Parser::get_matches_with: Begin parsing 'RawOsStr("input")' ([105, 110, 112, 117, 116])
[         clap::parse::parser]  Parser::get_matches_with: Positional counter...1
[         clap::parse::parser]  Parser::get_matches_with: Low index multiples...false
[         clap::parse::parser]  Parser::add_val_to_arg; arg=foo, val=RawOsStr("input")
[         clap::parse::parser]  Parser::add_val_to_arg; trailing_values=false, DontDelimTrailingVals=false
[         clap::parse::parser]  Parser::add_single_val_to_arg: adding val..."input"
[         clap::parse::parser]  Parser::add_single_val_to_arg: cur_idx:=2
[            clap::build::app]  App::groups_for_arg: id=foo
[    clap::parse::arg_matcher]  ArgMatcher::needs_more_vals: o=foo
[    clap::parse::arg_matcher]  ArgMatcher::needs_more_vals: max_vals...1
[         clap::parse::parser]  Parser::remove_overrides
[         clap::parse::parser]  Parser::remove_overrides:iter:foo
[      clap::parse::validator]  Validator::validate
[         clap::parse::parser]  Parser::add_env: Checking arg `--help`
[         clap::parse::parser]  Parser::add_env: Skipping existing arg `--foo <FOO>`
[         clap::parse::parser]  Parser::add_env: Checking arg `<INPUT>`
[         clap::parse::parser]  Parser::add_defaults
[         clap::parse::parser]  Parser::add_defaults:iter:foo:
[         clap::parse::parser]  Parser::add_value: doesn't have conditional defaults
[         clap::parse::parser]  Parser::add_value:iter:foo: doesn't have default vals
[         clap::parse::parser]  Parser::add_value:iter:foo: doesn't have default missing vals
[         clap::parse::parser]  Parser::add_defaults:iter:input:
[         clap::parse::parser]  Parser::add_value: doesn't have conditional defaults
[         clap::parse::parser]  Parser::add_value:iter:input: doesn't have default vals
[         clap::parse::parser]  Parser::add_value:iter:input: doesn't have default missing vals
[      clap::parse::validator]  Validator::validate_conflicts
[      clap::parse::validator]  Validator::validate_exclusive
[      clap::parse::validator]  Validator::validate_exclusive:iter:foo
[      clap::parse::validator]  Validator::gather_conflicts
[      clap::parse::validator]  Validator::gather_conflicts:iter: id=foo
[            clap::build::app]  App::groups_for_arg: id=foo
[      clap::parse::validator]  Validator::validate_required: required=ChildGraph([Child { id: input, children: [] }])
[      clap::parse::validator]  Validator::gather_requirements
[      clap::parse::validator]  Validator::gather_requirements:iter:foo
[      clap::parse::validator]  Validator::validate_required:iter:aog=input
[      clap::parse::validator]  Validator::validate_required:iter: This is an arg
[      clap::parse::validator]  Validator::is_missing_required_ok: input
[      clap::parse::validator]  Validator::validate_arg_conflicts: a="input"
[      clap::parse::validator]  Validator::missing_required_error; incl=[]
[      clap::parse::validator]  Validator::missing_required_error: reqs=ChildGraph([Child { id: input, children: [] }])
[         clap::output::usage]  Usage::get_required_usage_from: incls=[], matcher=true, incl_last=true
[         clap::output::usage]  Usage::get_required_usage_from: unrolled_reqs={input}
[         clap::output::usage]  Usage::get_required_usage_from:iter:input
[         clap::output::usage]  Usage::get_required_usage_from: ret_val=["<INPUT>"]
[      clap::parse::validator]  Validator::missing_required_error: req_args=[
    "<INPUT>",
]
[         clap::output::usage]  Usage::create_usage_with_title
[         clap::output::usage]  Usage::create_usage_no_title
[         clap::output::usage]  Usage::create_smart_usage
[         clap::output::usage]  Usage::get_required_usage_from: incls=[foo], matcher=false, incl_last=true
[         clap::output::usage]  Usage::get_required_usage_from: unrolled_reqs={input}
[         clap::output::usage]  Usage::get_required_usage_from:iter:foo
[         clap::output::usage]  Usage::get_required_usage_from:iter:input
[         clap::output::usage]  Usage::get_required_usage_from: ret_val=["--foo <FOO>", "<INPUT>"]
[            clap::build::app]  App::color: Color setting...
[            clap::build::app]  Auto
[           clap::output::fmt]  is_a_tty: stderr=true
epage commented 2 years ago

Would the summary be that we for optional values, we should be encouraging requires_delimiter (granted, I've not checked what that does in the short flag case or if that deviates from these standards)?

mkayaalp commented 2 years ago

That changes the parsing for multiple_values, right? Couldn't test that with clap_derive but, the following gives the same error (I tried to adapt the cargo expand output to use the builder pattern):

  let opt = App::new("clap-test")
    .arg(Arg::new("foo")
        .short('f')
        .long("foo")
        .takes_value(true)
        .value_name("FOO")
        .min_values(0)
        .max_values(1)
        .multiple_values(false)
        .use_delimiter(true)
        .require_delimiter(true))
    .arg(Arg::new("input")
        .takes_value(true)
        .value_name("INPUT")
        .required(true)).get_matches();

The POSIX guideline says:

When multiple option-arguments are specified to follow a single option, they should be presented as a single argument, using <comma> characters within that argument or <blank> characters within that argument to separate them.

So, -O a,b,c or -O "a b c"

Solaris guideline says:

Groups of option-arguments following an option must either be separated by commas or separated by tab or space character and quoted (-o xxx,z,yy or -o"xxx z yy").

I don't know if the lack of space between option and quote in -o"xxx z yy" is a typo, but if the option-arguments were optional, then that would be correct (instead of -o "xxx z yy").

mkayaalp commented 2 years ago

This could be a separate issue, but the synopsis for optional option-argument should be:

USAGE:
    clap-test [--foo[=FOO]] <INPUT>

or

USAGE:
    clap-test [-f[FOO]] <INPUT>
epage commented 2 years ago

Sorry, I mean to recommend requires_equals. requires_delimiter is for the other case you mentioned.

epage commented 2 years ago

(too many generic names,. keep referring to the wrong one. #3026 talks about cleaning this up)

mkayaalp commented 2 years ago

Sorry, I mean to recommend requires_equals

Yes. That works for long, but it also requires it for short: -f works (no arg) but -fval does not work anymore. It requires -f=val.

epage commented 2 years ago

Yeah, I was worried it didn't do the right thing in that case.

epage commented 2 years ago

I tried to summarize this at the bottom of your "Parsing" section. Mind double checking it?

epage commented 2 years ago

Random observations:

1 atm the color flag in my CLIs is --color=<WHEN> when ideally it would be --color[=<WHEN>]. One of the things that has held me back from this is I've been unsure how users would react to the inconsistency of some args requiring = while others do not, depending on whether they have flag/option duality or just an option. For the user,. when thinking about it as an option, they are likely to think of it like any other option.

2 For the most part, users are constructing the optional-optional value, we are just providing the building blocks. In following that pattern, we'd need to be able to allow users to specify the policy in a way that isn't confusing. So how do we communicate this? One thought is to have an enum listing out delimited (=), arg (`), or concatenate (-svalue) and the user can provide a set of these to a long and short function. This requires the user to build it up how they want. We'd want to document what the recommendations would be. We could haveclap_derivedo "the right thing" forOption<Option>`.

My main concern about this is the overall complexity, not just in code but in API design and user code. Is there a simpler way?

mkayaalp commented 2 years ago

I tried to summarize this at the bottom of your "Parsing" section. Mind double checking it?

That's correct. Although I don't know if I would say clap supports -s and -s value when it interprets the operand (positional argument) to be the option-argument and gives an error.

mkayaalp commented 2 years ago

atm the color flag in my CLIs is --color=<WHEN> when ideally it would be --color[=<WHEN>]

I would say --color[=WHEN].

For the user, when thinking about it as an option, they are likely to think of it like any other option.

But --name=value IS the correct form. Re-quoting GNU:

To specify an argument for a long option, write --name=value.

Allowing a space is the parser being lenient.

Same thing with -s value being the norm for short options and -svalue being allowed for leniency. Except for optional option-arguments. Then, it's the other way around. Solaris did have a good reason to outlaw those, I think.

epage commented 2 years ago

Allowing a space is the parser being lenient.

While GNU is saying one thing, we still have to deal with user expectations. As a user, I never use = and I expect many others do the same. It doesn't matter what standard I reference, the user will still be frustrated at the inconsistency.

mkayaalp commented 2 years ago

we'd need to be able to allow users to specify the policy in a way that isn't confusing.

We could say "short" option follows the POSIX guidelines, and "long" option follows the GNU guidelines. Then have a custom option escape hatch that changes the prefix, separator, delimiter, with allow/disallow/require flags for each.

mkayaalp commented 2 years ago

we still have to deal with user expectations.

I agree, I am not saying we make the parser strict (although, it would be a really good addition to AppSettings).

As a user, I am expecting --foo input to parse it as option foo without argument and input as the operand. And I would also expect --foo optarg input to parse as option foo with optarg as the argument and input as the operand (which could very well fail, as it is easily ambiguous: if input was optional or variadic).

But I would prefer to show [--foo[=OPTARG]] in the usage as the canonical form.

mkayaalp commented 2 years ago

For the most part, users are constructing the optional-optional value, we are just providing the building blocks.

Well, we do have a short/long/operand distinction. If we had simpler building blocks, short/long options could be "non-positional optional operands with labels" and short and long could only differ in their label length and prefix (-/--) and alias each other. We would need to have a combining/stacking feature for short options to use.

Going the other way, maybe we need another distinction for optional option-arguments: short_optional_arg and long_optional_arg. We could make sure the type was suitable for clap_derive: Option<Option<T>> or Option<Vec<T>>.

mina86 commented 1 year ago

While GNU is saying one thing, we still have to deal with user expectations.

Many users are used to GNU and Unix-like systems their expectation is that flags will behave the way GNU and POSIX describes. Users aren’t dumb. They can understand CLI conventions so long as they are somewhat consistent between applications. It’s when every application has their own idea of how arguments should be parsed is when we get into trouble. (And I’m sorry to say but clap is contributing to that problem).

epage commented 1 year ago

(And I’m sorry to say but clap is contributing to that problem).

Could you clarify which problem clap is contributing to? If its allowing a space between a flag and a value, the cat is out of the bag on that one and its not just clap. If its for something else, please clarify,

mina86 commented 1 year ago

clap is definitely not the only offender, but it does contribute to the problem.

The reason I have irrational hatred of clap are:

And yes, I’m aware cat is out of the bag and I’ll have to continue dealing with my irrational hatred but perhaps some of clap parsing may be made saner.

epage commented 1 year ago

perhaps some of clap parsing may be made saner.

I regret to inform you, it likely isn't. Looking at the high level options for making the ecosystem "saner":

For allowing individual apps to be "saner", we'll need a strong case for why or where being more permissive is a problem due to the aforementioned challenges with configurability. Design work on this would also need to come from the wider community as this is not one of our stated focus areas. I'm most sympathetic to cases like uutils where they are trying to match behavior but even then I would need to understand why being more lax is a problem so long as all of the strict cases work.

my4ng commented 2 months ago

I have proposed a solution in PR #5489 where require_equals is limited to long options and a new setting require_no_space only allows the -oval format for short options. With both set, this should be compliant to the POSIX/GNU conventions for optional option-argument, for example:

Usage: myapp [OPTIONS]

Options:
  -f[<foo>], --foo[=<foo>]  This is foo
  -h, --help                Print help