blyxxyz / lexopt

Minimalist pedantic command line parser
MIT License

Please provide way to parse `-o=value` as option taking value `=value` #13

Open ijackson opened 2 years ago

ijackson commented 2 years ago

Hi. I saw this crate mentioned in a blog post and I like the idea. But there is a difficulty:

Argument parsers should be as transparent as possible. Currently, because lexopt supports -o=value to mean "set to value", to unparse an unknown string (like a user-provided filename) it is necessary to always pass the =.

(The situation with short options is different to long options: supporting --long=value is completely unambiguous and simply a good idea.)

IMO the = is unnatural here. I'm not aware of many other programs which treat -o=value as setting the option to value. Almost all (including for example all the POSIX utilities) treat it as setting the option to =value. See e.g. the Utility Conventions in the Single Unix Specification, or the getopt(3) manpage.

And as I point out, handling = specially for short option values is not a cost-free feature (unlike with long options): it changes the meaning of existing programs. A shell script which takes some arguments and runs a lexopt-using Rust program, and passes filenames to it, must write the = or risk malfunctioning on filenames which start with =. Because the = is unconventional with a short option, the author of such a script probably won't have written it, so the script will probably have this bug.
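To make the two readings concrete, here is a minimal std-only sketch. `ShortEquals` and `split_short` are hypothetical names invented for this illustration; this is not lexopt's implementation or API, only a model of the two conventions under discussion.

```rust
/// Which convention to apply to a short option written as `-o=value`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ShortEquals {
    /// getopt(3) style: everything after the option letter is the value.
    Keep,
    /// clap/argparse style: one leading `=` is treated as a separator.
    Strip,
}

/// Hypothetical helper: split an argument such as `-ovalue` into the option
/// letter and its attached value. Returns `None` for long options and for
/// short options with nothing attached.
fn split_short(arg: &str, mode: ShortEquals) -> Option<(char, &str)> {
    let rest = arg.strip_prefix('-')?;
    let mut chars = rest.chars();
    let letter = chars.next()?;
    if letter == '-' {
        return None; // `--long` style, out of scope for this sketch
    }
    let attached = chars.as_str();
    if attached.is_empty() {
        return None; // the value, if any, lives in the next argument
    }
    let value = match mode {
        ShortEquals::Keep => attached,
        ShortEquals::Strip => attached.strip_prefix('=').unwrap_or(attached),
    };
    Some((letter, value))
}

fn main() {
    for arg in ["-o=value", "-ovalue", "-d="] {
        println!(
            "{:>9}  keep: {:?}  strip: {:?}",
            arg,
            split_short(arg, ShortEquals::Keep),
            split_short(arg, ShortEquals::Strip)
        );
    }
}
```

Under `Keep`, a filename that happens to start with = survives a round trip through `-o<name>`; under `Strip`, its first character is silently dropped.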

And, within an existing Rust project, switching from another option parsing library to lexopt is hazardous because it will change the meaning of command lines which are accepted by both.

Could you please provide an option to allow lexopt to be used without this = on short option feature? I'm not sure if that would involve a ParserBuilder or whether you could just make it a configuration method on Parser.

Personally I think the default ought to be to not handle = specially in short options but that would be a breaking change.

Thanks for your attention.

blyxxyz commented 2 years ago

Hi,

Thank you for bringing this up! I've spent some time thinking about this issue but I don't think I've done a proper writeup.

I used to have the same opinion as you, which is why lexopt 0.1.0 didn't support -o=. It felt alien to me, and risky for exactly the reason you describe. I changed my mind due to #8.

clap does support this syntax (as does Python's argparse). Most Rust projects switching from another library will be switching from clap, so supporting it seems like the right default. (This was a concern for newsboat in particular.)

Furthermore, moving from a library that doesn't support the syntax to one that does is a small compatibility break (only weird values are affected), while going in the other direction is a big one (normal command lines are broken).

(This does create a ratchet where it's easy to switch to a parser that supports it and hard to switch away. That's a downside of the robustness principle.)

> Argument parsers should be as transparent as possible. Currently, because lexopt supports -o=value to mean "set to value", to unparse an unknown string (like a user-provided filename) it is necessary to always pass the =.

Note that even without that syntax the safest solution is to pass -o "$value". That works even if $value is empty, while -o"$value" misbehaves in that case.
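The same advice carries over when the caller is a Rust program rather than a shell script. A minimal sketch, assuming a hypothetical program named `prog` that takes `-o <value>`; the command is only constructed here, never run:

```rust
use std::process::Command;

/// Build an invocation of the hypothetical program `prog`, passing `value`
/// to `-o` as its own argument instead of gluing it onto the option.
fn build_command(value: &str) -> Command {
    let mut cmd = Command::new("prog");
    cmd.arg("-o").arg(value);
    cmd
}

/// Collect the arguments of a command as strings, for inspection.
fn args_of(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect()
}

fn main() {
    // Includes a value starting with `=` and an empty value.
    for value in ["out.txt", "=odd name", ""] {
        println!("{:?}", args_of(&build_command(value)));
    }
}
```

Because the value travels in a separate argument, an empty string is still passed as an (empty) argument, and a leading = is never adjacent to the option letter.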

> Could you please provide an option to allow lexopt to be used without this = on short option feature?

I don't like doubling the number of potential cases, but maybe making it configurable is for the best.

A workaround for now is to use version 0.1.0.

> I'm not sure if that would involve a ParserBuilder or whether you could just make it a configuration method on Parser.

A method on Parser should work for this, but that way it can be reconfigured in the middle of parsing, which could be a problem for any future configuration. I'll have to think about it.
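The trade-off can be sketched with toy types. Everything below (`SketchBuilder`, `SketchParser`, `short_equals`) is hypothetical and exists only for illustration; it is not lexopt's API. The point is that a builder fixes the setting before the first argument is read, whereas a `&mut self` setter on the parser could be called at any time.

```rust
/// Toy parser whose configuration is fixed at construction time.
struct SketchParser {
    short_equals: bool,
    args: std::vec::IntoIter<String>,
}

/// Toy builder: the only place where the setting can be changed.
struct SketchBuilder {
    short_equals: bool,
}

impl SketchBuilder {
    fn new() -> Self {
        // Assumed default for this sketch: treat `-o=value` as `value`.
        SketchBuilder { short_equals: true }
    }

    /// Consumes and returns the builder, so calls can be chained.
    fn short_equals(mut self, enabled: bool) -> Self {
        self.short_equals = enabled;
        self
    }

    /// Consumes the builder; after this the setting is read-only.
    fn build(self, args: Vec<String>) -> SketchParser {
        SketchParser {
            short_equals: self.short_equals,
            args: args.into_iter(),
        }
    }
}

impl SketchParser {
    /// Read-only access: there is deliberately no setter on the parser.
    fn short_equals(&self) -> bool {
        self.short_equals
    }

    /// Hand out the next raw argument, unparsed.
    fn next_raw(&mut self) -> Option<String> {
        self.args.next()
    }
}

fn main() {
    let mut parser = SketchBuilder::new()
        .short_equals(false)
        .build(vec!["-o=value".to_string()]);
    println!("short_equals = {}", parser.short_equals());
    println!("first arg = {:?}", parser.next_raw());
}
```

The cost of this shape is a second public type and a second construction path, which is the kind of API growth a minimalist crate may want to avoid.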

Mango0x45 commented 1 year ago

I would also really like to have a way to have -X=ABC be parsed as giving -X the value =ABC. Even if that kind of format is supported by other Rust/Python libraries, the fact of the matter is that it's different from the behavior of about 99% of command-line utilities found on Linux/BSD systems, so all you're really doing is potentially confusing users of programs making use of this library.

blyxxyz commented 1 year ago

I'm waffling about whether to implement this. Are there known cases of this behavior causing problems for clap or argparse users? The closest I can find is fish-shell/fish-shell#8466 but that's not very strong.

Mango0x45 commented 1 year ago

Any possibility of this becoming a thing? It feels really weird to not follow the standard getopt() behavior here…

blyxxyz commented 1 year ago

It does feel weird but feeling weird isn't enough. To justify it I need concrete problem cases, real-world software that didn't work right because of this syntax.

The syntax is supported by the most popular argument parsing libraries for Python (argparse) and for Rust (clap). Do you know if they have had feature requests for this, or if there are Python or Rust programs this has been problematic for?

blyxxyz commented 7 months ago

I found a concrete case! uutils has a flaky workaround for this problem in its implementation of cut (see https://github.com/uutils/coreutils/issues/2424).

echo foo=bar | cut -d= -f2

This is a reasonable command that's used a lot, it's even in my shell history. And uutils is planning to move from clap to something lexopt-based. So it would be nice to support it.

For a single case some crazy try_raw_args workaround might be enough, and I'm unsure about the best API for this, but I'm going to give it more thought.

Mango0x45 commented 7 months ago

I have recently also started using similar behavior with the column command to align assignments in vim:

Before:

int x = 5;
double y = 6;

After column -t -s= -o=:

int x    = 5;
double y = 6;

ijackson commented 7 months ago

> It does feel weird but feeling weird isn't enough.

I provided a link to a formal specification.

The correct prior art to refer to is getopt_long, which established the relevant conventions decades ago. And of course, the standard behaviour of all the existing Unix utilities. (clap and Python are, sadly, not examples of best practice.)

blyxxyz commented 7 months ago

I'm aware of the prior art. lexopt has other deviations. POSIX wants you to put all options before all positional arguments. GNU lets you abbreviate long options. lexopt does not have those behaviors, unless you impose them manually.

(Existing Unix utilities are honestly a mixed bag, dd and find and ps and tar deviate heavily and many other utilities deviate more subtly. getopt was a late invention. The argument parsing code in V7 Unix is a mess, different for every utility.)

I don't want to comply with conventions just because they're there. I want to comply with conventions because it's helpful, because we're worse off otherwise.

I introduced this deviation in 0.2.0 because I knew for a fact that some people would be better off that way. So I asked to be convinced that some people are worse off because of it, for example by finding someone on the Python issue tracker whose script didn't work because of this syntax. I know about the theoretical objections, so I looked for cases like that, but I didn't find them, so I wasn't convinced we were worse off.

I kept looking for over a year and yesterday I finally found a case in which this behavior made someone worse off. That's what I was looking for.

Rapptz commented 1 week ago

I'm just a passer-by here who was shopping for a command line parsing library.

Isn't the example you found basically a chicken and egg problem? The only reason why the problem manifested is precisely because it's a port of a program using getopt rather than anything else. It would have been possible to find similar ports where someone would have run into this. I guess fundamentally the question should have been whether you want to follow the getopt GNU style or not.

I have no real stake in this, but I don't feel like the example shown is particularly strong. The workaround given isn't very egregious either, since it doesn't take that much code to actually resolve and was niche in nature.

I've found that most modern tools tend to use more lenient parsing reminiscent of the Fuchsia conventions rather than the getopt conventions. Go, for example, which is super popular for making simple CLI tools, doesn't even use -- for its long forms. Instead they use a single -. No getopt or argparse or Fuchsia compatibility there, yet not entirely unusual in the modern world.

Anyway I'm sorry for barging in here, I just wanted to share my thoughts on this. It doesn't particularly impact me either way. If forced to give an opinion then I think maintaining compatibility with the rest of the ecosystem (e.g. clap, argh, xflags, etc.) is probably more important than getopt compatibility. You could make the argument that deviating from that makes the crate stand out for those who want that, though I think there are a lot of getopt compatible crates on crates.io. Unsure if they're working though.

Mango0x45 commented 1 week ago

> Isn't the example you found basically a chicken and egg problem? The only reason why the problem manifested is precisely because it's a port of a program using getopt rather than anything else. It would have been possible to find similar ports where someone would have run into this. I guess fundamentally the question should have been whether you want to follow the getopt GNU style or not.

I mean if you want to think of it that way then sure... but the entire convention of how the world does argument parsing is exactly about getopt() and how getopt() parses arguments. You can be like Go and deviate from it because you feel a different way is better, but as a result Go CLIs are also a pain in the ass because I keep passing my arguments wrong :)

> I've found that most modern tools tend to use more lenient parsing reminiscent of the Fuchsia conventions rather than the getopt conventions.

Most modern tools that violate the existing conventions also end up with various issues being created by people asking them to follow conventions.

Rapptz commented 1 week ago

There aren't that many violations when following the Fuchsia style conventions. The only ones I've noticed with other libraries vs pure getopt are the parsing of -o= and combinations of short options such as -abc.

Truthfully if you want to support both it's not impossible to do so, I ended up using pico-args and it allows you to configure how parsing is handled for specific cases using feature flags so that might be an option.

blyxxyz commented 1 week ago

> I guess fundamentally the question should have been whether you want to follow the getopt GNU style or not.

I do, since it's the most likely to be compatible with other systems (e.g. shell completions) and with people's expectations.

> I have no real stake in this, but I don't feel like the example shown is particularly strong. The workaround given isn't very egregious either, since it doesn't take that much code to actually resolve and was niche in nature.

uutils's current (clap) workaround is to check if the command line contains -d= but that's not entirely correct since it e.g. doesn't catch -sd=. I think a full workaround with lexopt's current API would require some very bizarre code, though it might still be more doable than with clap.
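A simplified stand-in shows why a literal `-d=` check falls short. This is not the actual uutils code; both function names are hypothetical, and the list of value-less flags passed in is an assumption made for the example.

```rust
/// Naive check in the spirit of the workaround described above: does any
/// argument start with the literal text `-d=`?
fn naive_has_d_equals(args: &[&str]) -> bool {
    args.iter().any(|a| a.starts_with("-d="))
}

/// Slightly more careful sketch: walk a cluster of short options and look
/// for `d` immediately followed by `=`. `flags` lists the option letters
/// assumed to take no value, so scanning may continue past them.
fn cluster_has_d_equals(arg: &str, flags: &[char]) -> bool {
    let Some(rest) = arg.strip_prefix('-') else {
        return false;
    };
    if rest.starts_with('-') {
        return false; // long option, not a short cluster
    }
    let mut chars = rest.chars();
    while let Some(c) = chars.next() {
        if c == 'd' {
            return chars.as_str().starts_with('=');
        }
        if !flags.contains(&c) {
            // `c` takes a value (or is unknown): the rest belongs to it.
            return false;
        }
    }
    false
}

fn main() {
    let args = ["-sd=", "-f2"];
    println!("naive:   {}", naive_has_d_equals(&args));
    println!(
        "cluster: {}",
        args.iter().any(|a| cluster_has_d_equals(a, &['s']))
    );
}
```

Even the second version needs to know which options take values, which is knowledge the parser already has. That is part of why a workaround outside the parser tends to be awkward.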

But it's true that it's niche.

> I've found that most modern tools tend to use more lenient parsing reminiscent of the Fuchsia conventions rather than the getopt conventions.

Hm, do you have examples of tools that do this? I don't run into this much but if I have a blind spot I'd like to know. (In particular I often use the --option=value syntax and that tends to be supported even though Fuchsia bans it. I don't combine short options as often.)

Fuchsia is (AFAIK) a subset of the GNU getopt syntax so I don't think lenient is the right word?

> Anyway I'm sorry for barging in here, I just wanted to share my thoughts on this. It doesn't particularly impact me either way. If forced to give an opinion then I think maintaining compatibility with the rest of the ecosystem (e.g. clap, argh, xflags, etc.) is probably more important than getopt compatibility. You could make the argument that deviating from that makes the crate stand out for those who want that, though I think there are a lot of getopt compatible crates on crates.io. Unsure if they're working though.

The current clap-ish syntax is probably the best place to be since it's a superset of all the most common syntax. Ideally people shouldn't have to switch crates just to get a slightly different syntax though.

> it allows you to configure how parsing is handled for specific cases using feature flags so that might be an option.

That way of configuration breaks with the idea that feature flags should be additive, i.e. enabling a feature flag should never break anything. It's maybe not as big a deal for an argument parsing library since that isn't likely to be a transitive dependency, but it can still happen and it doesn't fit the boring pedantic flavor of lexopt. (pico-args is sloppy in other ways as well, which is a valid choice since it keeps the API super easy.)

Thank you for your thoughts!