akamensky / argparse

Argparse for golang. Just because `flag` sucks
MIT License
611 stars 62 forks source link

Initial implementation of positional arguments #102

Closed vsachs closed 2 years ago

vsachs commented 2 years ago

This PR concerns #20 (Positional Arguments issue)

This PR is solely to provide a prototype for a grounded discussion on implementation of positional arguments.

@akamensky I am eager to hear your thoughts on this methodology for positional arguments. Let me know what you think!

akamensky commented 2 years ago

@vsachs That's great, I appreciate you taking a stab at this issue. Could you please provide a few examples of how this would work on CLI and when would it fail (error)?

Overall -- the change seems minimum invasive, so as long as the existing logic does not change (which it appears did not), and no big caveats with the logic I think it should be fine (considering we add some examples and test coverage for new code).

One question though -- do you think with this approach it would be possible to make positional arguments not required? I imagine there may be use cases where positional arguments can be optional. Also there may be use cases where there may be a need for variable number of positional arguments (i.e. list of files to operate on, like cat file1 file2 file3 ... fileN), do you think it would be possible to handle these cases with this logic? Not that it is strictly required, but it may allow for greater flexibility.

jkugler commented 2 years ago

Also, while we're at it, would this be able to do sub-commands? That's basically positional arguments where further arguments are only required when the first positional argument is X vs. Y.

akamensky commented 2 years ago

@jkugler i maybe misunderstand your interpretation of sub commands but that already exists and working. Please check documentation and examples.

vsachs commented 2 years ago

Since you think it's a good approach I'll move forward with this:

I believe it would be possible to extend this approach to support both:

For now I'd like to complete this initial feature then we can evaluate whether to submit as-is or if we want to augment with additional functionality. Will try to be snappy about it so it doesn't get lost.

jkugler commented 2 years ago

@akamensky I'm sorry, you are right. I spoke too soon. :)

akamensky commented 2 years ago

@vsachs Please rebase the code onto master, so you get most up-to-date changes. There were some PRs merged yesterday fixing bugs related to sub-commands

akamensky commented 2 years ago

@vsachs Sorry for the delay. I've had some extra time over the weekend to look at this PR. As I said earlier -- it is fine as-is, I just think it potentially will be beneficial to separate positionals into their own methods rather than use Options.

If use Options it will tie positionals to be always created through standard methods (due to backward compatibility promise within v1), and if for some reason later it would need rewrite or some changes incompatible with current "standard" args methods.

The workaround for that, I think, would be to create a set of dedicated methods to add positional arguments. The benefits of this approach are:

  1. Decoupled from "standard" arguments -- easier to rewrite/rework at later times
  2. Won't need to use short/full "name" in methods (as those are irrelevant for positional args)

It can still use functionality you added by proxy, but changing Options.Positional into private field Options.positional and setting it in proxy method. Such as:

// note `short` and `long` not needed for positionals as they have no names on CLI
func (o *Command) StringPositional(opts *Options) *string {
    opts.positional = true
    ... // something else
    long := ... // generate some random string to ensure it does not overlap with any other argument long name
    return o.String("", long, opts)
}

What do you think about this approach?

vsachs commented 2 years ago

@vsachs Sorry for the delay. I've had some extra time over the weekend to look at this PR. As I said earlier -- it is fine as-is, I just think it potentially will be beneficial to separate positionals into their own methods rather than use Options.

Should be a relatively mild refactor. Will get on that.

akamensky commented 2 years ago

@vsachs something with the implementation that causes interesting failure, i've added a TestPos8 test to show it, within this test you can also move around "-s", "some string" to get different error results.

Also I left a few comments with some minor things in current code.

vsachs commented 2 years ago

@vsachs something with the implementation that causes interesting failure, i've added a TestPos8 test to show it, within this test you can also move around "-s", "some string" to get different error results.

@akamensky I've poked at this and found two items, one I knew about and the other makes sense but I hadn't thought through:

First Issue: The one I hadn't thought about.

For the test you added, the result is not what you expected because the precedence of the commands controls which positional parses which value. As coded now the commands are parsed in LIFO order (deepest command first).

    testArgs1 := []string{"pos", "cmd1", "cmd2", "progPos", "cmd1pos1", "-s", "some string", "cmd1pos2", "cmd2pos1"}
    add order: parser, cmd1, cmd2, cmd2pos1, progPos, cmd1pos1, strval, cmd1pos2

consequently the parse goes like this:

    testArgs1 := []string{"pos", "cmd1", "cmd2", "progPos", "cmd1pos1", "-s", "some string", "cmd1pos2", "cmd2pos1"}
                                                                                        cmd2pos1, cmd1pos1,       strval,                      cmd1pos2, progPos

Solutions

Changing this would probably require a more substantial alteration to the parsing logic. With the current code this is "behaves as expected" although it may be fairly unintuitive to devs and end users. Options:

  1. Leave it as is
  2. Prevent this from happening in the wild by requiring that positionals be added to only one command in any command heirarchy (perhaps the leaf command or just any one command).
  3. We could gracefully handle added positionals in non-leaf-nodes by shifting them down the tree if/when more commands are added, so that they are always parsed at the leaf node. Or some other variant of this approach.
  4. Alter logic to match "expectation" by causing positionals to be parsed "out of band" immediately after a command is recognized. So "parser" consumes progPos, cmd1 consumes cmd1Pos and cmd1Pos, cmd2 consumes cmd2pos1
  5. Alter logic to match "expectation" by causing command argument parsing to account for the total number of positionals registered and moving up to the correct index. This is fairly painful IMO.
  6. Something else?

Second issue: Moving "-s strval" around

This can also be thought about as "altering the order in which positionals versus flags are added to the command". If we alter the add order as below:

    testArgs1 := []string{"pos", "cmd1", "cmd2", "progPos", "cmd1pos1", "-s", "some string", "cmd1pos2", "cmd2pos1"}
    add order: parser, cmd1, cmd2, cmd2pos1, progPos, cmd1pos1, cmd1pos2, strval
----->
--- FAIL: TestPos9 (0.00s)
    argparse_test.go:3034: unknown arguments cmd2pos1
    argparse_test.go:3044: *strval expected "some string", but got ""
    argparse_test.go:3047: *progPos expected "progPos", but got "cmd1pos2"
    argparse_test.go:3053: *cmd1pos2 expected "cmd1pos1", but got "some string"
    argparse_test.go:3056: *cmd2pos1 expected "cmd2pos1", but got "progPos"

In this scenario, where strval is added last, the positional arg logic causes cmd1pos1 to skip "-s" but consume "some string". This situation is complicated because "-s" may or may not have a default or be a boolean-flag.

Options

I see roughly three approaches:

  1. Leave as is
  2. Cause positionals to be parsed after flags are parsed. This is probably the cleanest solution possible.
  3. Alter the parse logic to check if the flag detected can accept a value and if so then always consume it (disregarding defaults). This is substantially messier.
vsachs commented 2 years ago

Also I left a few comments with some minor things in current code.

@akamensky Sorry but I don't see your comments anywhere? Did github swallow them?

akamensky commented 2 years ago

@vsachs Sorry, seems Github added new function to have all comments "pending" until review submitted. Should be visible now.

akamensky commented 2 years ago

I see roughly three approaches:

  1. Leave as is
  2. Cause positionals to be parsed after flags are parsed. This is probably the cleanest solution possible.
  3. Alter the parse logic to check if the flag detected can accept a value and if so then always consume it (disregarding defaults). This is substantially messier.

I think 2nd option is the better. It also would make it possible to avoid positionals be always required (see one of my comments in-code above)

vsachs commented 2 years ago

I see roughly three approaches:

  1. Leave as is
  2. Cause positionals to be parsed after flags are parsed. This is probably the cleanest solution possible.
  3. Alter the parse logic to check if the flag detected can accept a value and if so then always consume it (disregarding defaults). This is substantially messier.

I think 2nd option is the better. It also would make it possible to avoid positionals be always required (see one of my comments in-code above)

Cool. I've implemented option 2 in commit [c1c45f5].

Do you have an opinion on the first issue I outlined above? The Parse Order of Operations problem? I outlined 5 approaches but the door is open to others, including writing a completely new algo for the positional parsing which occurs as a totally separate secondary stage progressing as (root->leaf, left->right) linearly.

akamensky commented 2 years ago

Been quite busy lately, perhaps I missed a few points and did not elaborate enough. I feel there is a decent amount of overlap between questions. So I will try to provide more detailed response below.

On ordering issue

Changing this would probably require a more substantial alteration to the parsing logic. With the current code this is "behaves as expected" although it may be fairly unintuitive to devs and end users. Options:

  1. Leave it as is
  2. Prevent this from happening in the wild by requiring that positionals be added to only one command in any command heirarchy (perhaps the leaf command or just any one command).
  3. We could gracefully handle added positionals in non-leaf-nodes by shifting them down the tree if/when more commands are added, so that they are always parsed at the leaf node. Or some other variant of this approach.
  4. Alter logic to match "expectation" by causing positionals to be parsed "out of band" immediately after a command is recognized. So "parser" consumes progPos, cmd1 consumes cmd1Pos and cmd1Pos, cmd2 consumes cmd2pos1
  5. Alter logic to match "expectation" by causing command argument parsing to account for the total number of positionals registered and moving up to the correct index. This is fairly painful IMO.
  6. Something else?

One of the goals of this library is/was to provide a very clear and simple way for devs to handle command line arguments. I think parsing in order of definition (see more below on this) should be clear and more intuitive approach. I had my mind more on a 2-stage parsing, where all "standard" arguments are parsed first, and then 2nd stage parses remaining elements as positionals. But how exactly this is implemented does not matter much as long as the goal of simple and intuitive use of this library is achieved. You already can probably see that the internal code of this library is quite messy, which is the result of trying 1-starting with simple functionality and adding more over time and 2-trying to keep the usage of this library as straight-forward as possible. There are a few other libs with very high flexibility and more functionality than this one, but using them is rather painful and amounts to writing 100s of lines of code for something that should be done in 5 minutes (since it does not contribute that much to business logic of application being written).

In my opinion the most intuitive way to parsing positional arguments is in the order of their definition while following the order of command tree. That translates to: 1 - parent command positionals should precede sub-command positionals, 2 - within same command positionals should be parsed in order of their definition. To elaborate on that in more graphical terms:

// With command tree:
//     root - rootPositional1
//         cmd1 - cmd1Positional1, cmd1Positional2
//             cmd2 - cmd2Positional1, cmd2Positional2
// Defined in code as:
root := argparse.NewParser("root", "")
cmd1 := root.NewCommand("cmd1", "")
cmd2 := cmd1.NewCommand("cmd2", "")

rootPositional1 := root.Positional(opts)
cmd1Positional1 := cmd1.Positional(opts)
cmd2Positional1 := cmd2.Positional(opts) // note the ordering _within_ cmd2 is consistent
cmd1Positional2 := cmd1.Positional(opts)
cmd2Positional2 := cmd2.Positional(opts) // note the ordering _within_ cmd2 is consistent

// Provided input:
input := []string{"root", "cmd1", "cmd2", "p1", "p2", "p3", "p4", "p5"}

// Should result in follow values
// rootPositional1 == p1
// cmd1Positional1 == p2
// cmd1Positional2 == p3
// cmd2Positional1 == p4
// cmd2Positional2 == p5

I think above logic is rather straightforward and provides clear and easy to understand ordering rules.

On required vs optional

I believe above logic for ordering also creates 2 ways to implement "optional" positionals:

  1. Discard concept of optional/required for positionals. They are parsed in order of their definition, if there are more arguments than defined positionals -- we can throw an error about unparsed values, if there are less arguments than defined positionals -- we don't have to populate them (value == nil) and it will be up to developer to test whether value was provided (if value == nil ...). This also maintains a clear ordering with mixed types. i.e. if expected order is string int string and provided input is hello world, that is an error because 2nd argument must be a parseable integer
  2. Be able to define positional as either require or optional. I think this approach is very hard to implement in this library AND will be very hard and error prone in usability, specially when considering mixed types of input. But it is still doable.

An obvious issue with both is if application user provided some argument that does not exist (i.e. --non-existent someString) together with positionals being defined. This argument would then be interpreted as positional and could lead to errors. That is fine in my opinion and is the same behavior I've seen with number of CLI tools.

I am personally leaning towards 1st option here, because it is simple and easy to understand and should be rather easy to implement in this library. But please let me know if you see any other possible solutions to this issue, I may be missing some perspective here.

On List optionals

I think this would be very very nice feature. However with current implementation of library I do not see how we can achieve this. That being said, I think there is an alternative path to provide that could play along nicely with 1st option in "optional" positionals.

We could have:

  1. An internal flag on parser (set by some Set...() or With...() methods) that would tell parser to not fail if there are remaining unparsed arguments (default to false == fail if any arguments remain after parsing).
  2. A method to retrieve remaining unparsed arguments as-is (with signature along the lines of func (p *parser) Remaining() []string)

How those remaining unparsed arguments are parsed/interpreted can be left up to developer.

vsachs commented 2 years ago

On ordering issue

Okay, I will attempt to achieve the proposed outcome, which you've outlined in the code block clearly.

On required vs optional

Option 1 seems reasonable to me, at least in combination with the proposed parsing change related to the ordering implementation.

List Optionals

I agree this is quite valuable. In our use case of the library we have a workaround to achieve this. I was planning to approach this in a PR for issue #22 and I think since this PR continues to expand in code scope, that remains my preference.

vsachs commented 2 years ago

Parsing has been changed to the desired ordering and option 1 is utilized for positionals. Subsequently I've enabled defaults for positionals and added a GetParsed() function to the argument so that people can detect if a positional was detected or not.

akamensky commented 2 years ago

I was planning to approach this in a PR for issue https://github.com/akamensky/argparse/issues/22 and I think since this PR continues to expand in code scope, that remains my preference.

Hm, I can see how this logic somehow overlaps. This perhaps could be done through nargs functionality then if it will be added.

akamensky commented 2 years ago

@vsachs since you've invested into this PR so much time and effort (and since I have limited time to work on this lib), I have invited you to collaborators. Hence you can merge this PR whenever.

Please note that I have changed default branch from master to v1. And created orphaned branch v2 for a possible v2 version (a clean rewrite, rethink of logic and API).

vsachs commented 2 years ago

@vsachs since you've invested into this PR so much time and effort (and since I have limited time to work on this lib), I have invited you to collaborators. Hence you can merge this PR whenever.

Appreciated! I'll do a bit more local testing and merge it in.

Please note that I have changed default branch from master to v1. And created orphaned branch v2 for a possible v2 version (a clean rewrite, rethink of logic and API).

Sounds good!

akamensky commented 2 years ago

Awesome! Really appreciate your work on this issue. I think with this new feature can cut new release for v1.4.0.