MRtrix3 / mrtrix3

MRtrix3 provides a set of tools to perform various advanced diffusion MRI analyses, including constrained spherical deconvolution (CSD), probabilistic tractography, track-density imaging, and apparent fibre density
http://www.mrtrix.org
Mozilla Public License 2.0
294 stars 181 forks source link

Python API: inconsistent handling of metavar [question] #2709

Open fionaEyoung opened 1 year ago

fionaEyoung commented 1 year ago

I've been using the python api in an external module and have run into some inconsistencies in the use of the argument/option metavar feature. This is more of a discussion / clarification issue before I go ahead opening any PRs or suchlike. Also this may all be moot since the api may or may not be due an overhaul.

(For context, I'm using metavar="input1 output1 [input2 output2]" and nargs=+ for a positional argument to get a variable number of input/output pairs, similar to other commands e.g. dwi2fod. There may be issues in other places I haven't come across e.g. in subparsers, which I'm not using)

1. Mostly metavar is checked for, sometimes it isn't.

This gives inconsistent usage output in particular if a metavar is used for a positional. I figure this is just an omission, since I can't think of a scenario where you'd not want to support metavar.

proposed change: always check; only place I think this is missing is in printing the usage line at the start of the full help text here, but there may be others I've not come across.

2. There are different ways for formatting metavar

This one confused me until I realised argparse allows using tuples to specify multiple metavars when nargs>1. So this would make the intended usage in my case look like this: nargs='+', metavar=('input1 output1', '[input2 output2 ...]') to get the display usage: prog [options] input1 output1 [input2 output2 ...]. Unfortunately, bugs in argparse means this is only stable for optionals, not for positionals. If I just use a single string (metavar='input1 output1 [input2 output2 ...]') the help gets formatted as Usage: prog i n p u t 1 o u t p u t 1 [ i n p u t 2 o u t p u t 2 . . . ] 😃

The -continue option makes use of the tupled metavar, but I don't see the benefit over just using a multi-word string, given that the destination is still a single variable and accessed via indexing.*

proposed change: not sure. Easiest option seems to be not supporting tuples for positionals, since internal argparse bug means this doesn't work anyway... This would mean not branching on tuples (and not doing ' '.join(metavar)) when printing positionals, leaving everything else as is. I don't think a blanket ban on tuple metavars makes sense since users familiar with argparse would expect to be able to use them. But then again, given that a tupled metavar only affects the print behaviour (which is overridden in mrtrix anyway), maybe that would be easier and more consistent?

Happy to keep changes to a bare minimum, or just deal with it on my end, since this is a pretty niche problem.


* in untampered argparse there is some benefit, because for nargs='*' you get nice optional formatting like this: '-opt', metavar=('in1', 'in2') --> usage: prog [-h] [-opt in1 [in2 ...]], but this is overridden in MRtrix

Lestropie commented 1 year ago

I've known of issues with argparse metavar for some time (particularly the space-between-every-character one), but seems I never listed it on here. Then again I get told off for listing too many issues so :man_shrugging:

Also this may all be moot since the api may or may not be due an https://github.com/MRtrix3/mrtrix3/issues/1608.

Given the efforts on #2678 the full overhaul is probably unlikely, even if it would be my long-term preference.

For context, I'm using metavar="input1 output1 [input2 output2]" and nargs=+ for a positional argument to get a variable number of input/output pairs,

Mostly metavar is checked for, sometimes it isn't.

Agreed; that'll just be me overlooking things. I will have written enough code to make exemplar help pages etc. look the way I think they should look, but not guaranteed to have tested every scenario. Particularly positionals with metavar, which I've likely never used. Though in the process of #2678 I did come across population_template post-addition of multi-contrast and really disliked the interface; I've tweaked it a bit (#2680) but yours might be better.

There are different ways for formatting metavar

This one confused me until I realised argparse allows using tuples to specify multiple metavars when nargs>1.

Also need to check whether I myself have misunderstood something in argparse and/or done an inconsistent implementation. I think that in some places, where nargs is > 1 but finite, I may have used a tuple to specify separate strings for the individual positional arguments following the option. This is consistent with the MRtrix3 C++ command-line interface, but less so with argparse, which for instance assumes that all arguments following the option will be of the same type. So it might be that avoiding the use of tupled metavars, and just space-separating in the case where multiple positional arguments have different meanings, is the way to go.

in untampered argparse there is some benefit, because for nargs='*' you get nice optional formatting ... but this is overridden in MRtrix

I probably avoided that from the outset because that parsing mechanism is highly incompatible with the MRtrix3 capability of arbitrary movement of command-line options. But if there's a case for it we can try to make it work.

fionaEyoung commented 1 year ago

Though in the process of https://github.com/MRtrix3/mrtrix3/pull/2678 I did come across population_template post-addition of multi-contrast and really disliked the interface; I've tweaked it a bit (https://github.com/MRtrix3/mrtrix3/pull/2680)

Yes I saw this when snooping around for similar examples, it's pretty close to my use case. I agree having all but the last parsed to a single argument, with an additional one for the final template is a bit awkward, but amounts to the same thing as a single narg='+', and I can see how it would come about if the multi-contrast registration functionality was added on later.

I think that in some places, where nargs is > 1 but finite, I may have used a tuple to specify separate strings for the individual positional arguments following the option.

Yep, for example add_argument('-continue', nargs=2, metavar=('<ScratchDir>', '<LastFile>'))

consistent with the MRtrix3 C++ command-line interface, but less so with argparse, which for instance assumes that all arguments following the option will be of the same type.

It really seems to me like the only difference between using metavar='first second' and metavar=('first', 'second') is the output of ArgumentParser.print_help()*, which is a) only properly functional for optionals* and b) overridden by mrtrix3.app.Parser.print_help() anyways.

that parsing mechanism is highly incompatible with the MRtrix3 capability of arbitrary movement of command-line options

Certainly something like add_argument('positionals', nargs='+', metavar=('var_in_1', 'var_in_2')) and prog var_in_1 -quiet var_in_2 could never work with argparse because you can't split up positionals might actually be possible! (TIL)

I'm of the mind that given mrtrix3.app's reimplementation of print_help(), there's no need to use tuples for metavars. If you want to keep them for readability I would go for some sort of format_arg() function to keep things consistent. something like this


*Here are the various behaviours when metavar is a tuple in unnecessary detail

i.e. `('metavar',)` or `('metavar1', 'metavar2')` but not `('metavar')` or `'metavar'` - positional arguments: - nargs is fixed integer: - `ArgumentParser.add_argument()` works fine as long as nargs==len(metavar) - nargs is + or *: - `ArgumentParser.add_argument()` works fine as long as metavar is either a tuple of length 2 (exactly) or a single string - for **any** nargs - `ArgumentParser.print_help()` [crashes](https://bugs.python.org/issue14074). *[not a problem for mrtrix.app which overrides `print_help()`]* - a separate bug in argparse ` _parse_known_args()` causes a crash if user supplies wrong number of positional arguments. This is the reason why my workaround of surrounding my single metavar string with `[]` is unstable. - optional arguments - nargs is fixed integer: - nargs==len(metavar) - `ArgumentParser.print_help()`: option formatted as `-opt metavar0 metavar1 metavar2` - nargs is * - `ArgumentParser.add_argument()`: metavar must be either single string or tuple of length (1 or 2) - `ArgumentParser.print_help()`: option formatted as: `[-opt [metavar ...]]` if ( `metavar='metavar'` or `metavar=('metavar',)`) or `[-opt [metavar1 [metavar2...]]]` if `metavar=('metavar1', 'metavar2')` - nargs is + - `ArgumentParser.add_argument()`: metavar must be either single string or tuple of length 2 - `ArgumentParser.print_help()`: option is formatted as `[-opt metavar [metavar...]]` if `metavar='metavar'` else `[-opt metavar1 [metavar2...]]` if `metavar=('metavar1', 'metavar2')`

Lestropie commented 1 year ago

I think that in some places, where nargs is > 1 but finite, I may have used a tuple to specify separate strings for the individual positional arguments following the option.

Prior post I thought this may have been an error on my part, but it is actually consistent with the argparse documentation:

parser.add_argument('--foo', nargs=2, metavar=('bar', 'baz'))
parser.print_help()
usage: PROG [-h] [-x X X] [--foo bar baz]
options:
 --foo bar baz

But as per your extensive reporting, this only makes sense if len(metavar) == nargs, and there's currently no checks around that.

Here are the various behaviours when metavar is a tuple in unnecessary detail

Where such detail would actually be of greater function would be to produce an exemplar command that constructs positionals & optionals of all configurations of interest, and then show which bits of which outputs (not just inline help page, also eg. RST / MarkDown) we believe to be showing the wrong behaviour. Then we can tweak the API code and test the same command. Best way to be confident that all bases are covering and a change to fix one thing isn't breaking something else.