GenericMappingTools / gmt

The Generic Mapping Tools
https://www.generic-mapping-tools.org

How to document long-form options, modifiers, and directives for GMT? #5561

Open maxrjones opened 3 years ago

maxrjones commented 3 years ago

Description of the issue

Paul outlined the proposed implementation of GMT long-form options, modifiers, and directives in the developer documentation (see also project 1). This issue is to decide the documentation structure for long-form options, modifiers, and directives.

Comparison with wrappers: The GMT.jl documentation uses bars to separate each short option from its aliases, and then links to the options whose modifiers/directives have long formats (e.g., https://www.generic-mapping-tools.org/GMT.jl/dev/arrows_control/#Vector-Attributes).

The PyGMT documentation includes a table of short-form/long-form options at the start of each module's documentation and currently does not implement long-form modifiers/directives.

Options for documenting long-options:

Here are a few options (suggestions are welcome):

  1. List the long-format documentation underneath the short-format documentation. This would require deciding whether the long and short formats are shown only in the synopsis section or also in the description and arguments sections. For example:

    -Ixinc[+e|n][/yinc[+e|n]]
    --increment=xinc[+exact|number][/yinc[+exact|number]]

  2. Add a dictionary after the synopsis showing the long format of each of the module's options and directives
  3. Have a separate documentation tree for long-format GMT versus short-format GMT (similar to the modern vs. classic documentation)

Other issues: Will the long options be documented in both the ReST documentation and the command-line usage messages, or only in the ReST documentation? My vote is to document them in only one place.

gd-a commented 3 years ago

I like the example given in devdocs

gmt blockmean --region=0/20/10/56 --increment=1 --registration=pixel --select=sum data.txt > sums.txt

I would make the units explicit, for good measure.

Back on track: I would recommend using only the long format in the documentation, with a separate table of short options (similar to the common options page) for more "seasoned" users.

For the other issue, most command-line manual pages give both syntaxes separated by a comma (see man grep, for example). Why not simply stick to this convention?

maxrjones commented 3 years ago

> I like the example given in devdocs
>
> gmt blockmean --region=0/20/10/56 --increment=1 --registration=pixel --select=sum data.txt > sums.txt
>
> I would make the units explicit, for good measure.
>
> Back on track: I would recommend using only the long format in the documentation, with a separate table of short options (similar to the common options page) for more "seasoned" users.

I agree with this. I expect that listing both short and long options would make the documentation more confusing, and a table showing short versus long options would be a good compromise.

> For the other issue, most command-line manual pages give both syntaxes separated by a comma (see man grep, for example). Why not simply stick to this convention?

This works for the options, but what about the modifiers and directives? I do not know of other command-line programs with modifiers as complicated as GMT's, which would make this a lot to maintain relative to the gawk command-line manual.

gd-a commented 3 years ago

Is it possible to have line breaks in the command-line manual? If so, I imagine one line per modifier within the description block?

maxrjones commented 3 years ago

> Is it possible to have line breaks in the command-line manual? If so, I imagine one line per modifier within the description block?

Yes, this is possible. We could try it and see how it looks.

PaulWessel commented 3 years ago

@meghanrjones and I discussed the possibility of auto-generating some RST documentation from the C structure, to avoid repeating material (which would otherwise easily get out of sync) and to keep documentation updates manageable. There are a few challenges that I did not immediately recognize. For instance, the structure might contain this:

{ 0, 'B', "frame", "", "", "b,g,n,o,t,s", "box,fill,noframe,pole,title,subtitle" },

and from this I would like to generate the two strings:

-Baxes[+b][+gfill][+n][+olon/lat][+ttitle][+ssubtitle]
--frame=axes[+box][+fill=fill][+noframe][+pole=lon/lat][+title=title][+subtitle=subtitle]

I will not be able to do this since there is no information in the structure to tell me if a modifier takes a) a required argument, b) an optional argument, or c) no argument. That information would need to be added to the structure, which can also aid in the parsing and finding parsing errors (missing required modifier argument, argument given to modifier that does not take one, etc.).

Furthermore, I would not be able to use sensible names other than [args] or args, since suitable names for the arguments are not part of the definition above. So, for the benefit of the documentation, I may need to expand that structure with more metadata that allows for better parsing and automatic RST generation.

PaulWessel commented 3 years ago

After updating for the missing modifiers in the frame listing (+i, +w, +x, +y, +z; see #5592), it looks like this:

{ 0, 'B', "frame", "", "", "b,g,i,n,o,s,t,w,x,y,z", "box,fill,interior,noframe,pole,subtitle,title,pen,yzfill,xzfill,xyfill" },

To have the required information I think it may need to be this:

{ 0, 'B', "frame", "", "", "b,g,i,n,o,s,t,w,x,y,z", "box,fill=fill,interior[=val],noframe,pole=lon/lat,subtitle=text,title=text,pen[=pen],yzfill=fill,xzfill=fill,xyfill=fill" },

With this I will know which modifiers take a required argument (e.g., +fill), an optional argument (e.g., +pen), or no argument (e.g., +box), and which words are recommended for those dummy arguments in the docs.
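To illustrate how little is needed to recover that information from the annotated list, here is a minimal C sketch. The names `argkind`, `classify_entry`, and `describe_modifiers` are hypothetical, not GMT functions; the sketch only assumes the annotation convention shown above (`name` for no argument, `name=word` for a required one, `name[=word]` for an optional one) and pairs each entry with its short code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical argument classes recovered from the annotations. */
enum argkind { ARG_NONE, ARG_REQUIRED, ARG_OPTIONAL };

/* Classify one annotated entry of length n, e.g. "box", "fill=fill" or
   "interior[=val]". The bare long name and the placeholder word (empty if
   none) are copied into the caller's buffers. */
enum argkind classify_entry(const char *entry, size_t n,
                            char *name, size_t name_len,
                            char *word, size_t word_len)
{
    const char *end = entry + n, *p = entry;
    while (p < end && *p != '=' && *p != '[')   /* name ends at '=' or '[' */
        p++;
    snprintf(name, name_len, "%.*s", (int)(p - entry), entry);
    word[0] = '\0';
    if (p == end)                               /* "box" */
        return ARG_NONE;
    if (*p == '=') {                            /* "fill=fill" */
        snprintf(word, word_len, "%.*s", (int)(end - p - 1), p + 1);
        return ARG_REQUIRED;
    }
    if (end - p >= 3 && p[1] == '=' && end[-1] == ']') {  /* "pen[=pen]" */
        snprintf(word, word_len, "%.*s", (int)(end - p - 3), p + 2);
        return ARG_OPTIONAL;
    }
    return ARG_NONE;                            /* malformed: treat as none */
}

/* Walk the short-code list and the annotated long-name list in parallel,
   printing one line per modifier. */
void describe_modifiers(const char *codes, const char *longs, FILE *out)
{
    static const char *label[] = { "no argument", "required", "optional" };
    char name[64], word[64];
    while (*codes && *longs) {
        size_t cn = strcspn(codes, ","), ln = strcspn(longs, ",");
        enum argkind k = classify_entry(longs, ln, name, sizeof name,
                                        word, sizeof word);
        fprintf(out, "+%.*s = +%s (%s%s%s)\n", (int)cn, codes, name,
                label[k], word[0] ? ": " : "", word);
        codes += cn + (codes[cn] == ',');       /* step past the comma */
        longs += ln + (longs[ln] == ',');
    }
}
```

For example, `describe_modifiers("b,g,w", "box,fill=fill,pen[=pen]", stdout)` reports +b as taking no argument, +g as requiring `fill`, and +w as optionally taking `pen`.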

PaulWessel commented 3 years ago

To be clear, the proposed change above would internally be handled in two different ways:

  1. The new metadata would be stripped off so that the current function that converts a real long-option argument to the short-option equivalent will work as before. Given the = and [] characters it is easy to strip these off.
  2. I will write a new function that parses these lines and uses the metadata to format proper *.rst_ include files that create a table of long versus short options on a per-module basis (plus one for the common args). These can then be included by each module.
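Item 1 above can be sketched as a single pass that copies the annotated list and drops everything from the annotation marker (or an opening `[`) up to the next comma. This is a minimal sketch under the annotation format shown in the previous comment; `strip_annotations` is a hypothetical name, and the marker character is a parameter because `=` may clash with options that already use it.

```c
#include <stddef.h>

/* Copy an annotated long-name list such as "box,fill=fill,pen[=pen]" into
   out, dropping each annotation (from `marker` or '[' up to the next comma)
   so only the bare long names remain. Returns 0, or -1 if out is too small
   (out is still NUL-terminated in that case). */
int strip_annotations(const char *in, char marker, char *out, size_t len)
{
    size_t k = 0;
    int skipping = 0;                   /* inside an annotation? */
    if (len == 0)
        return -1;
    for (const char *p = in; *p != '\0'; p++) {
        if (*p == ',')
            skipping = 0;               /* a new entry starts at ',' */
        else if (*p == marker || *p == '[')
            skipping = 1;               /* annotation runs to next ',' */
        if (skipping)
            continue;
        if (k + 1 >= len) {             /* keep room for the '\0' */
            out[k] = '\0';
            return -1;
        }
        out[k++] = *p;
    }
    out[k] = '\0';
    return 0;
}
```

With `=` as the marker, `"box,fill=fill,interior[=val],noframe,pole=lon/lat"` becomes `"box,fill,interior,noframe,pole"`, which is the bare list the existing long-to-short converter expects.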

Let me know if you have any concerns about this enhancement, @meghanrjones, as I am eager to take the deep dive.

maxrjones commented 3 years ago

Yes, this seems good to me. The only complications I can see are for options that already use =, such as the -a common option and the gmt clear options.

PaulWessel commented 3 years ago

Good point. Maybe I will pick another character that we don't already use to mean =, like *.

PaulWessel commented 3 years ago

Looks like the same applies to directives. Currently, the gmt_common_longoptions.h structure does not yet list a directive for --frame, but we know it has to be the optional argument axes, e.g., --frame=Ws. But you can also say --frame+fill=red. It seems essential that these structures record whether a directive or argument is optional or required, and we should do that using []. Internally, the current use of these structures should just strip off the =[arg] strings so that parsing works as before, but the function I will add needs that information to create RST tables with content like

-B[axes][+b][+gfill][+n][+ttitle] is equivalent to --frame[=axes][+box][+fill=fill][+noframe][+title=text]

PS. Looking at +fill, I don't like it: +fill could just as well mean a value going into a grid where it is not filled, for instance. Isn't +paint=fill better?
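The goal is to generate both synopsis strings, such as the -B/--frame equivalence line above, from one set of metadata. Below is a minimal C sketch assuming a hypothetical `struct modspec` that stores, per modifier, the short letter, the long name, the argument class, and separate placeholder words for the short and long forms (the example uses `title` in one and `text` in the other). `build_synopsis` and `append` are illustrative names, not GMT code, and the directive is always treated as optional, as for --frame.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical argument classes for a modifier. */
enum argkind { ARG_NONE, ARG_REQUIRED, ARG_OPTIONAL };

/* Hypothetical per-modifier metadata (not GMT's actual structure). */
struct modspec {
    char code;             /* short modifier letter, e.g. 'g'        */
    const char *name;      /* long modifier name, e.g. "fill"        */
    enum argkind kind;     /* none, required, or optional argument   */
    const char *short_arg; /* placeholder word in the short synopsis */
    const char *long_arg;  /* placeholder word in the long synopsis  */
};

/* printf-style append at *pos; silently stops once buf is full. */
static void append(char *buf, size_t len, size_t *pos, const char *fmt, ...)
{
    va_list ap;
    int n;
    if (*pos >= len)
        return;
    va_start(ap, fmt);
    n = vsnprintf(buf + *pos, len - *pos, fmt, ap);
    va_end(ap);
    if (n > 0)
        *pos += (size_t)n;
}

/* Build the short and long synopsis strings for one option whose
   directive (e.g. "axes") is optional. Both buffers hold len bytes. */
void build_synopsis(char opt, const char *long_opt, const char *directive,
                    const struct modspec *m, size_t nmod,
                    char *short_buf, char *long_buf, size_t len)
{
    size_t ps = 0, pl = 0;
    append(short_buf, len, &ps, "-%c[%s]", opt, directive);
    append(long_buf, len, &pl, "--%s[=%s]", long_opt, directive);
    for (size_t i = 0; i < nmod; i++) {
        switch (m[i].kind) {
        case ARG_REQUIRED:   /* -B+gfill  vs  --frame+fill=fill */
            append(short_buf, len, &ps, "[+%c%s]", m[i].code, m[i].short_arg);
            append(long_buf, len, &pl, "[+%s=%s]", m[i].name, m[i].long_arg);
            break;
        case ARG_OPTIONAL:   /* -B+w[pen] vs  --frame+pen[=pen] */
            append(short_buf, len, &ps, "[+%c[%s]]", m[i].code, m[i].short_arg);
            append(long_buf, len, &pl, "[+%s[=%s]]", m[i].name, m[i].long_arg);
            break;
        case ARG_NONE:       /* -B+b      vs  --frame+box       */
            append(short_buf, len, &ps, "[+%c]", m[i].code);
            append(long_buf, len, &pl, "[+%s]", m[i].name);
            break;
        }
    }
}
```

Given the four modifiers b/box, g/fill, n/noframe, and t/title (with `title` and `text` as placeholders), this produces `-B[axes][+b][+gfill][+n][+ttitle]` and `--frame[=axes][+box][+fill=fill][+noframe][+title=text]`. Keeping two placeholder fields lets the short form stay terse while the long form uses more descriptive words.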

seisman commented 3 years ago

Just one quick comment. The long-form options definitely will make GMT scripts more readable and also will make GMT wrappers easier to maintain, but these long names will also make it difficult to write GMT scripts.

Python and Julia users like long names because most text editors support auto-completion for these languages, whereas it seems technically difficult to make an editor autocomplete GMT's long-form options, modifiers, and directives.

PaulWessel commented 3 years ago

> ...but these long names will also make it difficult to write GMT scripts.

Yes, undoubtedly true. While many editors accept plugins for additional languages, I am not sure we want to be in the business of building such plugins, though some users may be able to do this.

I think the argument that it makes scripts more readable is still strong enough that we will continue on that path.

However, I still see more cleaning up to do with the way we present the short options. Take -bi for instance:

-bi[<ncols>][<type>][w][+l|b]
     Select binary input; data <type> = c|u|h|H|i|I|l|L|f|D.
     Prepend <ncols> for the number of columns for each <type>.

The online docs explain more (e.g., what w and +l|b mean). However, in a long-option framework the plan was to use

--binary=input:<combo>[+big_endian|little_endian]

In reality, [<ncols>][<type>] is a bit complicated since either can be optional. We can do -bi3d or just -bid (the module knows how many columns), but I have never seen (nor do I know if it works) -bi4 (where d is the default). And of course [<ncols>][<type>] can be repeated. We are not going to list [<ncols1>][<type1>][<ncols2>][<type2>]... though, so I think it is better to use <combo> (a required argument) and then explain what that is. In that scheme, w would not be singled out on the synopsis line as it is above. So for our short-format version perhaps this is better:

-bi<combo>[+l|b]
    Select native binary table input; <combo> is one or more concatenated groups of [<ncols>][<type>] 
    where <ncols> is the number of consecutive columns of given <type> (taken from c|u|h|H|i|I|l|L|f|D). 
    Default <ncols> is module dependent, and default <type> is d. 
    Each group may be individually byte-swapped by appending w. Modifiers +l or +b .....
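The `<combo>` grammar described above is regular enough to check with a small parser. This is a minimal sketch, not GMT's actual -b parser: `parse_combo` and `struct bin_group` are hypothetical names, the +l/+b modifiers are assumed to be split off beforehand, d and D are both accepted because the text lists D while calling d the default, and a bare count such as -bi4 is accepted with type d even though, as noted above, it is unclear whether GMT supports that.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One [<ncols>][<type>][w] group of a -bi<combo> argument. */
struct bin_group {
    int ncols;  /* 0 means "not given" (module-dependent default) */
    char type;  /* data type letter; 'd' if not given              */
    int swap;   /* 1 if the group was followed by 'w'              */
};

/* Parse a combo string (without any +l or +b modifier) into at most max
   groups. Returns the number of groups, or -1 on a syntax error. */
int parse_combo(const char *combo, struct bin_group *g, int max)
{
    static const char types[] = "cuhHiIlLfdD";  /* letters from the text */
    int n = 0;
    const char *p = combo;
    while (*p) {
        struct bin_group cur = { 0, 'd', 0 };
        if (n == max)
            return -1;                      /* too many groups */
        if (isdigit((unsigned char)*p)) {   /* optional column count */
            char *end;
            cur.ncols = (int)strtol(p, &end, 10);
            p = end;
        }
        if (*p != '\0' && strchr(types, *p) != NULL)
            cur.type = *p++;                /* optional type letter */
        else if (cur.ncols == 0)
            return -1;                      /* neither count nor type */
        if (*p == 'w') {                    /* optional byte swap */
            cur.swap = 1;
            p++;
        }
        g[n++] = cur;
    }
    return n;
}
```

For example, `2f1iw` parses into two groups: two float columns, then one byte-swapped int column. Every loop iteration either consumes at least one character or returns, so malformed input such as `3x` is rejected rather than looping.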
PaulWessel commented 3 years ago

I think I need to go through these common options and see if I can document them better, saying the same things with clearer explanations, like what I suggest for -bi.

joa-quim commented 3 years ago

> ...but these long names will also make it difficult to write GMT scripts.

It's not going to be a clear gain. GMT converts will see it as useless, but my experience on the Julia forum is that 100% of users who knew GMT said "very powerful, but I can't stand the syntax".

joa-quim commented 3 years ago

> PS. Looking at +fill, I don't like it: +fill could just as well mean a value going into a grid where it is not filled, for instance. Isn't +paint=fill better?

Julia uses [:fill :bg :bgcolor :background]

PaulWessel commented 3 years ago

So bg and bgcolor are out: the first is unclear, and the second says it must be a color (but we may want a pattern). background is not bad, though.