PaulWessel / CodingTasks

Organized issues tracker for tasks related to GMT

Basic information about the plan for long-format GMT options #1

Open PaulWessel opened 2 years ago

PaulWessel commented 2 years ago

Background

The most recent NSF proposal for GMT support discussed introducing more human-readable command-line options in GMT so that it would be possible to understand what GMT options are doing without having to consult a manual for translation. For instance, currently a GMT command line might look like this:

gmt blockmean infile.txt -R0/30/0/30 -I2/1 -Wi > out.txt

where it is not clear to anyone new what this means. The anticipated long-format version of the same command would be

gmt blockmean infile.txt --region=0/30/0/30 --increment=2/1 --weights=in > out.txt

which is much clearer. GMT modules all rely on two parsers: GMT_Parse_Common (an API function) for all common options (-R, -J, etc.) and the static local function parse (one in every module C file) for the module-specific options (e.g., -W above). To implement long options, the decision was made to introduce a translator from the long-format syntax to the corresponding short-format syntax that is called just before the established parsers. A prototype translator called gmtinit_translate_to_short_options has been coded and so far tested only for the common options. The translator will depend on two arrays of structures:

  1. gmt_common_kw: Common to all modules, it is defined in gmt_common_longoptions.h.
  2. module_kw: A local array of structures defined at the top of each module C file and passed into gmt_init_module.

The gmt_init_module function does lots of stuff, now including the translation based on these two vocabularies. For now only a few modules have local structures, e.g., blockm*.c.
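To make the two-vocabulary idea concrete, here is a minimal C sketch of the long-to-short lookup for the option name alone. The struct kw_entry, the table common_kw and the function long_to_short are hypothetical simplifications, not GMT's actual keyword-dictionary types, and the sketch assumes the option text arrives with the leading "--" already stripped and with no directives or modifiers to translate.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified dictionary entry; GMT's real structure may differ. */
struct kw_entry {
    char short_option;        /* single-letter GMT option, e.g. 'R' */
    const char *long_option;  /* long name, e.g. "region" */
};

/* Hypothetical excerpt of a common-option vocabulary, terminated by a sentinel. */
static const struct kw_entry common_kw[] = {
    {'R', "region"}, {'J', "projection"}, {'X', "xshift"}, {'Y', "yshift"},
    {'\0', NULL}
};

/* Translate "name=rest" (leading "--" already removed) into "-<c>rest".
 * Returns 0 on success, -1 if the name is unknown or out is too small. */
static int long_to_short (const struct kw_entry *kw, const char *arg, char *out, size_t size) {
    size_t name_len = strcspn (arg, "=");     /* name ends at '=' or end of string */
    const char *rest = arg[name_len] == '=' ? arg + name_len + 1 : "";
    for (; kw->long_option; kw++) {
        if (strlen (kw->long_option) == name_len && !strncmp (arg, kw->long_option, name_len)) {
            if (strlen (rest) + 3 > size) return -1;   /* '-', letter, rest, NUL */
            out[0] = '-';
            out[1] = kw->short_option;
            strcpy (out + 2, rest);
            return 0;
        }
    }
    return -1;
}
```

The explicit length check is a design choice: without it, a prefix comparison would also accept a misspelling such as regionx.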

Option Template

The premise for the translation is that the two forms (long and short) are equivalent. This is not necessarily true for everything even though it certainly is true for many options. Ideally, short-format options should follow this template:

-<short_option>[<short_directives>][+<short_modifier1>[<argument1>]][+<short_modifier2>[<argument2>]][...]

while the long-format options look like

--<long_option>[=[<long_directives>[:]][<arg>]][+<long_modifier1>[=<arg1>]][+<long_modifier2>[=<arg2>]][...]

Thus, depending if an option has directives or not, and if those take values, things can look like this (keeping modifiers off for now):

--<long_option>
--<long_option>=<arg>
--<long_option>=<long_directives>
--<long_option>=<long_directives>:<arg>
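One way to tell these four shapes apart, sketched in C under the colon convention described here. The option's long-directive vocabulary is needed to distinguish a plain argument (--aspect=2.5) from a directive (--aspect=middle). The names classify, in_list and enum lo_form are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: the four shapes listed above (leading "--" and any "+modifiers" removed). */
enum lo_form { LO_BARE, LO_ARG, LO_DIRECTIVE, LO_DIRECTIVE_ARG };

/* Return 1 if the first len chars of word exactly equal an item of a comma-separated list. */
static int in_list (const char *list, const char *word, size_t len) {
    while (*list) {
        size_t n = strcspn (list, ",");
        if (n == len && !strncmp (list, word, len)) return 1;
        list += n;
        if (*list == ',') list++;
    }
    return 0;
}

/* Classify arg using the option's comma-separated long directives (':' separator assumed). */
static enum lo_form classify (const char *arg, const char *long_directives) {
    const char *eq = strchr (arg, '=');
    if (eq == NULL) return LO_BARE;                  /* --<long_option> */
    const char *value = eq + 1;
    size_t head = strcspn (value, ":");              /* text before the first ':' */
    if (!in_list (long_directives, value, head)) return LO_ARG;   /* --<long_option>=<arg> */
    return value[head] == ':' ? LO_DIRECTIVE_ARG : LO_DIRECTIVE;
}
```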

I think some options may take more than one directive or argument, and these may be separated by commas or slashes (as specified in the structure).

Historically, some short options in GMT have evolved from being extremely long and cumbersome to still being complicated even after being split into multiple options. The most important one of those is the -B option for basemap annotation and frame selection. To avoid getting bogged down in intractable situations, I propose we simply hold off on -B, which I suspect will be a special case that cannot simply be translated back and forth. But I do not know yet. Below, we can explore which options do or do not satisfy the round-trip translation criterion.

surface

Here is the list of non-common options available in surface:

-G<outgrid>[=<ID>][+d<divisor>][+n<invalid>][+o<offset>|a][+s<scale>|a][:<driver>[/<dataType>][+c<options>]]
-I<xinc>[+e|n][/<yinc>[+e|n]] 
[-A<aspect_ratio>|m]
[-C<convergence_limit>] [-D<breakline>[+z[<zlevel>]]]
[-Ll|u<limit>]
[-M<radius>]
[-N<n_iterations>]
[-Q[r]]
[-S<search_radius>[m|s]]
[-T[b|i]<tension>]
[-W[<logfile>]]
[-Z<over_relaxation>]

The first problem is the -G output grid, where we historically have used = to append a specific grid format code. Hence, that might look like a long-format directive, but it is short-format syntax. However, all the rest seem straightforward to turn into long options:

--increment=<xinc>[+exact|number][/<yinc>[+exact|number]] 
[--aspect=<aspect_ratio>|middle]
[--convergence=<convergence_limit>]
[--breakline=<breakline>[+zvalue[<zlevel>]]]
[--limit=lower|upper:<limit>]
[--mask=<radius>]
[--iterations=<n_iterations>]
[--quick[=region]]
[--radius=<search_radius>[m|s]]
[--tension=[boundary|interior:]<tension>]
[--logfile=[<logfile>]]
[--relax=<over_relaxation>]

The -I is a near-common option and it is already implemented via the constant GMT_INCREMENT_KW defined in gmt_constants.h. It lists '/' as its separator, meaning the arguments may be repeated with a slash between them.

Here is a surface command from one of the tests (test/surface/limits.sh) in short and then as a plausible long-format:

gmt surface @Table_5_11.txt -R0/6.3/0/6.3 -I0.1 -GLu_limits.grd -Lu901
gmt surface @Table_5_11.txt --region=0/6.3/0/6.3 --increment=0.1 --outfile=Lu_limits.grd --limit=upper:901
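The --limit=upper:901 case above can be translated with two parallel comma-separated lists, one of long directives and one of short ones, matched by position. This is a hypothetical C sketch (translate_directive, list_index and list_item are invented for illustration), assuming the ':' separator used in this proposal and no modifiers.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Index of the len-char word in a comma-separated list, or -1. */
static int list_index (const char *list, const char *word, size_t len) {
    for (int i = 0; *list; i++) {
        size_t n = strcspn (list, ",");
        if (n == len && !strncmp (list, word, len)) return i;
        list += n;
        if (*list == ',') list++;
    }
    return -1;
}

/* Copy item i of a comma-separated list into buf; returns 0 or -1. */
static int list_item (const char *list, int i, char *buf, size_t size) {
    for (; *list; i--) {
        size_t n = strcspn (list, ",");
        if (i == 0) {
            if (n + 1 > size) return -1;
            memcpy (buf, list, n);
            buf[n] = '\0';
            return 0;
        }
        list += n;
        if (*list == ',') list++;
    }
    return -1;
}

/* Hypothetical helper: turn "<directive>:<arg>" into "-<opt><d><arg>"; returns 0 or -1. */
static int translate_directive (char opt, const char *long_dirs, const char *short_dirs,
                                const char *value, char *out, size_t size) {
    size_t head = strcspn (value, ":");             /* long directive ends at ':' */
    int i = list_index (long_dirs, value, head);
    char d[16];
    if (i < 0 || list_item (short_dirs, i, d, sizeof d)) return -1;
    const char *arg = value[head] == ':' ? value + head + 1 : "";
    int n = snprintf (out, size, "-%c%s%s", opt, d, arg);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;  /* reject truncated output */
}
```

For example, translate_directive ('L', "lower,upper", "l,u", "upper:901", buf, sizeof buf) leaves "-Lu901" in buf.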

Selecting suitable long-format words for directives and modifiers

Because both PyGMT and especially GMT.jl have moved ahead and implemented lots of keyword/value pairs, we should strive to use their word choices unless we feel a choice is just the wrong phrase. It is likely that Paul will have opinions on any selection, so the initial selection of words is not crucial since they are just a set of unique strings (per module). Roger can invent suitable phrases that he thinks make sense and fill out the KW structure arrays. Once a module has long options that work, we can revise the actual words as we see fit.

PaulWessel commented 2 years ago

Thanks for checking. There are a few more moving parts I see:

In gmt_parse around line 650 there is the initial parsing that sets the option to GMT_OPT_SYNOPSIS or GMT_OPT_USAGE. I guess this is one place to add things like

#ifdef USE_COMMON_LONG_OPTIONS
        else if (!strcmp (args[arg], "--shorthelp") && n_args == 1) /* extended synopsis + */
            first_char = 1, option = GMT_OPT_USAGE, G->common.synopsis.extended = true;
#endif

or similar, and same for --synopsis. Note that --help is actually already handled (we must have added that a long time ago).

I am not sure about the gmt_init.c section. The check for options->arg[0] there seems to duplicate what should already have happened in GMT_Create_Options; perhaps I forgot and duplicated it, or perhaps there is some subtle reason (not sure), but GMT_Create_Options is always called before gmt_report_usage.

rbdavis commented 2 years ago

OK, will poke around gmt_parse and add mods there unless anything looks fishy, in which case I'll get back to you.

Spent longer than I would have liked yesterday hand-mangling the very, very many pscoast commands in the tests to convert all shortopts to longopts. Ran them early this morning and turned up a bunch of errors, all almost certainly from me messing up the mangling. Picking those over now ...

rbdavis commented 2 years ago

Shouldn't belittle my own hand-mangling, I guess ... looks like we may have a real translation algorithm issue. I believe we could in general have trouble with any option which would otherwise legitimately use a colon for any reason, e.g., scales like 1:10000. Look what happened here to the -J scale spec:

>  gmt pscoast --region=-2000/2000/-1000/1000+unit=k --projection=oA-30/60/-180/1:60000000 -P -K --xshift=1.25i --yshift=9i >/dev/null
gmt [WARNING]: Reformatted options: -R-2000/2000/-1000/1000+uk -JoA-30/60/-180/1 -P -K -X1.25i -Y9i
mapproject [WARNING]: Your scale of 1 in -J was interpreted to mean 1:1 since no plotting is involved.

It appears that the L2S translator stripped away the :60000000 . I will debug this later today and see if there's any obvious fix, but I can imagine that might prove complicated. If that's not doable then the obvious solution would be to use something other than a colon as the longopt directive/arg separator. Hopefully there remains some lonely alphabetic character not used in any GMT option spec that could be made to feel useful here? I suppose any character can legally appear as part of a string which GMT might accept as a label or whatnot, so I'm not sure how that would be handled.

For now, running down the rest of the test problems, with a good ways to go ...

PaulWessel commented 2 years ago

Good one. Let's see what else shows up, but 1:xxxxxx is a pretty unique argument and I cannot think of any other value being given as a ratio that way. I think it would be fairly easy to work around this by first scanning for 1:xxxxxx if --projection is given and, if found, temporarily changing it to 1#xxxxxx or something, then doing the translation etc., and at the end switching it back to :. I would prefer that to using something else, since we must avoid the UNIX special characters and that leaves few options.

rbdavis commented 2 years ago

I did mess up a few translations in the pscoast test suite, but 85% of the errors were due to colon mishandling in -J. Are you confident that colon is not used elsewhere? I have noticed a few option specs over the last few days that accept a mapscale-ish parameter, although I can't recall where those were, or maybe they only required the denominator, e.g., 10000, and not the whole 1:10000 string.

Rerunning pscoast tests now with the problematic (i.e., colonic) -J specs restored to shortopt form. By the way, is there a quick command or alias to just run a single module test? I'm sure there must be, but I have not located any such in the .bashrc aliases.

PaulWessel commented 2 years ago

Yes. If you cd into the build directory (e.g., rbuild) and type ctest -R something, where something is either the name of the test you want or part of it, it will run whatever matches that pattern. So ctest -R pscoast will run all the pscoast tests because the directory is called pscoast.

rbdavis commented 2 years ago

Thanks for the testing tip!

Just found another colon problem: -R's dd:mm:ss option spec:

> gmt pscoast --region=-10/180/40:44:11.8/90 -JW30/10i --area=0/0/2 --xshift=center --yshift=center >/dev/null
gmt [WARNING]: Reformatted options: -R-10/180/40 -JW30/10i -A0/0/2 -Xc -Yc
pscoast [ERROR]: Option -R parsing failure.

Are there other dd:mm:ss option specs, or maybe hr:min:sec specs? This is getting more complicated. ;-(
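Both failures are consistent with simply discarding everything from the first colon onward. The hypothetical function below is deliberately naive and is not the actual translator code, but it reproduces exactly the two truncated results seen in the warnings above: -JoA-30/60/-180/1 and -R-10/180/40.

```c
#include <string.h>

/* Naive illustration (not GMT's code): treat the first ':' as the
 * directive/argument separator and keep only the text before it. */
static void naive_cut_at_colon (char *value) {
    char *colon = strchr (value, ':');
    if (colon) *colon = '\0';   /* scale denominator or minutes:seconds are lost */
}
```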

PaulWessel commented 2 years ago

Yes, that one is worse and rules out my previous scheme since longitudes and latitudes are passed to many options across GMT and would not be that simple (like 1:xxxx) to find.

Perhaps we should make a list of possible alternative characters that could replace directive:arg. Of all the non-alpha characters, many are Unix (or DOS: %) special characters and would require quoting to avoid special action (and we don't want to quote these). So these are out I think:

! @ # $ % & * ( ) | \ < >

I am less certain about ~ and ^. They do have special meaning, but if ~ is not at the start I don't think the shell cares? The ^ may not be readily available on some non-US keyboards (?). We don't want to use something that requires CTRL sequences or similar.

[ ] { } may not be on non-US keyboards. That leaves ; and possibly ~ and ^

PaulWessel commented 2 years ago

Given eventual bedtime ruining things: one alternative is to use 2 characters for this, e.g.

--longishoption=directive:=argument

So since --region=-10/180/40:44:11.8/90 has no :=, there is no directive present, only an argument.
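With a two-character separator, the split only needs a strstr search for ":=", so colons inside coordinates (40:44:11.8) or scales (1:60000000) are never touched. A minimal C sketch; split_directive_arg is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: split "directive:=argument" in place. Returns a pointer
 * to the argument, or NULL when no ":=" is present (the whole value is then a
 * plain argument and is left untouched). */
static char *split_directive_arg (char *value) {
    char *sep = strstr (value, ":=");
    if (sep == NULL) return NULL;
    *sep = '\0';                /* value now holds only the directive */
    return sep + 2;             /* argument starts after ":=" */
}
```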

The := is used by some programming language(s) for assignment - cannot recall which one - so it is not completely made up.

rbdavis commented 2 years ago

:= is Pascal, and I always have wished that C used that instead of = . I have made a habit through the years of writing assignments as

x= y;

(note no space between x and =) rather than

x = y;

as the latter is more easily confused with ==, I think.

Are you sure := would not pop up anywhere as a legal option spec? I can take a look at the translate routine with that in mind, and also experiment with ; ^ ~ (all of which are shell characters, I think, but may not be interpreted as such when mid-string).

rbdavis commented 2 years ago

One more -J problem, and this may be a more generic issue that carries over to other option types. The common.h translation table's -J entry includes no short-modifier translations, and this appears to be mangling translation of those -J shortopt short-modifiers that in fact exist:

>  gmt pscoast --region=122W/35N/107W/22N+rectangular --projection=Oa120W/25N/150/12c+dh+v -Bafg -O -K --yshift=7c >/dev/null
gmt [WARNING]: Long-modifier form Oa120W/25N/150/12c+dh+v for option -J not recognized!
gmt [WARNING]: Long-modifier form Oa120W/25N/150/12c+dh+v for option -J not recognized!
gmt [WARNING]: Reformatted options: -R122W/35N/107W/22N+r -JOa120W/25N/150/12c -Bafg -O -K -Y7c

I have noticed -J's shortopts short-modifiers, and also noted that the same short-modifiers have different meanings depending upon projection type. I guess there's no reason you couldn't use different longopts long-modifier strings (for different projection types) that translate to the same shortopt string, but that would obviously break reverse-translation.

PaulWessel commented 2 years ago

Trouble with -J is that we have some projections using two characters, e.g., -JOa and -Jks. Again, we may have to finesse those.

rbdavis commented 2 years ago

-JOa and -Jks are directives, not modifiers, right? I don't think they come into play with respect to this problem with translation-mangled modifiers.

Looks like I'll be goofing with the debugger for a good while this afternoon looking at these issues and experimenting. Will let you know what happens.

PaulWessel commented 2 years ago

Yes, in our model they would be directives of --projection, but that probably means our common_longtable is not handling them, since we would need names for all the projections. And here we are in luck because they all have names in proj and we accept them already. See _gmtinit_parseproj4 in gmt_init.c for some ideas.

Sorry, going to bed but may be able to respond to things for the next hour.

rbdavis commented 2 years ago

Sleep well, I've got plenty to do for the day without you needing to check e-mail again!

rbdavis commented 2 years ago

In gmt_parse around line 650 there is the initial parsing that sets the option to GMT_OPT_SYNOPSIS or GMT_OPT_USAGE. I guess this is one place to add things like ...

OK, this is done, I hope. I spent an outrageous amount of time in the debugger trying to figure out exactly what the setting of first_char meant, and ultimately decided it essentially meant nothing at least in the cases we are dealing with here. So I just set it to point at the terminating null byte at the end of the argument string (as seemed to be happening in the existing --help clause), quickly tested it, and the correct usage messages were displayed with no core dump. I did not bother with the #ifdef LONGOPTIONS here, figuring if --help was already there without that #ifdef there was no good reason the new additions should not be present also. Here's the new code, including the existing --help clause. Let me know if you have any problems with this!

                else if (!strcmp(args[arg], "--help"))          /* mimic '-?' */
                        first_char = 6, option = GMT_OPT_USAGE;
                else if (!strcmp(args[arg], "--shorthelp"))     /* mimic '-+' */
                        first_char = 11, option = GMT_OPT_USAGE, G->common.synopsis.extended = true;
                else if (!strcmp(args[arg], "--synopsis"))      /* mimic '-^' */
                        first_char = 10, option = GMT_OPT_SYNOPSIS;
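The constants 6, 11 and 10 are simply the lengths of "--help", "--shorthelp" and "--synopsis", i.e., the index of each string's terminating NUL. A hypothetical table-driven helper (help_flag_end is not a GMT function) shows how those offsets could be derived with strlen instead of being hard-coded:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of stand-alone long help flags. */
static const char *help_flags[] = {"--help", "--shorthelp", "--synopsis"};

/* Return the index of the flag's terminating NUL (the value first_char is set
 * to above), or -1 if arg is not one of the recognized flags. */
static int help_flag_end (const char *arg) {
    for (size_t i = 0; i < sizeof help_flags / sizeof help_flags[0]; i++)
        if (!strcmp (arg, help_flags[i])) return (int)strlen (help_flags[i]);
    return -1;
}
```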
rbdavis commented 2 years ago

Hi Paul, I spent the rest of the late afternoon looking at the problem with -J and the shortopt modifiers. What I have essentially discovered, mostly through looking at your translate routine in the debugger, is that without changes to the translate routine we will not be able to do a completely functional longopts version of any short option unless we create a translation table entry that includes all possible short modifiers (paired with their long modifier equivalents). This is because when the translator finds a modifier in the longopts option string it then searches the translation table entry for a matching long modifier, and if it does not find one it will drop the entire modifier (up to the next +) from the translated shortopts string.

Although I did not test this hypothetical solution, I believe that we may be able to create a working entry with a trivial long-to-short (and reverse!) modifier mapping, i.e., where the long modifiers are simply the same unicharacter short modifiers, e.g.:

Long modifier list: "a,b,c,d,g,h,t,v,z" Short modifier list: "a,b,c,d,g,h,t,v,z"

Another hypothetical (and likewise untested) possible solution would be to change the translation code to just copy any unrecognized modifiers from the original longopts option string to the new shortopts option string rather than discarding them.
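The two behaviors described above (dropping an unrecognized modifier versus copying it through) can be sketched in C as follows. translate_modifiers and its list helpers are hypothetical; the sketch assumes long modifiers have the form +name[=arg] and that modifier arguments never contain '+'.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Index of the len-char word in a comma-separated list, or -1. */
static int list_index (const char *list, const char *word, size_t len) {
    for (int i = 0; *list; i++) {
        size_t n = strcspn (list, ",");
        if (n == len && !strncmp (list, word, len)) return i;
        list += n;
        if (*list == ',') list++;
    }
    return -1;
}

/* Copy item i of a comma-separated list into buf; returns 0 or -1. */
static int list_item (const char *list, int i, char *buf, size_t size) {
    for (; *list; i--) {
        size_t n = strcspn (list, ",");
        if (i == 0) {
            if (n + 1 > size) return -1;
            memcpy (buf, list, n);
            buf[n] = '\0';
            return 0;
        }
        list += n;
        if (*list == ',') list++;
    }
    return -1;
}

/* Hypothetical: translate "+name[=arg]..." to "+<c>arg..."; unknown modifiers
 * are dropped, or copied verbatim when pass_through is nonzero. Returns 0 or -1. */
static int translate_modifiers (const char *mods, const char *long_mods, const char *short_mods,
                                int pass_through, char *out, size_t size) {
    size_t used = 0;
    if (size == 0) return -1;
    out[0] = '\0';
    while (*mods == '+') {
        const char *seg = mods + 1;             /* modifier text after the '+' */
        size_t seg_len = strcspn (seg, "+");    /* runs to the next '+' */
        size_t name_len = strcspn (seg, "=+");  /* name stops at '=' */
        const char *arg = seg + name_len;
        size_t arg_len = seg_len - name_len;
        if (*arg == '=') arg++, arg_len--;      /* skip the '=' before an argument */
        int i = list_index (long_mods, seg, name_len);
        char m[16];
        int n = 0;
        if (i >= 0 && list_item (short_mods, i, m, sizeof m) == 0)
            n = snprintf (out + used, size - used, "+%s%.*s", m, (int)arg_len, arg);
        else if (pass_through)
            n = snprintf (out + used, size - used, "+%.*s", (int)seg_len, seg);
        if (n < 0 || (size_t)n >= size - used) return -1;   /* output truncated */
        used += (size_t)n;
        mods = seg + seg_len;
    }
    return 0;
}
```

With pass_through set to 0, "+dh+v" checked against a table that lacks d and v yields an empty string, matching the dropped modifiers in the -JOa warning above; with pass_through set to 1 it comes back unchanged.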

In the case of -J in particular, I think the L2S translation could be made to work without doing either of the above, but it would be a non-ideal translation because several of the -J short modifiers (namely a, t, v, and z) modify different attributes depending upon projection type, e.g., +t specifies tilt for the perspective projection but origin for the non-geographic projection. Basically, we would either (i) have to use a single longopts modifier string to map to both versions of +t (which would make that longopts string not particularly descriptive), or (ii) create a mapping with repeated instances of the same short modifier character (which would make the inverse short-to-long translation impossible, at least without parsing for projection type, etc., which would become a nightmare if the technique also had to be applied to other option specs). Part of such a non-uniquely-invertible mapping might look like this:

Long modifier list: ... ,tilt,origin, ... Short modifier list: ... ,t,t, ...

Anyways, I think we need to pick one of the above solutions unless either you can think of something else or you would prefer to just abandon -J longopts entirely (and possibly many others with similar modifier complexities).

PaulWessel commented 2 years ago

I think the -J is a bit of a red herring. I did not state this clearly before, I think, but since the long-term goal is for GMT to use the proj library and, importantly, the proj syntax for specifying projections, there is really no good reason for us to convert those (i.e., long-format -J) to short-format just for the hell of it. As you point out, the -J is very messy and creating a one-to-one mapping would be a lot of work for no obvious long-term gain. So, I think it is best not to worry about -J translation for now. Remember that GMT currently does parse proj4 projection strings, separately. Eventually, the current short GMT -J options will be legacy options.

The long history and ugliness of options like -J and -B makes them problematic. And perhaps having a one-to-one is not really important. The critical thing is of course the ability to parse long-format options to short - not the other way around (that was just included as a way to test things).

I think we do need to find out if there are module options that cannot be translated. My guess is there might be a few, and when I have run into that before I have introduced a revised syntax that follows the new rules and handled the old mess as a backwards-compatible case. Hence I think the scheme we have will work, and that -B and -J may need some separate TLC.

I'm going to see if I can make a sorted list of all module options in one file to see if there are obvious problem spots.

PaulWessel commented 2 years ago

I cleaned up some minor things in the options, see https://github.com/GenericMappingTools/gmt/pull/6888, and unless anyone beats you to it, please have a look and approve. Note the instructions in the sandbox repo (you can either browse this from the GitHub page or clone that repo). It needs a compiler switch set to produce the dump (and then unset before recompiling). I am attaching the dump I have here (after cleaning up a bit). It is useful to see the range of options, I think, and to check that things are consistent across modules, if possible.

final.txt

rbdavis commented 2 years ago

OK, I won't waste any more time on -J with regard to this point, but I just tested my trivial -J modifier L2S translation, i.e.,

Long modifier list: "d,a,t,v,w,z,f,k,r" Short modifier list: "d,a,t,v,w,z,f,k,r"

and it seems to work, so we can leave -J in the commonlongopts.h file, and it will not break one-to-one inverse translation. The longopts modifier names are of course not particularly useful. (We could improve the few that are unique to one projection type, but that hardly seems worth the effort.)

The essential take-away point here is that every module's complete modifier set must be fully handled in the translation table (if only by a trivial translation like the above) or it will not work. (Or, we have to alter the translation code to not discard unknown modifiers.)

I'll be catching up on yard chores this weekend, so you probably won't see me much until Monday. My plan is then to look into modifying the translation code to use := in place of :. This assumes, of course, that you believe no option specs could possibly include the := two-character pair.

PaulWessel commented 2 years ago

Sounds good. Just remember on Monday that you may wish to merge master into any branch you are working on since there have been (unrelated) changes. Good habit to merge in the latest to minimize chance of conflicts later.

rbdavis commented 2 years ago

Hi Paul, been looking at the translate routine in more detail and the head-scratching has begun...

I think I've found a bug in gmtinit_find_kw(). Suppose the translation table has a "resolution" longopt (like pscoast, which I am using to test), and the option string "resolutionary=full" is used. This will generate a found match within gmtinit_find_kw() because:

arg is "resolutionary=full", so strlen(arg) is 18; kw[1][2] is "resolution", so strlen(kw[1][2]) is 10; len is set to MIN(18, 10), which is 10; and strncmp(arg, kw[1][2], 10) returns 0, i.e., MATCHFOUND.

This definitely would break any translation where the table had one longopt which was a prefix of another and the former was listed in the table prior to the latter. Do you care about this? I think it's definitely unexpected behavior and probably not desirable.
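The difference between the length-limited comparison and an exact one, reduced to a C sketch. Both functions are hypothetical and assume the keyword has already been separated from its "=value" part:

```c
#include <string.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Hypothetical: partial comparison as described above; only the shorter length is compared. */
static int prefix_match (const char *given, const char *keyword) {
    size_t len = MIN (strlen (given), strlen (keyword));
    return strncmp (given, keyword, len) == 0;
}

/* Hypothetical: exact comparison; the whole keyword must match. */
static int exact_match (const char *given, const char *keyword) {
    return strcmp (given, keyword) == 0;
}
```

prefix_match also accepts an abbreviation such as "res" for "resolution", which is the convenience a partial comparison buys; exact_match rejects both "res" and "resolutionary".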

Also, this ties into some confusion I am having about code that precedes the call to this function in gmtinit_translate_to_short_options(), where the code seems to contradict the comments. It would help if I had a clearer picture of your intent here. The problematic lines in question are:

for (opt = *options; ...
    /* ... lines skipped ... */
    strcpy (orig, opt->arg);            /* Retain a copy of current option arguments */
    strcpy (copy, opt->arg);            /* Retain another copy of current option arguments */
    /* ... lines skipped ... */
    directive = strchr (copy, '=');     /* Get location of the equal sign, if it is present */
    modifier  = strchr (copy, '+');     /* Get location of the first plus sign, if any are present */

Note here that both directive and modifier are pointing somewhere within copy. Continuing to the part where I become confused by the comments:

    /* ... lines skipped ... */
    if (directive) directive[0] = '\0', got_directive = true;   /* Cut off =value for now so opt->arg only has the keyword, but remember if a directive was found */
    if (modifier) modifier[0] = '\0', got_modifier = true;      /* Cut off +modifier for now so orig only has the keyword, but remember if a modifier was found */
    if ((kw = gmtinit_find_kw (API, gmt_common_kw, this_module_kw, orig, &k)) == NULL) {

The lines which replace directive[0] and modifier[0] with '\0' are thus modifying only the copy string, and not opt->arg and/or orig, which remain unchanged. This is where I am confused by the comments which seem to indicate otherwise. Then gmtinit_find_kw() is called with the test string being orig, which has never been modified. Is this really what you intend? This is what causes the bug noted above.

PaulWessel commented 2 years ago

I think we should avoid directives and modifiers with names like shortish and shortishbutlonger, unless we actually check for this and do an exact check on the lengths. Since the search is done per option, it should not matter that option -A has one of these and -B the longer one.

Yes, it looks like comments and action are out of sync here. I agree with you that it seems like we should pass copy to gmtinit_find_kw since otherwise orig will have the =arg still there.
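A sketch of the agreed fix: cut the working copy at '=' and '+' and hand that truncated copy, rather than orig, to the keyword lookup. long_keyword is a hypothetical helper that mirrors the quoted steps; it is not the actual GMT code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: copy arg into key and cut it down to the bare long-option
 * keyword (e.g. "resolution=full" -> "resolution"), recording whether an
 * "=..." part or "+modifiers" followed. Returns 0, or -1 if key is too small. */
static int long_keyword (const char *arg, char *key, size_t size, bool *got_value, bool *got_modifier) {
    int n = snprintf (key, size, "%s", arg);      /* work on a copy, never on arg */
    if (n < 0 || (size_t)n >= size) return -1;
    char *directive = strchr (key, '=');          /* both located before any cut, as in the quoted code */
    char *modifier  = strchr (key, '+');
    *got_value = directive != NULL;
    *got_modifier = modifier != NULL;
    if (directive) *directive = '\0';
    if (modifier) *modifier = '\0';
    return 0;                                     /* key is what the lookup should receive */
}
```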

rbdavis commented 2 years ago

I agree with you that it seems like we should pass copy to gmtinit_find_kw ...

Good, I was hoping that's what you intended! The complexity of these option strings always leaves me thinking there's something more complex going on that's beyond my comprehension. Fixing and proceeding ...

rbdavis commented 2 years ago

Slow (i.e., not much) progress today stepping through the translate routine in the debugger testing how it behaves with various input, although at least I believe I understand it better than I did. I do have one fairly clear (I hope) question which relates to our most recent exchange this morning that addressed the question of what should happen when the incorrect --resolutionary=full is the option string as opposed to the correct --resolution=full string.

In that case --resolutionary (or --resolution ) is the longopt option name, and as such can only be followed by (i) = and then some other stuff or (ii) nothing at all. Thus, in gmtinit_find_kw(), we just want to strictly compare the translation entry's longopts name string with whatever in the option spec follows the -- and either (i) ends just before a = or (ii) runs to the end of the option string. The existing code in gmtinit_find_kw()

    len = MIN (len_given_keyword, strlen (kw[set][*k].long_option));
    if (!strncmp (arg, kw[set][*k].long_option, len)) break;

is wrong because it allows the incorrect --resolutionary to be passed off as a matched longopt option name by doing only a partial string comparison. What we really want in place of those two lines is the correct (and simpler) full string comparison

if (!strcmp (arg, kw[set][*k].long_option)) break;

which will recognize --resolutionary as a non-match. This seems entirely clear to me, and hopefully you agree.

Now, here's the question part of this post. There is analogous code in gmtinit_find_argument() which is used to match both long directive and long modifier strings, and this code also does a partial string comparison and similarly accepts incorrect long directive/modifier strings (exactly like the original gmtinit_find_kw() before I fixed the latter today), e.g.:

> gmt pscoast --resolution=fully
gmt [WARNING]: Reformatted options: -Df

is accepted as OK when it really should be

> gmt pscoast --resolution=full
gmt [WARNING]: Reformatted options: -Df

Here, my initial inclination was to say: OK, same problem, and I will fix it the same way, i.e., by doing a full string comparison, which would recognize fully as not a match for the long directive full. However, after thinking about this for a while my untrustworthy but ever-active GMTadar started to kick in and I began to wonder whether it was possible, in the large universe of GMT option specs, for some option spec somewhere to directly follow a directive (or modifier) that has no argument of its own with other miscellaneous stuff (other than a separator or another modifier), e.g.:

--resolution=fully

where full is the directive name string (and this full directive has no argument) and the following y is some other legal piece of the option spec which is not related to the directive full and is not a modifier. Can such an option spec occur and, if so, would we have to deal with it in terms of supporting long-option translation, or would we just not even try?

I would like to replace the analogous two lines of gmtinit_find_argument() which read

    len = MIN (lent, strlen(item));
    if (!strncmp (item, text, len))

with the single line

if (!strcmp (item, text))

as that would correctly recognize errors like --resolution=fully as a non-match if you think it will not cause problems with any complicated option spec. Note that if we do not do this (because there is a weird option spec where --resolution=fully is legal) we still have a problem with the code, as it will currently translate --resolution=fully to -Df, not -Dfy .

I hope this makes sense!

PaulWessel commented 2 years ago

I am slowly remembering that the reason I did strncmp etc was so that lazy folks could do --res=full and if res was unique among the first 3 letters of all directives for that option then you could get away with it. The flip side is, as you show, that --resolutionary also would work.

I don't think it makes much sense to allow things like --regio=1/2/3/4, as that is pretty lame. So I think we should match exactly, and your solution should work.

That being said, I am partial to the idea Joaquim is using in GMT.jl (Julia): aliases. I think he allows both proj and projection, for example, and maybe region and limits (can't recall). We can expand that later, but the way we would do that is for those directives in the structure to look like "region|limits|domain", and then the function can march along that string, pull out one alias at a time and check it. Just planting that seed.
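A minimal C sketch of the alias idea, marching along a '|'-separated string and comparing each alias exactly. alias_match is a hypothetical name, and "region|limits|domain" is just the example given above, not an agreed vocabulary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: return 1 if the first len chars of word exactly equal one of
 * the '|'-separated aliases (e.g. "region|limits|domain"), else 0. */
static int alias_match (const char *aliases, const char *word, size_t len) {
    while (*aliases) {
        size_t n = strcspn (aliases, "|");        /* length of the current alias */
        if (n == len && !strncmp (aliases, word, len)) return 1;
        aliases += n;
        if (*aliases == '|') aliases++;           /* step over the separator */
    }
    return 0;
}
```

Passing the keyword length separately lets the caller test "limits=0/1/0/1" directly by supplying len = 6, without copying the string first.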

As for fully, I think not. If we find such a beast then we will deal with it. Note there are numeric arguments that will have units, of course, so --stuff=14k and --stuff=14 are both valid (the latter will probably be assumed to be in the default unit of meters).

rbdavis commented 2 years ago

As for fully, I think not. If we find such a beast then we will deal with it.

OK, so also check for exact match only (with strcmp() ) inside gmtinit_find_argument()?

PaulWessel commented 2 years ago

Yes, makes sense.

rbdavis commented 2 years ago

Any reason we couldn't use :: instead of := to indicate the argument following a directive? Use of = here complicates other aspects of the translation parsing which are searching for = . Any other two-character sequence not using = would work equally well, too. Obviously it has to be a sequence that would not be likely to appear in any legitimate option spec.

PaulWessel commented 2 years ago

Well, no real reason, and given using just : ran into problems with geographic and clocks, I cannot argue that := is much better than ::, but I do like := better because of the "assignment" meaning. I am not sure I understand the problem with keeping := separate from =? Since we strip off the whole modifier section, aren't we just looking at --option=directive:=arg, so a strstr search for ":=" would find that uniquely and hide it for you to find the =, no?

rbdavis commented 2 years ago

The problem with := is not unsolvable, just a bit more work, as a few different parts of the code which look for = with strchr() will all have to be changed. Using :: instead would mean none of that has to be touched. I wasn't sure how attached you were to := so I just thought I'd ask and maybe save some time. Will think about it some more.

rbdavis commented 2 years ago

No worries with :=, I think, I just wrote a strchr() substitute, seems OK.
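The actual substitute is not shown in this thread; the following is only one plausible C version, with a hypothetical name, that behaves like strchr (s, '=') but skips any '=' belonging to a ":=" separator:

```c
#include <stddef.h>

/* Hypothetical strchr() substitute: return the first '=' in s that is not
 * part of ":=", or NULL if there is none. */
static char *strchr_lone_equal (const char *s) {
    for (const char *p = s; *p; p++)
        if (*p == '=' && (p == s || p[-1] != ':')) return (char *)p;
    return NULL;
}
```

For "limit=upper:=901" it returns the '=' after "limit", and for a bare "upper:=901" it returns NULL.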

rbdavis commented 2 years ago

Initial testing with replacement of : by := looks good. Will redo all surface and pscoast tests, then merge back into master and resume with next module on to-do list (pswiggle). (Probably not until Thurs. with tomorrow's sick day.)

Hope you're beating the heat over there!

PaulWessel commented 2 years ago

Great, off to London for the whole day - heat is subsiding I think.

rbdavis commented 2 years ago

Finished all re-testing with changes to translation routine and made a pull request. Starting work on pswiggle now.

rbdavis commented 2 years ago

Hi Paul, here's a followup question with regard to non-unique translations. We discussed this the other day with regard to -J and its modifiers, which mean different things depending upon the projection type and thus it seems natural that the same short-option modifier could have two different meaningful long-option modifier strings. We essentially punted on a decision there when I just used the trivial (but unhelpful with regard to useful long-option strings) mapping of

Long modifier list: "d,a,t,v,w,z,f,k,r"
Short modifier list: "d,a,t,v,w,z,f,k,r"

The same problem is arising again in pswiggle (and was also present in pscoast, although I didn't notice at the time) in the context of the -D mapscale option (-L for pscoast). Here, the justification directive can be short-specified with j or J. In pscoast I just ignored capital J and used the mapping

Long modifier list: "mapcoords,jcode,boxcoords,plotcoords"
Short modifier list: "g,j,n,x"

Now in pswiggle I see the exact same thing (but with -D instead of -L). I could easily add (capital) J to the short-option directive list so that an S2L inverse translation from J would work. But then I would have to either use jcode twice in the long-option list, which makes the more common L2S forward translation non-unique, or kludge up something silly like Jcode (for capital J). The duplicate would still work in practice, since we don't really care whether the translated short option is j or J. What do you think? The choice here is essentially:

(1) Do what I did for pscoast and just pretend capital J does not exist (breaks S2L inverse translation of capital J).

(2) Add the matched pair { J , jcode } to the mapping. Now, S2L inverse translation of capital J works, but the forward L2S translation of jcode is non-unique (although we effectively do not care since a result of either j or J will trigger the expected module behavior).

(3) Add the matched pair { J , Jcode } to the mapping. Again S2L will now work, and L2S is unique, but this strikes me as overly pedantic and more confusing than helpful to the user.

I guess my vote would be for #2 -- it's the least logically pure, but will effectively translate in either direction and do what the user expects without cluttering up the documentation and code with silly crap like jcode and Jcode.

I suppose our decision here should be extended to all future similar cases, exclusive of those like -J where there is a single short-opt character that can map to two conceptually different long-opt strings depending upon other context (or perhaps vice-versa should such a case exist somewhere).
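Here is a sketch of option (2). It assumes the translator pairs tokens by position in the comma-separated short and long lists, as the tables above suggest. The helper names are hypothetical. With `jcode` listed twice, forward L2S translation returns the first match (`j`). Inverse S2L translation from `J` still finds `jcode`.

```c
#include <string.h>

/* Locate token number idx (0-based) in a comma-separated list. Returns its
 * start and stores its length in *len, or returns NULL if there is no such token. */
static const char *token_at(const char *list, int idx, size_t *len)
{
    const char *start = list;
    for (int i = 0; i < idx; i++) {
        start = strchr(start, ',');
        if (!start) return NULL;
        start++;
    }
    const char *comma = strchr(start, ',');
    *len = comma ? (size_t)(comma - start) : strlen(start);
    return start;
}

/* Index of the FIRST token in list that equals word exactly, or -1. */
static int token_index(const char *list, const char *word)
{
    size_t len = 0;
    const char *tok;
    for (int i = 0; (tok = token_at(list, i, &len)) != NULL; i++)
        if (len == strlen(word) && strncmp(tok, word, len) == 0)
            return i;
    return -1;
}

/* L2S: short code at the same position as the first matching long name,
 * or '\0' if the long name is unknown. */
char long_to_short(const char *shorts, const char *longs, const char *word)
{
    size_t len = 0;
    int i = token_index(longs, word);
    const char *tok = (i < 0) ? NULL : token_at(shorts, i, &len);
    return (tok && len == 1) ? tok[0] : '\0';
}

/* S2L: copy the long name paired with short code c into buf.
 * Returns 0 on success, -1 if c is unknown or buf is too small. */
int short_to_long(const char *shorts, const char *longs, char c,
                  char *buf, size_t bufsize)
{
    char key[2] = { c, '\0' };
    size_t len = 0;
    int i = token_index(shorts, key);
    const char *tok = (i < 0) ? NULL : token_at(longs, i, &len);
    if (!tok || len + 1 > bufsize) return -1;
    memcpy(buf, tok, len);
    buf[len] = '\0';
    return 0;
}
```

With shorts `"g,j,J,n,x"` and longs `"mapcoords,jcode,jcode,boxcoords,plotcoords"`, `long_to_short(..., "jcode")` yields `j`. `short_to_long(..., 'J', ...)` yields `jcode`.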

rbdavis commented 2 years ago

Oops, nevermind (sort of)! Arrrrgh.

Looks like in pswiggle -Dj and -DJ mean slightly different things, so I guess that actually makes things simpler there. However, for pscoast the man page mentions no effective difference between -Lj and -LJ, so I guess my previous question still applies there?

rbdavis commented 2 years ago

pswiggle man page does not say what +o modifier of -D option means. I assume same meaning as for pscoast's +o modifier to -L option?

rbdavis commented 2 years ago

For pscoast -G we just did

{ 0, 'G', "land", "", "", "", "" }

Should we do the same for pswiggle (maybe using fill instead of land)?

rbdavis commented 2 years ago

Any reason that you have not included the +c modifier for the -d short-option in gmt_common_longoptions.h? As noted before, each possible short modifier must be mapped to a long-option equivalent or any translation using that modifier will fail:

>  pscoast --nodata=in+c7
pscoast [WARNING]: Long-modifier form in+c7 for option -d not recognized!
pscoast [WARNING]: Reformatted options: -di
rbdavis commented 2 years ago

Analogous question (re: previous comment) about the -g option translation in gmt_common_longoptions.h, which includes modifiers +n and +p but not +a or +c?

rbdavis commented 2 years ago

... and again with respect to -h in the common .h translation table, where +m was omitted. I guess pswiggle is just hitting a bunch of these that I have not encountered before. I'm sorry to keep pestering you on these -- I'm happy to just use my own judgement as they pop up if you prefer, I only worry that I may be missing out on some subtlety.

rbdavis commented 2 years ago

Other short-options with entries in the common .h translation table file with missing/incorrect short modifiers that relate to pswiggle:

(1) -i in common .h is missing +d ;

(2) for -qi and -qo shortopts, there is a +f in the common .h table that I think is supposed to be a +t (i.e., lowercase-T rather than lowercase-F);

(3) -t in common .h is missing +f and +s ;

(4) the -: option seems to be missing entirely from the common .h.

(5) and to repeat from previous recent messages here just so these are all listed in one place, there are -d (missing +c), -g (missing +a and +c), and -h (missing +m).

I think this is everything missing from the common .h translation table that relates to pswiggle. But of course I'm sure I must be missing something! ;-> Please advise whether I should attempt to correct all of the above, or whether there are any particular ones I should omit for now (realizing that omission will mean translation failure if they are encountered in any long-option spec).

rbdavis commented 2 years ago

New problem with -G discovered in running pswiggle test plots: what do -G+green and -G-blue mean? I do not see this usage form for -G documented anywhere. Translation of --fill=-blue works (whatever in blue blazes that might happen to mean ;-> ), but --fill=+green fails because the translator tries to interpret the +green substring as a long-option modifier.

Can we say for certain that the 2-character sequence =+ is an impossibility for any GMT option spec? If so I can perhaps modify the translator code to prevent the + in that 2-character sequence from being seen as a modifier.

rbdavis commented 2 years ago

Here is my pswiggle module translation table for your comments re: long-option names, etc:

```c
static struct GMT_KEYWORD_DICTIONARY module_kw[] = { /* Local options for this module */
        /* separator, short_option, long_option,
                  short_directives,    long_directives,
                  short_modifiers,     long_modifiers */
        { 0, 'A', "azimuth",           "", "", "", "" },
        { 0, 'C', "center",            "", "", "", "" },
        { 0, 'D', "mapscale",
                  "g,j,J,n,x",         "mapcoords,jcode,jcodemirror,boxcoords,plotcoords",
                  "w,j,a,o,l",         "length,janchor,side,anchoroffset,label" },
        { 0, 'F', "frmpen",
                  "",                  "",
                  "c,g,i,p,r,s",       "clearance,fill,inner,pen,radius,shade" },
        { 0, 'G', "fill",              "", "", "", "" },
        { 0, 'I', "fixedazimuth",      "", "", "", "" },
        { 0, 'T', "trackpen",          "", "", "", "" },
        { 0, 'W', "outlinepen",        "", "", "", "" },
        { 0, 'Z', "anomalyscale",      "", "", "", "" },
        { 0, '\0', "", "", "", "", ""}  /* End of list marked with empty option and strings */
};
```

Except for the -G+green problem noted in the previous message, all pswiggle tests in the test directory pass with this table.
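Here is a minimal sketch of how a table like this can be scanned up to its sentinel entry. The struct below is a hypothetical stand-in for `GMT_KEYWORD_DICTIONARY`. Its field order follows the column comment in the table, but the field names and types are assumptions, and only three rows are copied.

```c
#include <string.h>

/* Hypothetical stand-in for GMT_KEYWORD_DICTIONARY (field names assumed). */
struct kw_entry {
    char separator;
    char short_option;
    const char *long_option;
    const char *short_directives, *long_directives;
    const char *short_modifiers, *long_modifiers;
};

static const struct kw_entry demo_kw[] = {
    { 0, 'D', "mapscale",
         "g,j,J,n,x", "mapcoords,jcode,jcodemirror,boxcoords,plotcoords",
         "w,j,a,o,l", "length,janchor,side,anchoroffset,label" },
    { 0, 'G', "fill",       "", "", "", "" },
    { 0, 'W', "outlinepen", "", "", "", "" },
    { 0, '\0', "", "", "", "", "" }  /* sentinel, as in module_kw */
};

/* Walk the table until the '\0' sentinel. Return the short option whose
 * long name matches exactly, or '\0' if none does. */
char lookup_short_option(const struct kw_entry *table, const char *long_name)
{
    for (const struct kw_entry *e = table; e->short_option != '\0'; e++)
        if (strcmp(e->long_option, long_name) == 0)
            return e->short_option;
    return '\0';
}
```

Because the comparison is exact, a truncated name such as `fil` is not accepted as `fill`.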

PaulWessel commented 2 years ago

> New problem with -G discovered in running pswiggle test plots: what do -G+green and -G-blue mean? I do not see this usage form for -G documented anywhere. Translation of --fill=-blue works (whatever in blue blazes that might happen to mean ;-> ), but --fill=+green fails because the translator tries to interpret the +green substring as a long-option modifier.
>
> Can we say for certain that the 2-character sequence =+ is an impossibility for any GMT option spec? If so I can perhaps modify the translator code to prevent the + in that 2-character sequence from being seen as a modifier.

What you are seeing is deprecated syntax in some test scripts. It is good to test that we correctly handle deprecated syntax, so we do not update all scripts to the current syntax. Many scripts predate the change, just as users' scripts will. In the -G parser you will see this:

`else if (strchr ("-+=", opt->arg[0])) {    /* Allow old syntax -G+|-|=<fill> */`

which is what you are running into with -G+green and -G-blue. I understand that makes your job tricky, since you have to check whether a particular script is actually using the current syntax. The documentation only discusses current syntax, but the code usually quietly accepts the old ways. Often (and I am not sure why not in pswiggle) that code is inside an if-test like

sample1d.c: if (gmt_M_compat_check (GMT, 5)) {

and by raising the compatibility level (via the GMT_COMPATIBILITY setting [4]) to 6, these sections will start to give errors for deprecated syntax. For historical reasons, we ship with a default of 4.

So you do not need to worry about --fill=+green etc. Only the documented syntax (i.e., the modifiers +p and +n) needs to be dealt with. So I am guessing --fill=color+positive|negative, and if neither is given then the parser will select positive anyway.
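This sketch assumes the long form ends up as `--fill=<color>[+positive|+negative]`, and shows what the forward translation might look like. `translate_fill` is hypothetical and handles plain colors only. Pattern fills carry their own `+` modifiers and are out of scope here. The deprecated `+green` form is rejected, in line with supporting only documented syntax.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: translate a --fill argument "<color>[+positive|+negative]"
 * into "-G<color>[+p|+n]". Returns 0 on success, or -1 on an unknown
 * modifier or if the output buffer is too small. */
int translate_fill(const char *arg, char *out, size_t outsize)
{
    const char *plus = strchr(arg, '+');
    size_t color_len = plus ? (size_t)(plus - arg) : strlen(arg);
    char mod = '\0';

    if (plus) {
        if (strcmp(plus + 1, "positive") == 0)      mod = 'p';
        else if (strcmp(plus + 1, "negative") == 0) mod = 'n';
        else return -1;  /* e.g. deprecated "+green": not a known modifier */
    }
    int n = mod ? snprintf(out, outsize, "-G%.*s+%c", (int)color_len, arg, mod)
                : snprintf(out, outsize, "-G%.*s", (int)color_len, arg);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

For example, `red+negative` becomes `-Gred+n`, and `blue` becomes `-Gblue`.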

PaulWessel commented 2 years ago

> I think this is everything missing from the common .h translation table that relates to pswiggle. But of course I'm sure I must be missing something! ;-> Please advise whether I should attempt to correct all of the above, or whether there are any particular ones I should omit for now (realizing that omission will mean translation failure if they are encountered in any long-option spec).

Thanks for finding these. I think it is a mix of things added after I first did the table 1-2 years ago or oversight. We should add those in:

> (1) -i in common .h is missing +d ;

Use "divide"

> (2) for -qi and -qo shortopts, there is a +f in the common .h table that I think is supposed to be a +t (i.e., lowercase-T rather than lowercase-F);

Yes, we changed from "file" to "table" at some point. The short -q syntax still shows +f, while the longer full discussion of -q has it right. I just fixed the f to t in two rst files in master, so you can update the translation table to use t and "bytable".

> (3) -t in common .h is missing +f and +s ;

Yes, needs to be added as +fill and +stroke. BTW, I just added the missing syntax to the RST tables so the dev docs should soon update with the correct syntax (as for -q).

> (4) the -: option seems to be missing entirely from the common .h

Oops, yes it is. Perhaps --latlon[=input|output].

> (5) and to repeat from previous recent messages here just so these are all listed in one place, there are -d (missing +c), -g (missing +a and +c), and -h (missing +m).

For these last ones:

-d: +c becomes +column

-g: +a is +all, +c is +column

-h: +m becomes +header
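The sketch below collects the decisions above as new short/long modifier pairs in a flat lookup table. The layout and function name are illustrative assumptions. The real entries belong in gmt_common_longoptions.h, in whatever format that file uses. The proposed -: option (--latlon) is a new option rather than a modifier, so it is left out.

```c
#include <string.h>

/* Illustrative only: new modifier pairs agreed in this thread for the
 * common-option table. */
struct mod_pair { char option; char short_mod; const char *long_mod; };

static const struct mod_pair new_common_mods[] = {
    { 'd', 'c', "column"  },
    { 'g', 'a', "all"     },
    { 'g', 'c', "column"  },
    { 'h', 'm', "header"  },
    { 'i', 'd', "divide"  },
    { 'q', 't', "bytable" },  /* replaces the stale +f entry for -qi/-qo */
    { 't', 'f', "fill"    },
    { 't', 's', "stroke"  },
};

/* Return the short modifier for (option, long modifier), or '\0'. */
char long_mod_to_short(char option, const char *long_mod)
{
    size_t n = sizeof new_common_mods / sizeof new_common_mods[0];
    for (size_t i = 0; i < n; i++)
        if (new_common_mods[i].option == option &&
            strcmp(new_common_mods[i].long_mod, long_mod) == 0)
            return new_common_mods[i].short_mod;
    return '\0';
}
```

Keying on the option as well as the name lets `column` map to `+c` for both -d and -g without ambiguity.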

rbdavis commented 2 years ago

Thanks for all those answers on -G and the missing short modifiers for the other common shortops.

I also just realized that both pscoast and pswiggle DO have the same documented differences between the J and j directives; sorry I managed to miss that. Will patch the pscoast translation table to match pswiggle's in that regard.

> What you are seeing is deprecated syntax in some test scripts.

Regarding the test scripts, I've been using them to test my changes mostly because there's nothing else readily available to use. It would probably be preferable for each module to have a more thorough translation torture test to prove equivalency, but I don't think I should be the one designing the test due to my lack of experience with GMT option spec. I imagine the torture focus in the current test set is more about running down subtle differences in image output, etc., as opposed to complexity of option specs. The current set has certainly been very useful in exposing problems I would not have realized were possible so I will definitely continue as I have been for now, but I expect there are more translation issues yet to be discovered.

PaulWessel commented 2 years ago

That reminds me I did not actually answer the j and J question. We use -DJ to place things outside the map frame and -Dj to place them inside the map frame. So -DJTC would be used for a color bar outside at the top, while -DjBC might be used for a map scale inside the frame at the bottom. Given that, perhaps the long directives are outside and inside, e.g.

--placement=inside:TR

The -D option (and a few others) is used for many things and always pertains to the placement of some map embellishment [https://docs.generic-mapping-tools.org/dev/cookbook/features.html#plot-embellishments]
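Here is a sketch of the proposed inside/outside directives. It assumes the `directive:=argument` separator adopted earlier in this thread. The example above writes a single `:`, which this sketch would reject. `translate_placement` is hypothetical. Following the description above, `j` places the embellishment inside the map frame and `J` places it outside.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: translate "<inside|outside>:=<code>" into "-Dj<code>"
 * (inside the map frame) or "-DJ<code>" (outside the map frame).
 * Returns 0 on success, or -1 if the separator or directive is unknown
 * or the output buffer is too small. */
int translate_placement(const char *arg, char *out, size_t outsize)
{
    const char *sep = strstr(arg, ":=");
    if (!sep) return -1;
    size_t dlen = (size_t)(sep - arg);
    char directive;

    if (dlen == 6 && strncmp(arg, "inside", 6) == 0)        directive = 'j';
    else if (dlen == 7 && strncmp(arg, "outside", 7) == 0)  directive = 'J';
    else return -1;

    int n = snprintf(out, outsize, "-D%c%s", directive, sep + 2);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

For example, `inside:=TR` becomes `-DjTR`, and `outside:=TC` becomes `-DJTC`.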

Paul Wessel Professor, Department of Earth Sciences (808) 956-4778 | @.*** http://www.soest.hawaii.edu/earthsciences

On July 22, 2022 at 8:14:05 PM, Roger Davis @.***) wrote:


rbdavis commented 2 years ago

> --placement=inside:TR

OK, so you'd like to use --placement instead of --mapscale for -D (-L) in pswiggle (pscoast)? I would say that placement is maybe a bit too general (i.e., placement of what?), but I am happy to use whatever you prefer.