udunits2 grammar doesn't reflect the implementation

I'm working on the udunits2 grammar because I'd like to produce a LaTeX representation of an uninterpreted but valid udunits2 string (ref). To be clear, I do mean uninterpreted here: km km-1 and km/km should both produce something like \frac{km}{km}, which I believe rules out using the actual ut_parse parser (happy to hear otherwise!).

I've found a number of cases where the documented grammar disagrees with what udunits-2 actually parses into a ut_unit. In most of them the behaviour of udunits-2 is correct, and the documented grammar is simply wrong.

Cases of incorrect grammar identified:

~Shift spec words must have leading spaces. For example m from2 is valid, but mfrom2 is not, yet m@2 is fine.~
```
 <shift_op>: one of
        "@"
        "after"
        "from"
        "since"
        "ref"
```
~should be~
```
 <shift_op>: one of
         "@"
         " after"
         " from"
         " since"
         " ref"
```
~(same is true for per and PER).~ EDIT: I was wrong about this. I got my identifiers wrong.

The grammar states that "ISO-8859-1 alphabetic characters" may be part of <id> (via <alpha>), but it doesn't mention that other characters are also accepted (e.g. π). (I think I'm right in saying that π isn't in ISO-8859-1, but Unicode has never been my strong suit.)
CLOCK is documented as <hour> ":" <minute> (":" <second>)? but it looks like it is really <hour> (":" <minute> (":" <second>)?)?. (Does this happen because of the packed_clock format?)
There is no mention of the special cases of UTC, Z and GMT for the case DATE CLOCK ID seen in https://github.com/Unidata/UDUNITS-2/blob/v2.2.27.6/lib/parser.y#L447-L451.
TIMSTAMP -> TIMESTAMP (typo)
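
To make the CLOCK discrepancy above concrete, here is a minimal Python sketch that encodes both forms as regular expressions. Only the `<second>` pattern is taken from the grammar quoted in this issue; the `HOUR` and `MINUTE` patterns are my assumptions about the documented definitions, and neither regex is a substitute for what ut_parse really does.

```python
import re

# Component patterns. SECOND follows the <second> rule quoted in this issue;
# HOUR and MINUTE are assumptions made for the sake of the illustration.
HOUR = r"(?:[+-]?[0-1]?[0-9]|2[0-3])"
MINUTE = r"(?:[0-5]?[0-9])"
SECOND = rf"(?:(?:{MINUTE}|60)(?:\.[0-9]*)?)"

# CLOCK as documented: <hour> ":" <minute> (":" <second>)?
CLOCK_DOCUMENTED = re.compile(rf"{HOUR}:{MINUTE}(?::{SECOND})?")

# CLOCK as it appears to behave: <hour> (":" <minute> (":" <second>)?)?
CLOCK_OBSERVED = re.compile(rf"{HOUR}(?::{MINUTE}(?::{SECOND})?)?")


def accepts(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["12", "12:30", "12:30:15.5"]:
        print(text, accepts(CLOCK_DOCUMENTED, text), accepts(CLOCK_OBSERVED, text))
```

The only difference between the two is that a bare hour such as `12` is rejected by the documented form and accepted by the observed one.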

Cases that udunits might be doing the wrong thing:

It seems that ut_parse can't handle Unicode superscript exponents greater than 3 when the base is not a number. m³ is fine but m⁴ is not. Interestingly, ut_format produces m⁴ for an input of m+4 (as expected). 2⁴ works just fine though (as does 2⁻⁴²).
~The grammar states that:~
```
<second>:
          (<minute>|60) (\.[0-9]*)?
```
~But I can't see that udunits is actually enforcing this:~
```
$ udunits2 -H 's since 1990-1-1 0:0:61' -W 's since 1990-1-1 0:0:0'
1 s since 1990-1-1 0:0:61 = -3593 (s since 1990-1-1 0:0:0)
x/(s since 1990-1-1 0:0:0) = (x/(s since 1990-1-1 0:0:61)) - 3594
```
~The same appears to be true for all other clamped timestamp components.~
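
Coming back to the superscript exponents (m³ vs m⁴): one detail that may or may not be relevant is that ¹, ² and ³ are ISO-8859-1 characters, whereas ⁰ and ⁴ to ⁹ live in a separate Unicode block. A grammar-side workaround is to normalise superscripts to ASCII before interpreting them. The sketch below is a hypothetical helper of my own (`split_superscript_exponent` is not part of udunits2) and only handles a trailing superscript run on a single token.

```python
import re

# Unicode superscript characters mapped to their ASCII equivalents.
SUPERSCRIPTS = str.maketrans("⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻", "0123456789+-")

# A run of superscript characters at the end of a token.
SUPERSCRIPT_RUN = re.compile("[⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻]+$")


def split_superscript_exponent(token):
    """Split a token such as 'm⁴' into (base, exponent).

    The exponent defaults to 1 when the token has no trailing superscript.
    """
    match = SUPERSCRIPT_RUN.search(token)
    if match is None:
        return token, 1
    base = token[:match.start()]
    exponent = int(match.group().translate(SUPERSCRIPTS))
    return base, exponent


if __name__ == "__main__":
    for token in ["m³", "m⁴", "2⁻⁴²", "km"]:
        print(token, split_superscript_exponent(token))
```

Treating all superscript digits uniformly like this would make m⁴ behave the same way as m³, regardless of whether the base is a number or an identifier.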

UPDATE: It seems that s since 1990-1-1 0:0:62 is actually identified as s since 1990-1-1 0:0:06 +2(hours), which is definitely valid as part of the grammar (but is that the behaviour that was intended?)

ut_parse reads s since 199022T1 as s @ 19911003T010000.00000000 UTC (that's s @ 1991-10-03). Given the definition of <month> ("0"?[1-9]|1[0-2]) I was expecting this to be 1990-02-02, though to be honest I would have preferred it to fail.
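
For reference, this is the reading of the packed date that I was expecting, written as a small Python sketch. The `<month>` pattern is the documented one; the four-digit year and the day pattern are my assumptions, and `read_packed_date` is a hypothetical helper, not anything udunits2 provides. It says nothing about how ut_parse arrives at 1991-10-03.

```python
import re

YEAR = r"(\d{4})"                   # assumption: a four-digit year
MONTH = r"(0?[1-9]|1[0-2])"         # the documented <month> definition
DAY = r"(0?[1-9]|[12][0-9]|3[01])"  # assumption: a plausible <day> definition

PACKED_DATE = re.compile(YEAR + MONTH + DAY)


def read_packed_date(text):
    """Return (year, month, day) for a packed date, or None if it doesn't match."""
    match = PACKED_DATE.fullmatch(text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


if __name__ == "__main__":
    print(read_packed_date("199022"))
    print(read_packed_date("19900202"))
```

Under these assumptions both `199022` and `19900202` are read as year 1990, month 2, day 2.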

I'm raising this issue so that I can keep track of what I've found here, and so that I can start the ball rolling on having a machine- and human-readable grammar that can be tested systematically (either here or upstream in a project like cf-units). My intention is to re-create the grammar based on the ANTLR specification. The choice is somewhat arbitrary, but ANTLR does allow a number of useful tools, including multi-language support (pretty useful for testing!) and debugging/visualisation of the grammar (the latter I've not yet gotten working on my machine though 😞). Naturally I'm aware of the Lex-Yacc content of the udunits-2 codebase, but have found very few tools other than bison for working with the format.

I hope you don't find this issue to be pernickety - that is definitely not my intention! My main question is: do you support me updating the documented grammar to be a human-readable AND machine-testable ANTLR grammar (subject to readability, of course)?
