cbor-wg / edn-literal

Application-oriented literals for CBOR extended diagnostic notation
Other
0 stars 7 forks source link

alternate single-pass ABNF #49

Open rohanmahy opened 2 months ago

rohanmahy commented 2 months ago

Hi, I believe this PR contains the changes that would be needed to make a single ABNF for EDN. A future app-string that had no requirement for interior comments could just define its ABNF (with any unquoted characters from unescaped and DQUOTE), and append the app-string to the known-app-str production. A future app-string that wanted comments could add safely either slash-comment or hash-comment to its whitespace as long as "/" or "#" respectively did not have meaning in the interior grammar.

Hope this helps. Thanks, -rohan

OR13 commented 2 months ago

I tend to prefer a single ABNF, and clear guidance and examples for extensions.

I don't fully follow the comments regarding extensions to support comments, is the suggestion that if a single ABNF is provided then it must not define support for comments and leave that to extension? I don't like that.

If comments are just used as an example here, can we use a new app specific literal as an example instead?

rohanmahy commented 2 months ago

I tend to prefer a single ABNF, and clear guidance and examples for extensions.

Me too. I'm sorry that I didn't provide more text about what that guidance would be. Basically, someone writing a new app-string, writes the grammar using as many of the productions in the base/existing ABNF as possible. If they allow whitespace and want both slash and hash comment, insert the S production (see the example below for e). If they want hash comments only (because / is meaningful inside), insert the B production (see the example below for ip).

Regardless if comments are used inside the app-string, the new app-string definition is added to the end of the known-app-str production.

That's it.

I don't fully follow the comments regarding extensions to support comments, is the suggestion that if a single ABNF is provided then it must not define support for comments and leave that to extension? I don't like that.

Comments are supposed to be treated like whitespace in EDN. Without this PR, the four app strings defined in this draft have the following characteristics:

While I see no reason to add comments inside an ip address, If you were willing to make the application ignorine all whitespace and wanted to extend ip to support # comments, you could add the B production into the lowest levels of the BNF (h16, colon, double colon, decoctet, uint/prefix). for example:

h16 = 1*4(HEXDIG / B)

Including https://www.ietf.org/archive/id/draft-ietf-cbor-edn-e-ref-00.html , there is no formal grammar defined, but the intent of e'' seems to be to allow anything that is valid as the id production in CDDL; and the intent of ref'' seems to be to allow anything compatible with the generic URI syntax and literal filenames. Literal filenames don't have fungible whitespace, often contain slashes and could contain arbitrary characters such as #. URIs don't allow any whitespace and the / and # characters are used in their BNFs. The id production allows [A-Za-z0-9@_$.-]. If the author wanted to allow comments and whitespace anywhere in e'', he could add the S production inside the BNF of enum:

; app-string e
e              = (%s"e" / %s"E") SQUOTE enum SQUOTE
enum       = S EALPHA *(*("-" / ".") (EALPHA / DIGIT/ S))
EALPHA  = ALPHA / "@" / "_" / "$"

Personally, I think the fact that the current grammars for ip and dt don't allow comments, and none of the examples for e or ref have comments is a strong indicator that nobody is going to bother incorporating comments in future app strings, largely because every new app string so far doesn't allow whitespace.

If comments are just used as an example here, can we use a new app specific literal as an example instead?

I'm not sure what you mean @OR13 . Hopefully I answered your questions somewhere else in this reply.

OR13 commented 2 months ago

I meant that as the working group compares the approaches, an example that is complex enough to highlight differences might be valuable.

Also, all these comments are as an individual. I'd like to hear from other implementers,.

hildjj commented 2 weeks ago

This currently has the ellipsis rule defined twice, and the inside-h rule has "ellipsis" instead of ellipsis.

rohanmahy commented 2 weeks ago

This currently has the ellipsis rule defined twice, and the inside-h rule has "ellipsis" instead of ellipsis.

Thanks Joe!

hildjj commented 2 weeks ago

OK, here are some more issues then:

With these changes, I was able to at least parse this.

rohanmahy commented 2 weeks ago

OK, here are some more issues then:

* `uint` defined more than once

* `S` defined more than once

* Rule names according to RFC 5234 are `ALPHA *(ALPHA / DIGIT / "-")`, which means b64_4 (etc) are invalid.

* There are a bunch of \xa0 non-breaking spaces scattered around, I bet unintentionally

* `SQOUTE` is a typo

* `b64dig` is not defined

* `ALPHA` is not defined

With these changes, I was able to at least parse this.

OK, all fixed now. Not sure how the \xA0 got in there. :-\

hildjj commented 2 weeks ago

There's also going to need a bunch of rework in the comment rules used in inside-*. None of them should accept an SQUOTE, I think. h'12 /'/ 34' shouldn't work, for example.

hildjj commented 2 weeks ago

Note that 'one-item' is an alternate entry point, so it's not unused.

rohanmahy commented 2 weeks ago

There's also going to need a bunch of rework in the comment rules used in inside-*. None of them should accept an SQUOTE, I think. h'12 /'/ 34' shouldn't work, for example.

By my read, it does not allow /'/, since safe-unescaped specifically excludes <'> unless it is backslash escaped (see slash-escaped and hash-escaped). So h'12 / wasn\'t a problem / 34' is OK.

rohanmahy commented 2 weeks ago

As an existence proof that extending a single-layer ABNF is straightforward, edn-e-ref issue #1 shows an ABNF for e'' and ref'' that works with the ABNF in this PR. The changes specifically needed to make this work in a single layer were:

The bulk of the work was fixing removing duplicate or unused productions.