Closed cabo closed 2 months ago
If I disregard any chair hat and procedural considerations: Yes please, that brings things closer to CDDL; it's not like people expect C style concatenation, and it's not the most widely used/supported feature.
(I might even throw in a | or || into the concatenation operator pool, with a nod to cryptography people using it).
Marking the barewords as reserved for app literals sounds doable (especially since the floaty ones are mixed case and thus ineligible anyway; sadly we can't capitalize the others without losing JSON interoperability).
If you decide to make a PR out of that, I think I can create a branch of my implementation that follows.
Then there is the aspect of roadmap -- sure this puts us back into the WGLC-required phase. I'd consider it worth it, but that's eventually for the ADs (for the push back into the WG) and the WG to decide.
Before going all-space-is-space, is there any merit to the middle ground where we do introduce a concatenation operator? That way, a grammar update could still later make the commas optional without hitting this particular obstacle, and non-validating consumers can use a comma-free grammar.
As long as we do have mandatory commas, this also simplifies implementations that handle comments, because rather than having (for some t) S t S t S and S t S "," S t S chains, we only have S t S delim S t S style chains (with the delim from [,+] or whatever you pick as concatenation operator).
I'm not sure I understand your approach, but I have one observation: There needs to be an operator precedence between "," and "+" (or whatever character we use for that), at least if we want the AST to be useful (which helps implementers immensely).
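To make the precedence point concrete, here is a minimal sketch of a recursive-descent parser for a toy subset (single-quoted strings, unsigned integers, "," and "+"). It is not the edn-abnf grammar; the rule names seq, concat and item are made up for illustration, and the only point is that "+" binds tighter than ",", which keeps the AST flat at the sequence level.

```python
import re

# Toy tokenizer (assumption): single-quoted strings, unsigned integers, "+" and ",".
TOKEN = re.compile(r"\s*('[^']*'|\d+|[+,])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_seq(tokens):
    """seq = concat *("," concat): the comma binds loosest."""
    items = [parse_concat(tokens)]
    while tokens and tokens[0] == ",":
        tokens.pop(0)
        items.append(parse_concat(tokens))
    if tokens:
        raise ValueError(f"unexpected token {tokens[0]!r}")
    return ("seq", items)

def parse_concat(tokens):
    """concat = item *("+" item): "+" binds tighter than ","."""
    parts = [parse_item(tokens)]
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        parts.append(parse_item(tokens))
    return parts[0] if len(parts) == 1 else ("concat", parts)

def parse_item(tokens):
    if not tokens or tokens[0] in ("+", ","):
        raise ValueError("expected an item")
    return tokens.pop(0)

ast = parse_seq(tokenize("'a', 'b' + 'c', 1"))
print(ast)
```

In this toy, every pair of items needs a delimiter between them, so an input such as 'a' 'b' is rejected rather than silently concatenated.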
I made a rough prototype of edn-abnf with explicit concatenation (and optional commas everywhere). You can see it in the edn-abnf PoC. Install the ec variant with:
gem install edn-abnf-ec
(ec stands for explicit concatenation).
You can now compare the output of edn-abnf-ec against that of the (unchanged) edn-abnf.
The five changes (four to allow optional comma (OC), one for ec) can be seen here:
https://github.com/cabo/edn-abnf/pull/1/files#diff-bc1c8602a
(There are some intermediate compilation results in the repo, these result from the changes in the actual attributed grammar source file .abnftt.)
I chose + as a separator.
This has a slightly weird interaction with the leading "+" we allow in numbers:
$ echo "'a''b'+'c'+1'd'1(0)" | edn-abnf-ec -tdiag - | diag2diag.rb -et
'a', 'bc', 1, 'd', 1(0)
(You would normally write this with spaces to make it readable, like in the output; this would make 'a' + 'b' stand out from +1.)
I did not test this a lot yet. Against a corpus of examples in RFCs and I-Ds, I find:
The examples that show concatenation in the edn-literals draft of course no longer work. Note that concatenation is now also explicit for ellipses, so
"Herewith I buy" ... "gned: Alice & Bob"
needs to become
"Herewith I buy" + ... + "gned: Alice & Bob"
Wondering if we want to do something with the syntax of ellipses, but probably the above is good enough.
Noise from examples that were broken and now no longer are: :-)
Examples that just list multiple data items now work (parsed as a sequence): i-d/draft-ietf-core-href-13/cbor-diag/extended-cri-accommodating--b.cbor-diag
Same for some other examples that are missing commas: i-d/draft-ietf-cose-merkle-tree-proofs-02/cbor-diag/consistency-proof-signature-c.cbor-diag
Noise from examples that already are broken and stay so:
Examples that try to have single slashes in in-line comments break with different error messages: rfc/rfc9173/cbor-diag/example-1-bib-abstract-secu.cbor-diag, rfc/rfc9173/cbor-diag/example-4-bib-abstract-secu.cbor-diag
Same with non-EDN in examples labeled as cbor-diag: rfc/rfc9200/cbor-diag/as-request-creation-hints-p.cbor-diag, rfc/rfc9202/cbor-diag/access-token-response-examp.cbor-diag
Same for EDN of a map just missing outer braces: rfc/rfc9202/cbor-diag/access-token-without-keying.cbor-diag
Here are some common string concatenation operators in various programming languages:
Note that these operators might not be as widely used as the + (plus) operator for string concatenation, but they are still valid and commonly used in their respective languages.
Please vote now ;-)
I must have stepped out for this part...
I don't love this:
"Herewith I buy" + ... + "gned: Alice & Bob"
I'm not sure if it's possible, but could ... have implicit concatenation for generic partials defined?
For example:
"Herewith I buy" ... "gned: Alice & Bob": implicit string concatenation with a string elision.
[ 0, 1 ... 8, 9]: implicit list concatenation with a list elision.
{ 1 : 2, 3 : 4 ... 8 : 9 }: implicit map concatenation with a map key value elision.
But then explicit concatenation for non-elided instances?
"My name " + "is Alice" == "My name is Alice"
[ 0, 1 ] + [ 2, 3 ] == [ 0, 1, 2, 3 ]
{ 1 : 2, 3 : 4 } + { 8 : 9 } == { 1 : 2, 3 : 4, 8 : 9 }
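For illustration only, the three explicit cases could be modeled like this in Python, with str, list and dict standing in for EDN strings, arrays and maps. The function name concat and the choice to reject duplicate map keys are assumptions of this sketch, not something proposed in the thread. Note that string concatenation inserts no separator, so any space has to be part of one of the operands.

```python
def concat(left, right):
    """Concatenate two values of the same kind (hypothetical semantics)."""
    if isinstance(left, str) and isinstance(right, str):
        return left + right
    if isinstance(left, list) and isinstance(right, list):
        return left + right
    if isinstance(left, dict) and isinstance(right, dict):
        # Assumption: a key present on both sides is treated as an error.
        duplicate = left.keys() & right.keys()
        if duplicate:
            raise ValueError(f"duplicate map keys: {duplicate!r}")
        return {**left, **right}
    raise TypeError("operands must both be strings, lists, or maps")

print(concat("My name ", "is Alice"))
print(concat([0, 1], [2, 3]))
print(concat({1: 2, 3: 4}, {8: 9}))
```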
I can imagine choosing .. as concatenation operator would be a nightmare given that ... is used for elision, but just for fun:
"Herewith I buy" .. ... .. "gned: Alice & Bob"
"Herewith I buy" ... "gned: Alice & Bob": implicit string concatenation with a string elision.
[ 0, 1 ... 8, 9]: implicit list concatenation with a list elision.
We could give "..." some additional syntactic sugar. The array example already works, anyway:
$ echo '[ 0, 1 ... 8, 9]' | edn-abnf-ec -tdiag -
[0, 1, 888(null), 8, 9]
However, mixing the syntaxes gets complicated quickly, e.g., with
["a", "f" ... "m", "q"]
$ echo '[ "a", "f" ... "m", "q"]' | edn-abnf-ec -tdiag -
["a", "f", 888(null), "m", "q"]
Today, the ellipsis attaches to the string:
$ echo '[ "a", "f" ... "m", "q"]' | edn-abnf -tdiag -
["a", 888(["f", 888(null), "m"]), "q"]
Which of these is "right"?
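The two readings can be put side by side in a small model. This is only a sketch: tag 888 is represented as a Python tuple, None plays the role of null, and the input is given as comma-separated groups of juxtaposed tokens. The helper names are hypothetical, and the code mirrors the two outputs shown above rather than either implementation's actual grammar.

```python
ELLIPSIS = object()  # stands in for a free-standing "..."

def tag888(value):
    """Model CBOR tag 888 as a tuple; None plays the role of null."""
    return ("tag", 888, value)

def read_flat(groups):
    """Every token is its own array element; "..." becomes 888(null)."""
    return [tag888(None) if tok is ELLIPSIS else tok
            for group in groups for tok in group]

def read_attached(groups):
    """A comma-separated group containing "..." becomes one 888([...]) element."""
    result = []
    for group in groups:
        if ELLIPSIS in group:
            inner = [tag888(None) if tok is ELLIPSIS else tok for tok in group]
            result.append(tag888(inner))
        else:
            # Groups without "..." are single tokens in this example.
            result.extend(group)
    return result

# [ "a", "f" ... "m", "q" ] split at the commas
groups = [["a"], ["f", ELLIPSIS, "m"], ["q"]]
print(read_flat(groups))
print(read_attached(groups))
```

The first result has five array elements, the second has three; a consumer that counts elements sees different arrays for the same text.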
I'm biased by my experience with the "spread operator" ... https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax
And its support for typescript partials ... https://www.executeprogram.com/courses/everyday-typescript/lessons/partial-in-practice
I'm biased by my experience with the "spread operator"
Right, Ruby has had these for a while as * (arrays and positional parameter lists) and ** (hashes and keyword parameter lists), and of course Scheme has had unquote-splicing ,@ since the dark ages.
But these always have a variable-like thing that provides (or receives) the spread.
Maybe we need to separate splicing ellipses from free-standing ones, just like Scheme does.
But then, neither unquote (, in Scheme) nor unquote-splicing (,@) attach to neighboring syntactic features.
And we don't really want to say how this resolves.
To me, this is mainly about preserving the beauty of "Herewith I buy" ... "gned: Alice & Bob"; this doesn't really generalize (or already does, as with arrays and to a limited extent maps).
One data point: Section 3.5 of RFC 9529 pioneered the comma-free version of EDN in the line that says 5c47bf16df96660a41298cb4307f7eb6' /x/ and is followed by the y coordinate without any comma ;-)
(Holding off on reporting that as an erratum there because while the then-current version of EDN had commas, it was not really formal)
@chrysn this issue seems to be about string concatenation, but your comment is about optional commas.
Is there some implication or interaction you are suggesting? I don't follow.
Edit: your point is obvious, now that I have had a single sip of coffee.
Making commas optional is the motivating driver for doing string concatenation differently (Carsten pointed this out in the top-most item): As long as we have implicit concatenation, commas can't be made optional.
My impression of this issue is that if we really go that way this late in the process, then the commas would become optional in a second change in the same PR.
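A small sketch of why the two features collide: if juxtaposed strings may either be concatenated (implicit concatenation) or be separate items (optional commas), a run of n strings has 2^(n-1) possible readings. The function below just enumerates them; it is an illustration of the ambiguity, not code from any EDN implementation, and the name readings is made up.

```python
from itertools import product

def readings(strings):
    """All ways to read juxtaposed strings when both implicit concatenation
    and comma-free sequences are allowed."""
    if not strings:
        return [[]]
    results = []
    # For each gap between neighbours, decide: join (True) or separate (False).
    for joins in product([True, False], repeat=len(strings) - 1):
        items = [strings[0]]
        for join, s in zip(joins, strings[1:]):
            if join:
                items[-1] += s
            else:
                items.append(s)
        results.append(items)
    return results

for reading in readings(["a", "b"]):
    print(reading)
```

With an explicit operator, each gap is marked in the text itself, so only one reading remains.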
"Herewith I buy" ... "gned: Alice & Bob"
implicit string concatenation with a string elision.
[ 0, 1 ... 8, 9]
implicit list concatenation with a list elision.
We could give "..." some additional syntactic sugar. The array example already works, anyway:
I don't think we want to be making the rules for elision more complicated. Another option is to make elision only work inside h'' and b64''.
"Herewith I buy" + h'...' + "gned: Alice & Bob"
Also, I want to point out that "..." is a map key for selective disclosure JWTs and possibly for selective disclosure CWTs as well. That could make misreading really ugly.
My impression of this issue is that if we really go that way this late in the process, then the commas would become optional in a second change in the same PR.
Well, the change is near trivial.
Of the five small changes in https://github.com/cabo/edn-abnf/pull/1/files#diff-bc1c860
(The attributed grammar in edngrammar.abnftt needs one more change, which is about picking up the right subtree for AST building after inserting elements that need to be counted -- my abnftt grammar does not currently have labels.)
When we discussed the general use of EDN for human input, one desire that came up was to get rid of required commas, maybe the way CDDL does.
This is generally doable (not without a little pain). However, it is incompatible with RFC 8610 G.4 (concatenated strings).
Can we get rid of that feature? (It is not implemented in cbor.me, so I'm too biased to answer this.) Of course, we'd do this to make commas optional right away as well.
If yes, what do we put in instead?
"abcd" + "efgh" maybe? (Or any other recognizable "cat" operator.)
Any other surgery needed?
foo 'bar' might become distinct from foo'bar' at least for the barewords we support:
Or maybe we just actually reserve those and make them unavailable for app-prefix.
Ah, the tree of temptation...