Closed chrysn closed 10 months ago
On the "streamstrings": I already updated the byte string branch to "bstr" in –04 (was sqstr, and that is too narrow). This syntactically includes app-prefix constructs, but not "embedded" -- the easiest fix would be to add a third alternative "embedded" to bstr.
We don't have much implementation experience with encoding indicators...
Uh oh. I hadn't looked at encoding indicators much for about a decade.
[_0 1], not [1]_0. If "it's too broken, it needs to go" is your conclusion, I'll be a bit sad, but with me lacking sufficient time to provide fixes right now, that may be an outcome.
The "_" can be followed by any \w -- we could simply pull a new convention for 1+0 out of our hats, say, "__" or "_1plus0" :-)
We need examples for tagged — 1(_3 5)? 1_3(5)? 1(5)_3? Fortunately not for simple() -- that is deterministic
For tagged, the cbor-diag crate interprets the existing text to lead to 32_0("https://cbor.nemo157.com")
I copied that and completed the set (e.g., <<1>>_0) in PR #15
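To make the byte-level effect of these indicators concrete, here is a minimal Python sketch. It assumes the RFC 8949 mapping of _0 through _3 to additional information 24 through 27, and it reads [_0 1] and 32_0(...) the way they are described above. The function name encode_head and the table INDICATOR_WIDTHS are made up for illustration and are not taken from any existing tool.

```python
from typing import Optional

# Encoding indicator suffix -> (additional information, argument width in bytes).
# Assumption: the RFC 8949 Section 8.1 mapping of _0.._3 to ai 24..27.
INDICATOR_WIDTHS = {"_0": (24, 1), "_1": (25, 2), "_2": (26, 4), "_3": (27, 8)}


def encode_head(major: int, value: int, indicator: Optional[str] = None) -> bytes:
    """Encode a CBOR head (major type plus argument).

    Without an indicator the shortest (preferred) form is chosen;
    with an indicator the requested argument width is forced.
    """
    if value < 0:
        raise ValueError("head arguments are unsigned")
    if indicator is None:
        if value < 24:
            # Argument fits into the initial byte itself.
            return bytes([(major << 5) | value])
        for name, (_, width) in INDICATOR_WIDTHS.items():
            if value < (1 << (8 * width)):
                indicator = name
                break
        else:
            raise ValueError("argument does not fit in 64 bits")
    ai, width = INDICATOR_WIDTHS[indicator]
    if value >= (1 << (8 * width)):
        raise ValueError(f"{value} does not fit in {width} byte(s)")
    return bytes([(major << 5) | ai]) + value.to_bytes(width, "big")


# [_0 1]: array head with a one-byte length argument, followed by the item 1.
array_bytes = encode_head(4, 1, "_0") + encode_head(0, 1)
# 32_0(...): tag 32 with a one-byte argument (which is also its preferred form).
tag_bytes = encode_head(6, 32, "_0")
print(array_bytes.hex(), tag_bytes.hex())
```

In this reading the indicator only ever changes the width of the head argument, which is why it can attach equally to integers, array and map lengths, string lengths and tag numbers.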
PR #15 is complete on the ABNF side, with considerable latitude given in the ABNF to what values the *wordchar in spec can take. I'd like to merge this first, and then:
* Define which values the *wordchar in spec can take (empty string for indefinite on array and map, 0 to 3 and the new value, second _ ??, for ai=0..23) and...
* We also should define something like a tag 999 for unimplemented application-extensions, as in 999(["dt", "4711"]), as proposed in #13. (Now PR #16)
I don't think we want to have extensive text about round-tripping, but we could mention that a tool that creates non-basic diagnostic notation (e.g., b64 or application-extensions) is basing that on additional information. If CDDL is used for that, ~time does this for dt''; but how to decide '' vs h'' vs b64''?
We could also mention that the preferred [sic] way of implementing encoding indicators in cbor-to-diag is to put in encoding indicators only where the encoding is not already the preferred encoding.
I think that that'd be a good "preferred way". Note that for indefinite length encoding, as it's never preferred, it'd mean that it's always rendered explicitly (as (_ 'foo' 'bar') or [_ ...] etc), and that's good. (For <<>> parts and application literals that contain them and have no described structure for their insides, that may mean that they are not used unless the cbor-to-diag tool is configured to ignore those lengths.)
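The "indicators only where the encoding is not preferred" rule can be sketched for unsigned integers as follows. This is an illustration under stated assumptions, not the behavior of any particular cbor-to-diag tool: the helper names preferred_ai and render_uint are invented here, and only major type 0 with the RFC 8949 indicators _0 through _3 is handled.

```python
def preferred_ai(value: int) -> int:
    """Return the additional information a preferred encoder would use."""
    if value < 24:
        return value  # argument lives in the initial byte itself
    if value < 1 << 8:
        return 24
    if value < 1 << 16:
        return 25
    if value < 1 << 32:
        return 26
    if value < 1 << 64:
        return 27
    raise ValueError("value does not fit in 64 bits")


def render_uint(encoded: bytes) -> str:
    """Render one encoded unsigned integer as diagnostic notation.

    An encoding indicator is appended only when the encoding found in the
    input differs from the preferred one.
    """
    initial = encoded[0]
    major, ai = initial >> 5, initial & 0x1F
    if major != 0:
        raise ValueError("this sketch only handles unsigned integers")
    if ai < 24:
        return str(ai)  # immediate values are always preferred
    if ai > 27:
        raise ValueError("reserved or indefinite additional information")
    width = 1 << (ai - 24)
    if len(encoded) < 1 + width:
        raise ValueError("truncated input")
    value = int.from_bytes(encoded[1:1 + width], "big")
    if preferred_ai(value) == ai:
        return str(value)
    return f"{value}_{ai - 24}"


for sample in ("01", "1801", "1818", "190001", "1903e8", "1a000003e8"):
    print(sample, "->", render_uint(bytes.fromhex(sample)))
```

With this rule, output for preferred-encoded input is plain basic diagnostic notation, and every indicator that does appear marks a spot where the original bytes deviate from what a preferred encoder would emit.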
On that additional notation, being ~time is a good indicator (I don't suppose we want DT"" to mean that it carries a tag too). For b64 it could be a ~'d tag 21, but I don't know where the CDDL for it would be best described.
My mail at https://mailarchive.ietf.org/arch/msg/cbor/x9xl2lqqSNBK_wtApzo6H6ak8N4 got a bit lost, moving it here to keep track.
Rephrasing what is in there:
<> embedded CBOR) allows round-tripping from CBOR to DN back to CBOR, even when the CBOR is not ideally (which is also deterministically) encoded, provided the CBOR->DN conversion annotates the explicit encoding indicators (at least where it's not using the size the encoder would use).
Actions which I think would be good are:
Looking at the details of 8949 encoding indicators, I also found that chunked strings can be expressed by using prefixed strings on every single chunk. Does that capability stay limited to the pseudo-EDN-literals h/b32/h32/b64/base64url, or can new literals go in there if they expand to a string? (That question is not actually new ... was (_ <<1>>, <<2, 3>>) a valid way to write (_ h'01', h'0203') aka 5F4101420203FF?) The most straightforward way here is probably to just allow the pseudo-EDN-literals there and be done with it; it's not like we can't still allow it later (it still isn't an interchange format).
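The equivalence in question can be checked at the byte level with a small Python sketch. It assumes that each <<...>> chunk simply stands for the byte string holding the concatenated encodings of its items; the helper names (head, bstr, embedded, chunked_bstr) are hypothetical and only cover what this example needs.

```python
def head(major: int, value: int) -> bytes:
    """Preferred-serialization CBOR head for a major type and argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    for ai, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * width)):
            return bytes([(major << 5) | ai]) + value.to_bytes(width, "big")
    raise ValueError("argument does not fit in 64 bits")


def bstr(data: bytes) -> bytes:
    """Definite-length byte string (major type 2)."""
    return head(2, len(data)) + data


def embedded(*items: int) -> bytes:
    """Content of <<...>> for unsigned integer items: their encodings, concatenated."""
    return b"".join(head(0, item) for item in items)


def chunked_bstr(chunks: list) -> bytes:
    """Indefinite-length byte string: 0x5F, definite-length chunks, 0xFF break."""
    return b"\x5f" + b"".join(bstr(chunk) for chunk in chunks) + b"\xff"


# (_ <<1>>, <<2, 3>>) under the assumption stated above
via_embedded = chunked_bstr([embedded(1), embedded(2, 3)])
# (_ h'01', h'0203')
via_hex = chunked_bstr([bytes.fromhex("01"), bytes.fromhex("0203")])
print(via_embedded.hex().upper(), via_hex.hex().upper())
```

Both spellings yield the same bytes in this sketch, so the open question is purely one of which chunk syntaxes the grammar admits.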