cbor-wg / edn-literal

Application-oriented literals for CBOR extended diagnostic notation

Encoding indicators #11

Closed: chrysn closed this 10 months ago

chrysn commented 1 year ago

My mail at https://mailarchive.ietf.org/arch/msg/cbor/x9xl2lqqSNBK_wtApzo6H6ak8N4 got a bit lost, so I'm moving it here to keep track.

Rephrasing what is in there:

Actions which I think would be good are:

Looking at the details of RFC 8949 encoding indicators, I also found that chunked strings can be expressed by using prefixed strings on every single chunk. Does that capability stay limited to the pseudo-EDN-literals h/b32/h32/b64/base64url, or can new literals go in there if they expand to a string? (That question is not actually new: was (_ <<1>>, <<2, 3>>) a valid way to write (_ h'01', h'0203'), aka 5F4101420203FF?) The most straightforward way here is probably to just allow the pseudo-EDN-literals there and be done with it; it's not like we can't still allow more later (it still isn't an interchange format).
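
To make the byte-level claim concrete, here is a minimal Python sketch of how an indefinite-length byte string is assembled from definite-length chunks under RFC 8949. The helper names (`encode_head`, `encode_bstr`, `encode_chunked_bstr`) are made up for illustration and are not taken from any EDN tool; the chunks are passed in as raw bytes, so the step that expands `<<1>>` to `h'01'` is assumed to have happened already.

```python
def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head using the shortest (preferred) argument form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_bstr(chunk: bytes) -> bytes:
    """Definite-length byte string (major type 2)."""
    return encode_head(2, len(chunk)) + chunk


def encode_chunked_bstr(chunks) -> bytes:
    """Indefinite-length byte string: 0x5F, the definite-length chunks, then the 0xFF break."""
    return b"\x5f" + b"".join(encode_bstr(c) for c in chunks) + b"\xff"


# (_ h'01', h'0203'), which is what (_ <<1>>, <<2, 3>>) would expand to
print(encode_chunked_bstr([b"\x01", b"\x02\x03"]).hex().upper())  # 5F4101420203FF
```

Whichever literal syntax produced a chunk, only the resulting bytes matter at this level, which is why the question is purely about what the grammar admits.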

cabo commented 11 months ago

On the "streamstrings": I already updated the byte string branch to "bstr" in –04 (was sqstr, and that is too narrow). This syntactically includes app-prefix constructs, but not "embedded" -- the easiest fix would be to add a third alternative "embedded" to bstr.

We don't have much implementation experience with encoding indicators...

cabo commented 11 months ago

Uh oh. I hadn't looked at encoding indicators much for about a decade.

chrysn commented 11 months ago

If "it's too broken, it needs to go" is your conclusion, I'll be a bit sad, but with me lacking sufficient time to provide fixes right now, that may be an outcome.

cabo commented 11 months ago

The "_" can be followed by any \w -- we could simply pull a new convention for 1+0 out of our hats, say, "__" or "_1plus0" :-)

cabo commented 11 months ago

We need examples for tagged: 1(_3 5)? 1_3(5)? 1(5)_3? Fortunately not for simple() -- that is deterministic.

chrysn commented 11 months ago

For tagged, the cbor-diag crate interprets the existing text to lead to 32_0("https://cbor.nemo157.com").

cabo commented 11 months ago

I copied that and completed the set (e.g., <<1>>_0) in PR #15
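
As a rough illustration of what these notations pin down at the byte level: RFC 8949 defines `_n` (n from 0 to 3) as "encoded with additional information 24+n". The Python sketch below applies that rule to the head of a tag and of a byte string. The function name `encode_head_with_indicator` is hypothetical, and placing the indicator after the tag number follows the cbor-diag interpretation quoted above; it is not a statement of what PR #15 finally specifies.

```python
def encode_head_with_indicator(major: int, arg: int, indicator: int) -> bytes:
    """Encode a CBOR head with additional information 24 + indicator (_0 to _3)."""
    if not 0 <= indicator <= 3:
        raise ValueError("indicator must be in the range 0..3")
    size = 1 << indicator  # _0: 1 byte, _1: 2 bytes, _2: 4 bytes, _3: 8 bytes
    if arg >= 1 << (8 * size):
        raise ValueError("argument does not fit the requested width")
    return bytes([(major << 5) | (24 + indicator)]) + arg.to_bytes(size, "big")


# Head of 32_0(...): tag number 32 carried in a one-byte argument
print(encode_head_with_indicator(6, 32, 0).hex())  # d820

# 1_3(5): tag number 1 carried in an eight-byte argument, followed by the integer 5
print((encode_head_with_indicator(6, 1, 3) + b"\x05").hex())  # db000000000000000105

# <<1>>_0: byte string h'01' whose length is carried in a one-byte argument
print((encode_head_with_indicator(2, 1, 0) + b"\x01").hex())  # 580101
```

Note that `32_0` happens to coincide with the preferred encoding of tag 32, whereas `1_3` and `<<1>>_0` are deliberately longer than preferred.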

cabo commented 11 months ago

PR #15 is complete on the ABNF side, with considerable latitude given in the ABNF to what values the *wordchar in spec can take. I'd like to merge this first, and then:

* We also should define something like a tag 999 for unimplemented application-extensions, as in 999(["dt", "4711"]), as proposed in #13. (Now PR #16)
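
For a sense of what such a fallback would look like on the wire, here is a small Python sketch that encodes `999(["dt", "4711"])` with preferred serialization. The tag number 999 is simply the placeholder used in the comment above, and the `encode` helper with its tuple-as-tag convention is an assumption of this sketch, not an existing API.

```python
def head(major: int, arg: int) -> bytes:
    """Encode a CBOR head using the shortest (preferred) argument form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode(item) -> bytes:
    """Encode text strings, lists, and (tag, content) tuples; nothing else."""
    if isinstance(item, str):
        data = item.encode("utf-8")
        return head(3, len(data)) + data
    if isinstance(item, list):
        return head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, tuple):  # hypothetical convention: (tag number, tagged content)
        tag, content = item
        return head(6, tag) + encode(content)
    raise TypeError(f"unsupported item: {item!r}")


# 999(["dt", "4711"]): application prefix and its unprocessed string, wrapped in the tag
print(encode((999, ["dt", "4711"])).hex())  # d903e7826264746434373131
```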

I don't think we want to have extensive text about round-tripping, but

chrysn commented 11 months ago

I think that that'd be a good "preferred way". Note that for indefinite length encoding, as it's never preferred, it'd mean that it's always rendered explicitly (as (_ 'foo' 'bar') or [_ ...] etc), and that's good. (For <<>> parts and application literals that contain them and have no described structure for their insides, that may mean that they are not used unless the cbor-to-diag tool is configured to ignore those lengths.)

On that additional notation, being ~time is a good indicator (I don't suppose we want DT"" to mean that it carries a tag too). For b64 it could be a ~'d tag 21, but I don't know where the CDDL for it would be best described.