Open PJK opened 7 years ago
cbor2diag.rb also loses the fact that integers are encoded with a non-minimal encoding. This comes at least partially from the fact that the CBOR decoder represents integers and floating-point numbers as native Ruby types, rather than including their size information.
Indeed, cbor-diag is based on cbor-pure, which maps the CBOR data model to Ruby data structures. Adding meta information such as encoding sizes is not easy in the latter (I have a partial solution, but there are some Ruby bugs around floating-point numbers). Maybe we need to move away from actual Ruby data here and treat integers and primitives the same way we already treat unknown simples.
So far, I haven't needed these capabilities -- what is your use case that requires full roundtripping?
In https://github.com/dimich-g/webpackage/pull/50/files#diff-04c6e90faac2675aa89e2176d2eec7d8R128, I'm requiring that the trailing length be encoded in 8 bytes, so that tools parsing from the end of the file can tell whether h'1b0000001a00191817' encodes 111670794263, 1644567, 6167, or 23. I'd like the diagnostic format to be able to diagnose mistakes in that encoding and to produce that encoding.
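To make the ambiguity concrete, here is a minimal Ruby sketch of what a tool reading backwards from the end of the file would see. The helper name `trailing_candidates` is hypothetical and not part of cbor-diag; it only handles major type 0 (unsigned integers) and uses plain byte arithmetic rather than any CBOR library.

```ruby
# The trailing bytes quoted above: a uint64 head (0x1b) followed by 8 bytes.
TRAILER = ["1b0000001a00191817"].pack("H*")

# Hypothetical helper: for every unsigned-integer width CBOR allows, check
# whether the bytes at the end of +data+ form a well-formed head of that
# width, and collect the value a backward parser would read in each case.
def trailing_candidates(data)
  bytes = data.bytes
  candidates = {}
  # Width 0: the last byte alone is an immediate value (0..23).
  candidates[0] = bytes[-1] if bytes[-1] < 24
  # Additional info 24..27 announces 1, 2, 4, or 8 big-endian argument bytes.
  { 24 => 1, 25 => 2, 26 => 4, 27 => 8 }.each do |info, width|
    next if bytes.length < width + 1
    head = bytes[-(width + 1)]
    next unless head == info # major type 0, so the initial byte equals info
    candidates[width] = bytes.last(width).inject(0) { |acc, b| (acc << 8) | b }
  end
  candidates
end

p trailing_candidates(TRAILER)
```

Every width matches here, so without a rule fixing the width at 8 bytes the reader cannot tell which of the four values was meant.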
I see. You would get the same effect from using any CBOR encoder, including those that don't give you control over encoding lengths, by just describing that field as a byte string of eight bytes and then just saying at the semantic level that this is a length encoded in network byte order. Small cognitive dissonance, much better leverage of the existing CBOR ecosystem.
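A rough sketch of that alternative in Ruby, assuming the trailer is always a CBOR byte string of exactly eight bytes (initial byte 0x48) holding the length in network byte order. The helper names `length_trailer` and `read_length_trailer` are made up for illustration.

```ruby
# Hypothetical helper: encode +length+ as a CBOR byte string of exactly
# eight bytes. 0x48 is major type 2 (byte string) with length 8.
def length_trailer(length)
  payload = [length].pack("Q>") # 64-bit unsigned, big-endian
  "\x48".b + payload
end

# Hypothetical helper: read the trailer back from the last nine bytes.
def read_length_trailer(data)
  trailer = data.byteslice(-9, 9)
  unless trailer && trailer.getbyte(0) == 0x48
    raise ArgumentError, "expected an 8-byte CBOR byte string at the end"
  end
  trailer.byteslice(1, 8).unpack1("Q>")
end

puts length_trailer(6167).unpack1("H*")
puts read_length_trailer("payload".b + length_trailer(6167))
```

Since the trailer is always nine bytes, a reader starting at the end of the file never has to guess the width, and any CBOR encoder can produce it as an ordinary byte string.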
Yeah, that's fair. Another use, which I don't need yet, is for cases that require Canonical CBOR. If an item fails to parse due to being non-canonical, the diagnostic form should make it possible to distinguish that item from its canonical form.
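As an illustration of the kind of check involved, the following Ruby sketch tests whether an unsigned integer uses its shortest possible head, which is one of the Canonical CBOR requirements. `minimal_uint?` is a hypothetical helper, not an existing cbor-diag or cbor-pure method, and it covers major type 0 only.

```ruby
# Hypothetical helper: true if +encoded+ (a binary string holding one CBOR
# unsigned integer) uses the shortest head that can represent its value.
def minimal_uint?(encoded)
  bytes = encoded.bytes
  raise ArgumentError, "not an unsigned integer" unless bytes[0] >> 5 == 0
  info = bytes[0] & 0x1f
  return true if info < 24 # immediate values are always minimal
  width = { 24 => 1, 25 => 2, 26 => 4, 27 => 8 }.fetch(info) do
    raise ArgumentError, "reserved additional information #{info}"
  end
  value = bytes[1, width].inject(0) { |acc, b| (acc << 8) | b }
  # Smallest value that actually needs this many argument bytes.
  floor = { 1 => 24, 2 => 0x100, 4 => 0x1_0000, 8 => 0x1_0000_0000 }[width]
  value >= floor
end

%w[17 1817 191817 1a00191817 190017].each do |hex|
  puts "#{hex}: #{minimal_uint?([hex].pack('H*'))}"
end
```

Both `17` and `1817` decode to 23, but only the first is minimal. A diagnostic form that prints `23` for both cannot show why the second one was rejected.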
The above thread notwithstanding, it should indeed be possible to use this gem to get more representation information. I will try to extract some code from the `cddl` gem to use here, but it will probably take a week or two until we have something to play with.
Hi, I came across the following inconsistency: `cbor2diag` ignores the width of floating-point numbers and thus is not bijective, contrary to what one would expect. The RFC specifies that the width should be encoded using an `_n` suffix, but it is not, so the information is lost, as demonstrated by the following example:
perl -e' print "\xfb\x00\x00\x00\x00\x00\x00\x00\x00"' | cbor2diag.rb | diag2cbor.rb | hexdump
prints
instead of
\xfb\x00\x00\x00\x00\x00\x00\x00\x00.
Moreover, the parser doesn't seem to support parsing the `_` width suffixes at all:
This in general goes back to the design of Diagnostic Notation, which doesn't specify the exact rules for encoding 'logical' values to CBOR items. The same problem occurs with e.g. the lengths of all items, where this implementation also chooses the 'canonical' encoding over preserving the full information:
If these are the intended semantics of DN, then the standard should be amended with an unambiguous specification of 'Canonical' CBOR, which would render section 6.1 unnecessary.
Otherwise the implementation should be changed to at least allow parsing all valid DN inputs. In this case, `cbor2diag` composed with `diag2cbor` can never be an identity unless DN is extended to allow specifying the width in all constructs where multiple choices produce valid and logically equivalent items.
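For reference, a small Ruby sketch of how the `_n` suffix could be derived from an item's initial byte. It assumes my reading of the RFC's encoding-indicator rule, namely that `_n` stands for an additional information value of 24+n. `encoding_indicator` is a hypothetical helper and not something cbor-diag currently provides.

```ruby
# Hypothetical helper: the "_n" encoding indicator for a CBOR initial byte,
# assuming "_n" denotes additional information 24 + n. Immediate values and
# other additional-information values get no suffix in this sketch.
def encoding_indicator(initial_byte)
  info = initial_byte & 0x1f
  (24..27).cover?(info) ? "_#{info - 24}" : ""
end

# Initial bytes for half, single, and double precision floats, then uint64
# and an immediate unsigned integer.
[0xf9, 0xfa, 0xfb, 0x1b, 0x17].each do |byte|
  puts format("0x%02x -> %p", byte, encoding_indicator(byte))
end
```

Under that assumption the double-precision zero from the example above would print as `0.0_3`, which is enough for `diag2cbor` to restore the original nine bytes.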