Very few use cases are publicly known that would need canonical encoding, so little attention has been given to that in implementations. Can you talk about yours?
My primary use-case for canonical encoding is in content-addressed storage systems. For example, I want to encode some values into blocks, which are identified by the SHA2-256 of their byte content. If I serialize an equivalent value again in the future, I want it to map to the same hash identifier so that the data is deduplicated.
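To make the deduplication idea concrete, here is a minimal sketch in Python. It is not taken from any library mentioned in this thread: `encode_canonical` and `block_id` are hypothetical names, and the encoder covers only integers, byte strings, text strings, arrays, maps, booleans and null. Floats are left out on purpose. It assumes the RFC 7049 rules of shortest-form lengths and map keys sorted by encoded length first, then bytewise.

```python
import hashlib
import struct


def _head(major, n):
    """Encode a major type and its argument in the shortest possible form."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 1 << 8:
        return bytes([(major << 5) | 24, n])
    if n < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    if n < 1 << 64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", n)
    raise ValueError("integer out of range for this sketch")


def encode_canonical(value):
    """Hypothetical helper: canonical CBOR for a small subset of types."""
    if isinstance(value, bool):  # must be checked before int
        return b"\xf5" if value else b"\xf4"
    if value is None:
        return b"\xf6"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, (list, tuple)):
        return _head(4, len(value)) + b"".join(encode_canonical(v) for v in value)
    if isinstance(value, dict):
        pairs = [(encode_canonical(k), encode_canonical(v)) for k, v in value.items()]
        # RFC 7049 canonical order: shorter encoded keys first, then bytewise.
        pairs.sort(key=lambda kv: (len(kv[0]), kv[0]))
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def block_id(value):
    """Hypothetical helper: SHA-256 hex digest of the canonical encoding."""
    return hashlib.sha256(encode_canonical(value)).hexdigest()
```

With this, `block_id({"b": 2, "a": 1})` and `block_id({"a": 1, "b": 2})` give the same identifier, because both maps serialize to the same bytes regardless of insertion order.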
Same as listed above: SHA-256 hashes of mixed-type maps.
The Go library github.com/ugorji/go/codec supports canonical encoding.
Thank you all. I haven't quite figured out how to add this information to this site. However, the IETF CBOR WG will soon embark on an implementation report in preparation for moving CBOR along on the Internet standards track, and that will contain information of this kind. So I'm leaving this as a future enhancement for now.
If you have any say, can canonical formatting be part of the specification itself? Not every implementation has to support it, but a canonical serialization format for hashing would be perfect.
The clj-cbor implementation supports canonical encoding, for what it's worth.
@jkbbwr Section 3.9, "Canonical CBOR", is already part of the specification. Maybe it gives a little too much leeway on, e.g., floating-point encoding. There have also been discussions on whether there should be a canonical encoding with a different sort order for map keys. All this points to having multiple "canonical" encodings, which may violate the POLS (principle of least surprise).
(For spec issues, please go to https://github.com/cbor-wg/CBORbis/issues -- we are doing another run now in the CBOR WG at clearing out editorial issues and other things that need to be done to get "Internet Standard" status.)
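As a small illustration of why a second sort order would yield a second, incompatible "canonical" form, the sketch below sorts the same already-encoded map keys in two ways. The first is the RFC 7049 rule (shorter encodings first, then bytewise). The second is an assumption about what an alternative could look like (plain bytewise comparison of the encoded keys); the discussion above does not say which alternative was proposed. The function names are hypothetical.

```python
def length_first(encoded_keys):
    """RFC 7049 canonical order: by encoded length, then bytewise."""
    return sorted(encoded_keys, key=lambda k: (len(k), k))


def bytewise(encoded_keys):
    """Assumed alternative: plain bytewise lexicographic order."""
    return sorted(encoded_keys)


# Encoded forms of three map keys: the integer 100, the integer -1,
# and the text string "a".
keys = [bytes.fromhex("1864"), bytes.fromhex("20"), bytes.fromhex("6161")]

order_a = [k.hex() for k in length_first(keys)]
order_b = [k.hex() for k in bytewise(keys)]
```

Here `order_a` is `['20', '1864', '6161']` while `order_b` is `['1864', '20', '6161']`, so the same map would serialize to different bytes, and therefore hash differently, under the two rules.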
Another note on canonical mode that came up while I was implementing the spec: Clojure has sets (unordered collections of distinct elements) as a basic data type, but they don't appear in the CBOR spec. When encoded canonically, set elements should be treated the same way map keys are; that is, written in sorted order so that a set with the same elements is rendered the same each time. This turned out to be cleaner to do by promoting sets to a more first-class treatment than using tagged value handlers.
I think that minimally, the 'set' type should have a standard single-byte tag value (< 24) given the prevalence in language data types. It would also be good to specify the canonical treatment of certain tagged values like sets.
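A rough sketch of the set treatment described above, in Python rather than Clojure and not the clj-cbor code itself: a set is written as a tagged array whose elements are sorted by their encoded form, using the same rule as canonical map keys. The tag number `SET_TAG` is a placeholder for illustration, not something this thread has settled on, and the helper names are hypothetical. Only non-negative integers and text strings are handled.

```python
import struct

SET_TAG = 258  # placeholder tag number, chosen only for this sketch


def head(major, n):
    """Shortest-form header for a major type and argument below 65536."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 256:
        return bytes([(major << 5) | 24, n])
    if n < 65536:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    raise ValueError("sketch only handles arguments below 65536")


def encode_item(x):
    """Encode a non-negative int or a text string; nothing else is covered."""
    if isinstance(x, int) and not isinstance(x, bool) and x >= 0:
        return head(0, x)
    if isinstance(x, str):
        raw = x.encode("utf-8")
        return head(3, len(raw)) + raw
    raise TypeError(f"type not covered by this sketch: {type(x).__name__}")


def encode_set(items):
    """Tagged array with elements in canonical (length, then bytewise) order."""
    encoded = sorted((encode_item(x) for x in set(items)),
                     key=lambda e: (len(e), e))
    return head(6, SET_TAG) + head(4, len(encoded)) + b"".join(encoded)
```

Because the elements are sorted after encoding, `encode_set({3, 1, "a"})` and `encode_set(["a", 3, 1])` produce identical bytes, which is the property needed for hashing.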
On Apr 14, 2017, at 17:29, Greg Look notifications@github.com wrote:
> Another note on canonical mode that came up while I was implementing the spec: Clojure has sets (unordered collections of distinct elements) as a basic data type, but they don't appear in the CBOR spec. When encoded canonically, set elements should be treated the same way map keys are; that is, written in sorted order so that a set with the same elements is rendered the same each time. This turned out to be cleaner to do by promoting sets to a more first-class treatment than using tagged value handlers.
Right. We didn’t include sets in CBOR’s base set of containers because they aren’t in JSON either. Tagging an array as the representation of a set makes a lot of sense for those environments that have these as first-class containers.
> I think that minimally, the 'set' type should have a standard single-byte tag value (< 24) given the prevalence in language data types.
Well, we have very few of those single-byte tags left, and since sets are typically large already, a two-byte tag sounds more appropriate.
> It would also be good to specify the canonical treatment of certain tagged values like sets.
Yes… all tags would need to define their canonical representation. We didn’t open that can of worms in RFC 7049, but probably would have to for more extensive use of canonical.
Grüße, Carsten
It would be really handy to know how to use canonical encoding and which implementations support it.