Closed SmithSamuelM closed 4 years ago
https://github.com/multiformats/multicodec https://github.com/multiformats/multibase
very useful for agility
thanks OR13 Basically IPFS uses a canonicalization scheme that is essentially rundgren approach That is no whitespace only ascii field names and sorted fields.
I looked at length at Multi-Base and multi-codec. They are quite verbose relatively speaking and KERI only needs one base and one codec encoding. Same principle but more tuned implementation. KERIs code is really simple to implement. So the tradeoff made sense to me.
Both msgpack and CBOR use the Type Length Value encoding format and are about as compact as you can make them. So I see no reason to reinvent that wheel. The difference is that they do not support cypher suite so that is the only addition KERI provides.
See KID003
The derivation codes will be described in KID002.
One of the important design constraints for KERI is performance in data streaming applications. Multi-codec is meant to be universally general. Such as function code byte(s), base code byte(s,) hash size byte(s). A generally compliant implementation of Multi-codec is more verbose on average than KERI codes. KERI supports two formats for fully qualified cryptographic material. These are Base64 on 6 bit boundaries per character and Base2 on 8 bit boundaries per byte. Base64 is the most compact URL/File/Textual format. Base 2 with 8 bit boundaries is the most generally useful binary format. The coding is designed so that a base 2 stream with 8 bit boundaries and a base 64 stream with 6 bit boundaries are processable on perfect boundaries for both. The number of Base 64-6bit with 4 characters is 4/3 of the number of bytes with base2-8bit. Likewise the number of bytes with base 2-8bit is 3/4 the number of characters with Base 64-6bit . The base 64 standard adds pad bytes to ensure these perfect boundaries (base 64 is on 6 bit boundaries). KERI uses the base64 pad bytes opportunistically. In most cases, this means, zero overhead for its derivation code. Its worst case (for now) is 3 bytes (4 base 64 characters) which is comparable a Multi-Codec with function, base, and length bytes. Also there is no assurance of perfect boundaries with multi-codec between Base64-6bit and Base2-8 bit which means that additional pad bytes may be needed. Thus further makes the average case for multi-codec much worse than KERI. The average case for KERI is better by design. Performance optimization comes at the loss of generality.
Furthermore, anything that has perfect boundaries with divisor of base 2-8 bit or base 64-6 bit may also have perfect boundaries for both. This includes Hex (4 bit boundaries) and octal (3 bit boundaries). For example, 24 bits = 3x8bits = 4x6bits = 6X4bits =8X3bits. So this makes processing of binary streams possible without any extra padding etc. Eventually, as KERI becomes widely used, full binary implementations of KERI events will happen that will benefit from the ability to leverage cryptographic material streams that always align on perfect boundaries and result in significant performance optimizations.
This issue was to explore the concept and illicit feedback. The resulting proposal has been drafted into kid0003.md and a new issue is opened for kid0003.md
Ordered Mappings
Design Goals
future proof crypto agility and event content crypto agility means variable length crypto material. Fixed fields problematic
reasonably compact to IoT, and data streaming applications msgpack and CBOR are more compact than json UDP and other transports besides TCP reasonable element labels
KERI is well defined limited application so may make the tradeoff of less generality for increased performance (i.e. not as general as IPFS may be more optimized)
reasonably web friendly. Supports HTTP/JSON but not HTTP only or JSON only
reasonably programmer friendly use labeled mappings (label, value) and arrays not just concatenated values if possible
events may be only signed by controller to ordering of labels within events per se is not strictly required
event serialization discovery is simplified if labels are ordered so that version string label is always first. Discovery better supports other transports besides http
extract data sets for digests must be reproducibly ordered. Not an option.
Extracted Data
Design Decisions
Events are mappings and serialized as mappings.
Base64 most compact but web friendly encoding for crypto material. Don't need to support multiple web friendly encodings. One is enough. no need for more generality such as multiple bases.
Base65 compatible derivation code. Supports crypto agility future proof. Most compact representation leveraging Base64 pad characters. KERI constrained context means only need to encode crypto suite not context. One is enough no need for more generality.
Ordering Choices
Fixed fields not viable because of crypto material may be of variable length
extract to concatenated element values most compact but least programmer friendly and future proof Order preserving
extract to lists of lists compact and more programmer friendly and future proof but still not very programmer friendly Order preserving
extract to list of lists (list of tuples) of (label, value) pairs less compact but much more programmer friendly all future proof encoding independent. Order preserving.
encode mapping without extraction least compact (although can use compact label names) but most programmer friendly and future proof. But not order preserving unless apply additional ordering rules.
Mapping Ordering Choices
This proposal was originally based on the new alternative of ordered mapping support in Javascript ES6 for the Map object. Unfortunately, after looking more closely, JSON.stringify does not transparently support serializing Map objects even with the replacerreviver options. These change the contents of the serialized mapping so its not interoperable. This proposal has been revised to reflect the more limited choices. Instead the
ownPropertyKeys
method does preserve property creation order and may be used with replacer method to correctly serialize Javascript objects in property creation order. This needs to be verified.1) Sorted Label Mapping enforces lexicographic ordering in JSON. Ordering is lexicographic not logical. . Must play games with names to ensure nature logical order. example version string to be first have to " _vs " May not use ordering semantically. Order of appearance or missing elements may not carry meaning. Use json.stringify with replacer to sort lexicographically
2) Sorted by creation order using replacer function and ownPropertyKeys method in Javascript Ordering is logical not lexicographic. Do not have to play games with version string label "vs" ok. May use ordering semantically. Use json.stringify with replacer to sort by ownPropertyKeys
Proposal
The proposal is two fold: A) Impose an ordered mapping constraint on KERI event serializations to ensure the first field element is the version string to enable easier discovery. For Javascript this may be accomplished by applying two options to the JSON.stringify method. These are:
1- sorting the fields lexicographically to ensure version string element is first with modified version string label "_vs" that ensures first lexicographically. 2- sorting the fields logically (creation order) to ensure version string element is first. Assuming Javascript replacer using ownPropertyKeys works with unmodified "vs" label.
B) Extracted subsets of KERI events. 1- Use mappings sorted logically assuming Javascript ownPropertyKeys works. 2- Use list of lists of (label, value) pairs to ensure consistent ordering of extracted data sets 3- Use list of lists of values to ensure consistent ordering of extracted data sets 4- Use concatenated values from lists of of lists of (label, value pairs) or list of values
The options in B are sorted in order of decreasing preference.
because extraction sets are not sent over the wire they do not impinge on normal JSON rest tooling.
Discussion:
One of the complications of serialization of mappings is whether or not the elements of the mapping appear in any given order. The best ordering is insertion order. This allows logical ordering independent of lexicographic ordering of the labels. This also shows up as the canonicalization order problem. For small well defined data structures like KERI events imposing an ordering is straightforward. The complication is trading off ordering for programmer convenience or ease of understanding. Data structures that support labeled fields ease understanding for programmers. A universally applicable data structure that imposes ordering but still provides self-describing labels of the elements is a list of lists of (label, value) pairs. This is not quite as programmer friendly as a mapping with {label: value} pairs. Unfortunately, historically mapping were implemented with hash tables that had no predictable ordering. Python dicts since 3.6 are ordered. Ruby objects are hash table based have have been ordered since 2007. As far as I can tell the rust serde library allows ordered mapping to/from JSON serializations so its not a problem in rust. I don't know about other languages like Go or Java. I anticipate that because their objects are more structured (not hash table based object like Python and JavaScript). Python has long had support for ordered dict objects and the various python json parsing libraries preserve that ordering by default. Since Python 3.6 all dict are ordered by default so its comes for free. In binary languages like Rust data structures are explicitly laid out so we do not have the hash table problem of un-predictable order that objects in Javascript and until recently dicts in Python had.
Using ordered event mappings at least for first value helps with serialization discovery.
If logical ordering is possible its also more user friendly than lexicographic ordering.
Automatic determination of the serialization encoding of the event when parsing it, becomes trivialized when the serializations are guaranteed to preserve the order of at least the version string element. This means we can depend on the "vs" or "_vs" element being the first element serialized. So parsing the encoding means looking at the first few bytes to extract the version string which will always appear in an unambiguous location at the front of the serialization instead of some variable location in the serialization.
Ordered extraction set mappings is more user friendly than using lists of lists of (label, value) pairs. This would make the library a little more programmer friendly than using an ordered list of lists of (label, value) pairs. We must impose ordering for extracted sets of elements in order to reliably compute digests. This may be done universally by using list of lists of (label, value) pairs. This is the way the paper is written . But this is less programmer friendly than using ordered mappings that serialize and de-seralize in order. This is dependent on Javascript support for ordered serialization in logical (object property creation order via ownPropertyKeys method).
Ordered Javascript Objects using ownPropertyKeys Method
We may impose ordered serialization with the JSON.stringify method by using the replacer option and the [[ownPropertyKeys]] internal method added in Javascript ES6 exposed as Reflect.ownKeys(). This method is provided in JavaScript ES6 (2015). [[ownPropertyKeys]] uses the property creation order. At the time of ES6, JSON.stringify was not required to use the [[ownPropertyKeys]] iteration order (ie property creation order). This appears to have been fixed in later versions of ES. (See the discussion below). As of this writing nodejs v14.2.0 and the latest versions of Chrome, Safari, and Firefox all appear to preserve property creation order in JSON.stringify(). This means that we can depend on and use that ordering in the specification. For earlier versions that support ES6 then a custom replacer function for JSON.stringify(x, replacer) will ensure that property creation order is used.
Using Reflect.ownKeys() that implements [[ownPropertyKeys]]
I adapted the following nodejs code to ensure that we may enforce property creation order in JSON.stringify for ES6 or later versions of JavaScript.
Console Output
https://stackoverflow.com/questions/30076219/does-es6-introduce-a-well-defined-order-of-enumeration-for-object-properties/30919039
In ES6 the stringify order does not follow the ownPropertyKeys (property creation order) ordering but this will be fixed in ES 2020. ES11. Looks like Babel supports ES2020 already.
This would be similar to the following code snippet except instead of lexicographic sorting it uses the ownPropertyKeys as the order. The following was used as example code and modified above. https://stackoverflow.com/questions/16167581/sort-object-properties-and-json-stringify
https://gist.github.com/davidfurlong/463a83a33b70a3b6618e97ec9679e490
JSON serialization ordered using ES6 Map Objects NOT VIABLE
Unfortunately although Javascript ES6 adds a new datatype called a Map that supports defined ordered elements that may be iterated in order and serialized/deserialized in order, the json.stringify does not transparently support serializing these objects in a way that both preservers ordering and also maintains interoperable json format. To clarify Map Objects in Javascript ES6 may be JSON encoded/decoded in an order preserving manner using the JSON.stringify and JSON.parse functions with replacer/reviver options but these change the format of the resultant json..
After looking more closely. JSON replacer/replacer reviver changes the contents of the mapping so its a different serialization this makes this not viable for interoperability.
Left here for future reference. https://stackoverflow.com/questions/29085197/how-do-you-json-stringify-an-es6-map
Both JSON.stringify and JSON.parse support a second argument. replacer and reviver respectively. With replacer and reviver below it's possible to add support for native Map object, including deeply nested values