decentralized-identity / keri

Key Event Receipt Infrastructure - the spec and implementation of the KERI protocol
Apache License 2.0

Ordered Mapping #7

Closed SmithSamuelM closed 4 years ago

SmithSamuelM commented 4 years ago

This issue was opened to explore the concept and elicit feedback. The resulting proposal has been drafted into kid0003.md, and a new issue has been opened for kid0003.md.

Ordered Mappings

Design Goals

Extracted Data

Design Decisions

Ordering Choices

Fixed fields are not viable because cryptographic material may be of variable length

Mapping Ordering Choices

This proposal was originally based on the new ordered mapping support in JavaScript ES6 via the Map object. Unfortunately, after looking more closely, JSON.stringify does not transparently support serializing Map objects even with the replacer/reviver options; these change the contents of the serialized mapping, so it is not interoperable. This proposal has been revised to reflect the more limited choices. Instead, the ownPropertyKeys method does preserve property creation order and may be used with a replacer method to correctly serialize JavaScript objects in property creation order. This needs to be verified.

1) Sorted label mapping enforces lexicographic ordering in JSON. Ordering is lexicographic, not logical, so one must play games with labels to approximate the natural logical order; for example, for the version string to be first, its label must be "_vs". Ordering may not be used semantically: neither order of appearance nor missing elements may carry meaning. Use JSON.stringify with a replacer to sort lexicographically.

2) Sorted by creation order, using a replacer function and the ownPropertyKeys method in JavaScript. Ordering is logical, not lexicographic. No games need be played with the version string label; "vs" is OK. Ordering may be used semantically. Use JSON.stringify with a replacer to sort by ownPropertyKeys.

Proposal

The proposal is twofold.

A) Impose an ordered mapping constraint on KERI event serializations to ensure the first field element is the version string, which enables easier discovery. For JavaScript this may be accomplished by applying one of two options to the JSON.stringify method (a minimal check of option 1 appears after the proposal):
1- Sort the fields lexicographically, with the modified version string label "_vs" ensuring the version string element sorts first.
2- Sort the fields logically (creation order) to ensure the version string element is first, assuming a JavaScript replacer using ownPropertyKeys works with the unmodified "vs" label.

B) Impose a consistent ordering on extracted subsets of KERI events, via one of:
1- Mappings sorted logically, assuming JavaScript ownPropertyKeys works.
2- Lists of lists of (label, value) pairs to ensure consistent ordering of extracted data sets.
3- Lists of lists of values to ensure consistent ordering of extracted data sets.
4- Concatenated values from lists of lists of (label, value) pairs or lists of values.

The options in B are listed in order of decreasing preference.

Because extraction sets are not sent over the wire, they do not impinge on normal JSON REST tooling.
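
A minimal check of the "_vs" trick in option A-1 (the field labels below are illustrative, not normative KERI labels): in ASCII, "_" (0x5F) sorts before all lowercase letters, so lexicographic sorting puts "_vs" first.

const labels = ["pre", "sn", "_vs", "digs"].sort();
console.log(labels); // [ '_vs', 'digs', 'pre', 'sn' ]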

Discussion:

One of the complications of serializing mappings is whether the elements of the mapping appear in any given order. The best ordering is insertion order. This allows logical ordering independent of lexicographic ordering of the labels. This also shows up as the canonicalization-order problem. For small, well-defined data structures like KERI events, imposing an ordering is straightforward. The complication is trading off ordering against programmer convenience or ease of understanding. Data structures that support labeled fields ease understanding for programmers. A universally applicable data structure that imposes ordering but still provides self-describing labels of the elements is a list of lists of (label, value) pairs. This is not quite as programmer friendly as a mapping with {label: value} pairs.

Unfortunately, mappings were historically implemented with hash tables that had no predictable ordering. Python dicts have been ordered by default since 3.6, so ordering comes for free, and the various Python JSON parsing libraries preserve that ordering by default (Python had long supported an explicit ordered dict before that). Ruby hashes are hash-table based but have preserved insertion order since 2007. As far as I can tell, the Rust serde library allows ordered mapping to/from JSON serializations, so it is not a problem in Rust; in binary languages like Rust, data structures are explicitly laid out, so we do not have the hash-table problem of unpredictable order that objects in JavaScript, and until recently dicts in Python, had. I don't know about other languages like Go or Java, but I anticipate they are similar because their objects are more structured (not hash-table-based objects like those in Python and JavaScript).

Using ordered event mappings, at least for the first value, helps with serialization discovery.
If logical ordering is possible, it is also more user friendly than lexicographic ordering.

Automatic determination of the serialization encoding of an event when parsing it becomes trivial when the serializations are guaranteed to preserve the order of at least the version string element. This means we can depend on the "vs" (or "_vs") element being the first element serialized. Detecting the encoding then means looking at the first few bytes to extract the version string, which will always appear in an unambiguous location at the front of the serialization instead of at some variable location.
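
A sketch of that discovery step, assuming the version string is the first field (the "KERI10JSON..." value and the offsets here are illustrative assumptions, not the normative format):

// Sniff the encoding from the front of a serialization. With ordering
// guaranteed, "vs" always sits within the first few bytes.
const raw = '{"vs":"KERI10JSON0000fb_","pre":"placeholder"}';
const found = raw.slice(0, 32).match(/"vs"\s*:\s*"(\w+)"/);
console.log(found && found[1]); // KERI10JSON0000fb_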

Ordered extraction-set mappings are more user friendly than lists of lists of (label, value) pairs; using them would make the library a little more programmer friendly. We must impose ordering on extracted sets of elements in order to reliably compute digests. This may be done universally by using lists of lists of (label, value) pairs, which is the way the paper is written. But this is less programmer friendly than using ordered mappings that serialize and deserialize in order. The latter is dependent on JavaScript support for ordered serialization in logical order (object property creation order via the ownPropertyKeys method).
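
For comparison, a minimal sketch of the list-of-pairs alternative (labels and values are hypothetical placeholders): JSON arrays are ordered by definition, so the digest input is reproducible everywhere, at some cost in ergonomics.

const extraction = [["pre", "placeholder_prefix"], ["sn", "0"]];
console.log(JSON.stringify(extraction)); // [["pre","placeholder_prefix"],["sn","0"]]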

Ordered Javascript Objects using ownPropertyKeys Method

We may impose ordered serialization with the JSON.stringify method by using the replacer option and the [[ownPropertyKeys]] internal method added in JavaScript ES6 (2015) and exposed as Reflect.ownKeys(). [[ownPropertyKeys]] uses the property creation order. At the time of ES6, JSON.stringify was not required to use the [[ownPropertyKeys]] iteration order (i.e., property creation order). This appears to have been fixed in later versions of ES (see the discussion below). As of this writing, nodejs v14.2.0 and the latest versions of Chrome, Safari, and Firefox all appear to preserve property creation order in JSON.stringify(). This means that we can depend on and use that ordering in the specification. For earlier versions that support ES6, a custom replacer function for JSON.stringify(x, replacer) will ensure that property creation order is used.

Using Reflect.ownKeys() that implements [[ownPropertyKeys]]

I adapted the following nodejs code to ensure that we may enforce property creation order in JSON.stringify for ES6 or later versions of JavaScript.

"use strict"
const replacer = (key, value) =>
  value instanceof Object && !(value instanceof Array) ? 
    Reflect.ownKeys(value)
    .reduce((ordered, key) => {
      ordered[key] = value[key];
      return ordered 
    }, {}) :
    value;

// Usage

// JSON.stringify({c: 1, a: { d: 0, c: 1, e: {a: 0, 1: 4}}}, replacer);

var main = function() 
{
    console.log("Running Main.");

    let x = { zip: 3, apple: 1, bulk: 2, _dog: 4 };
    let y = JSON.stringify(x);
    console.log(y);
    console.log(Object.keys(x));
    console.log(Object.getOwnPropertyNames(x));
    console.log(Reflect.ownKeys(x));
    console.log(replacer);
    let z = JSON.stringify(x, replacer);
    console.log(z);
}

if (require.main === module) 
{
    main();
}

Console Output

Running Main.
{"zip":3,"apple":1,"bulk":2,"_dog":4}
[ 'zip', 'apple', 'bulk', '_dog' ]
[ 'zip', 'apple', 'bulk', '_dog' ]
[ 'zip', 'apple', 'bulk', '_dog' ]
[Function: replacer]
{"zip":3,"apple":1,"bulk":2,"_dog":4}

https://stackoverflow.com/questions/30076219/does-es6-introduce-a-well-defined-order-of-enumeration-for-object-properties/30919039

In ES6 the stringify order was not required to follow the ownPropertyKeys (property creation order) ordering, but this is fixed in ES2020 (ES11). It looks like Babel supports ES2020 already.

This question is about EcmaScript 2015 (ES6). But it should be noted that the EcmaScript 2017 specification has removed the following paragraph that previously appeared in the specification for Object.keys, here quoted from the EcmaScript 2016 specification:

If an implementation defines a specific order of enumeration for the for-in statement, the same order must be used for the elements of the array returned in step 3.
Furthermore, the EcmaScript 2020 specification removes the following paragraph from the section on EnumerableOwnPropertyNames, which still appears in the EcmaScript 2019 specification:

Order the elements of properties so they are in the same relative order as would be produced by the Iterator that would be returned if the EnumerateObjectProperties internal method were invoked with O.
These removals mean that from EcmaScript 2020 onwards, Object.keys enforces the same specific order as Object.getOwnPropertyNames and Reflect.ownKeys, namely the one specified in OrdinaryOwnPropertyKeys. The order is:

Own properties that are array indexes [1], in ascending numeric index order
Other own String properties, in ascending chronological order of property creation
Own Symbol properties, in ascending chronological order of property creation

[1] An array index is a String-valued property key that is a canonical numeric String [2] whose numeric value i is an integer in the range +0 ≤ i < 2^32 - 1.

[2] A canonical numeric String is a String representation of a Number that would be produced by ToString, or the string "-0". So for instance, "012" is not a canonical numeric String, but "12" is.

It should be noted that all major implementations had already aligned with this order years ago.
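
A quick check of the array-index rule quoted above (the property names are hypothetical): canonical numeric String keys jump ahead of creation order, which is worth remembering when choosing field labels.

const o = { b: 1, "2": 2, a: 3, "1": 4 };
console.log(Object.keys(o)); // [ '1', '2', 'b', 'a' ]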

The ownPropertyKeys code above was adapted from the following snippet, which is similar except that it sorts lexicographically instead of using ownPropertyKeys order. https://stackoverflow.com/questions/16167581/sort-object-properties-and-json-stringify

https://gist.github.com/davidfurlong/463a83a33b70a3b6618e97ec9679e490

// Spec http://www.ecma-international.org/ecma-262/6.0/#sec-json.stringify
const replacer = (key, value) =>
    value instanceof Object && !(value instanceof Array) ? 
        Object.keys(value)
        .sort()
        .reduce((sorted, key) => {
            sorted[key] = value[key];
            return sorted 
        }, {}) :
        value;

// Usage

// JSON.stringify({c: 1, a: { d: 0, c: 1, e: {a: 0, 1: 4}}}, replacer);

JSON serialization ordered using ES6 Map Objects NOT VIABLE

Unfortunately, although JavaScript ES6 adds a new datatype called Map that supports defined ordered elements that may be iterated, serialized, and deserialized in order, JSON.stringify does not transparently support serializing these objects in a way that both preserves ordering and maintains an interoperable JSON format. To clarify: Map objects in JavaScript ES6 may be JSON encoded/decoded in an order-preserving manner using the JSON.stringify and JSON.parse functions with the replacer/reviver options, but these change the format of the resultant JSON.

After looking more closely: the JSON replacer/reviver changes the contents of the mapping, so it is a different serialization. This makes it not viable for interoperability.

Left here for future reference. https://stackoverflow.com/questions/29085197/how-do-you-json-stringify-an-es6-map

Both JSON.stringify and JSON.parse support a second argument, replacer and reviver respectively. With the replacer and reviver below it is possible to add support for native Map objects, including deeply nested values.

function replacer(key, value) {
  const originalObject = this[key];
  if(originalObject instanceof Map) {
    return {
      dataType: 'Map',
      value: Array.from(originalObject.entries()), // or with spread: value: [...originalObject]
    };
  } else {
    return value;
  }
}
function reviver(key, value) {
  if(typeof value === 'object' && value !== null) {
    if (value.dataType === 'Map') {
      return new Map(value.value);
    }
  }
  return value;
}

Usage:

const originalValue = new Map([['a', 1]]);
const str = JSON.stringify(originalValue, replacer);
const newValue = JSON.parse(str, reviver);
console.log(originalValue, newValue);
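
Printing str shows the interoperability problem noted above: the Map is wrapped in a tagging object rather than serialized as a plain JSON mapping.

console.log(str); // {"dataType":"Map","value":[["a",1]]}
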
Deep nesting with combination of Arrays, Objects and Maps

const originalValue = [
  new Map([['a', {
    b: {
      c: new Map([['d', 'text']])
    }
  }]])
];
const str = JSON.stringify(originalValue, replacer);
const newValue = JSON.parse(str, reviver);
console.log(originalValue, newValue);

OR13 commented 4 years ago

https://github.com/multiformats/multicodec https://github.com/multiformats/multibase

very useful for agility

OR13 commented 4 years ago

See also: https://en.wikipedia.org/wiki/Type-length-value

OR13 commented 4 years ago

And https://tools.ietf.org/id/draft-rundgren-json-canonicalization-scheme-06.html

SmithSamuelM commented 4 years ago

Thanks @OR13. Basically IPFS uses a canonicalization scheme that is essentially the Rundgren approach: no whitespace, ASCII-only field names, and sorted fields.

I looked at length at multibase and multicodec. They are quite verbose, relatively speaking, and KERI only needs one base and one codec encoding. Same principle, but a more tuned implementation. KERI's coding is really simple to implement, so the tradeoff made sense to me.

Both msgpack and CBOR use the Type-Length-Value encoding format and are about as compact as you can make them, so I see no reason to reinvent that wheel. The difference is that they do not support cipher suites, so that is the only addition KERI provides.

SmithSamuelM commented 4 years ago

See KID003

SmithSamuelM commented 4 years ago

The derivation codes will be described in KID002.

One of the important design constraints for KERI is performance in data-streaming applications. Multicodec is meant to be universally general, with function code byte(s), base code byte(s), and hash size byte(s). A generally compliant implementation of multicodec is therefore more verbose on average than KERI codes. KERI supports two formats for fully qualified cryptographic material: Base64 on 6-bit boundaries per character, and Base2 on 8-bit boundaries per byte. Base64 is the most compact URL/file/textual format. Base2 with 8-bit boundaries is the most generally useful binary format. The coding is designed so that a Base2 stream with 8-bit boundaries and a Base64 stream with 6-bit boundaries are processable on perfect boundaries for both. The number of Base64 6-bit characters is 4/3 the number of Base2 8-bit bytes; likewise, the number of Base2 8-bit bytes is 3/4 the number of Base64 6-bit characters. The Base64 standard adds pad characters to ensure these perfect boundaries (Base64 is on 6-bit boundaries). KERI uses the Base64 pad slots opportunistically; in most cases this means zero overhead for its derivation code. Its worst case (for now) is 3 bytes (4 Base64 characters), which is comparable to a multicodec with function, base, and length bytes. Also, there is no assurance of perfect boundaries with multicodec between Base64 6-bit and Base2 8-bit, which means that additional pad bytes may be needed. This makes the average case for multicodec much worse than for KERI; the average case for KERI is better by design. Performance optimization comes at the loss of generality.
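
A sketch of the pad-slot arithmetic for one common case, 32 bytes of raw material (the one-character code 'X' is an illustrative assumption, not a normative KERI derivation code):

// 32 bytes = 256 bits; 256/6 = 42 chars + 4 bits, so standard Base64
// emits 43 data chars plus 1 pad char '=' to reach the 44-char boundary
// (44 x 6 = 264 bits = 33 bytes, a perfect 24-bit boundary).
const material = Buffer.alloc(32);            // placeholder raw bytes
const b64 = material.toString('base64');      // 43 data chars + '='
console.log(b64.length, b64.endsWith('='));   // 44 true
// Reusing the pad slot for a one-character derivation code keeps the
// fully qualified form on the same 44-char (24-bit) alignment:
const qualified = 'X' + b64.slice(0, 43);
console.log(qualified.length);                // 44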

Furthermore, anything that has perfect boundaries with a divisor of Base2 8-bit or Base64 6-bit may also have perfect boundaries for both. This includes hex (4-bit boundaries) and octal (3-bit boundaries). For example, 24 bits = 3x8 bits = 4x6 bits = 6x4 bits = 8x3 bits. This makes processing of binary streams possible without any extra padding. Eventually, as KERI becomes widely used, full binary implementations of KERI events will happen; these will benefit from the ability to leverage cryptographic material streams that always align on perfect boundaries, resulting in significant performance optimizations.