Document divergences from the spec

Lucretiel commented 3 years ago

There are a few cases where we're making a conscious choice to diverge from the KDL spec. These should be documented near the top. Currently these include:

Many entities in KDL are defined in terms of code points (for instance, KDL identifiers are made up of "any code point except for ..."). Rust strings and char are sequences of Unicode Scalar Values, rather than code points. The scalar values are a subset of the code points that excludes only the high and low surrogates. In practice we don't expect this to cause any issues.
KDL calls for duplicate property keys to be last-key-wins, with earlier occurrences of the key ignored. We will instead use the ordinary serde map handling for these cases (i.e., next_key_seed will always return the next key, without any consideration for duplicates). We prefer the flexibility this offers, since Deserialize types have the opportunity to define their own behavior when receiving duplicate keys. HashMap, for instance, uses the last-key-wins strategy, while structs with derive(Deserialize) will fail with an error on a duplicate key.
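
To make the two duplicate-key policies concrete, here is a minimal std-only sketch. It does not use serde itself: `collect_last_wins` and `collect_strict` are hypothetical stand-ins for a map-like consumer and a strict, struct-like consumer that both receive every key in document order.

```rust
use std::collections::HashMap;

/// Hypothetical map-like consumer: a later entry overwrites an earlier one.
fn collect_last_wins<'a>(entries: &[(&'a str, i64)]) -> HashMap<&'a str, i64> {
    let mut map = HashMap::new();
    for &(key, value) in entries {
        // `insert` replaces any value already stored under `key`.
        map.insert(key, value);
    }
    map
}

/// Hypothetical strict consumer: the first repeated key is an error.
fn collect_strict<'a>(entries: &[(&'a str, i64)]) -> Result<HashMap<&'a str, i64>, String> {
    let mut map = HashMap::new();
    for &(key, value) in entries {
        // `insert` returns the previous value if the key was already present.
        if map.insert(key, value).is_some() {
            return Err(format!("duplicate key `{}`", key));
        }
    }
    Ok(map)
}

fn main() {
    // The same stream of properties, handed to both consumers unchanged.
    let entries = [("a", 1), ("b", 2), ("a", 3)];
    println!("{:?}", collect_last_wins(&entries).get("a"));
    println!("{:?}", collect_strict(&entries));
}
```

Because the producer never filters duplicates, each consumer is free to pick its own policy, which is the flexibility described above.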

tbmreza commented 3 years ago

What is the main selling point of the kaydle implementation? (Or does it need one?)

Lucretiel commented 3 years ago

serde is definitely the main selling point.

CAD97 commented 3 years ago

Many entities in KDL are defined in terms of code points ... Rust strings and char are sequences of Unicode Scalar Values

I argue that while the spec refers to "code points," the top-level requirement for the document to be UTF-8 encoded eliminates the possibility of surrogates showing up, as well-formed UTF-8 is an encoded sequence of USV and MUST NOT include surrogates, unpaired nor paired. I think the only location where surrogate code points may actually show up per the spec is in \u{...} escapes, and the requirement that "Strings MUST be represented as UTF-8 values" may also rule out code points which are not USVs in that location.

Or IOW, I think this one may formally be a non-issue.
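
A quick way to check this from the Rust side, using only the standard library. The helper names `is_scalar_value` and `is_well_formed_utf8` are made up for this sketch; the byte sequence ED A0 80 is what U+D800 would produce if it were pushed through the three-byte UTF-8 bit layout.

```rust
/// Hypothetical helper: true if `cp` can be represented as a Rust `char`,
/// i.e. it is a Unicode scalar value.
fn is_scalar_value(cp: u32) -> bool {
    char::from_u32(cp).is_some()
}

/// Hypothetical helper: true if `bytes` is accepted as UTF-8 by std.
fn is_well_formed_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // The surrogate range U+D800..=U+DFFF is excluded from `char`.
    println!("U+D7FF: {}", is_scalar_value(0xD7FF));
    println!("U+D800: {}", is_scalar_value(0xD800));
    println!("U+DFFF: {}", is_scalar_value(0xDFFF));
    println!("U+E000: {}", is_scalar_value(0xE000));

    // A surrogate encoded with the UTF-8 bit pattern is rejected as ill-formed.
    println!("ED A0 80: {}", is_well_formed_utf8(&[0xED, 0xA0, 0x80]));
    // The neighbouring scalar value U+D7FF (ED 9F BF) is accepted.
    println!("ED 9F BF: {}", is_well_formed_utf8(&[0xED, 0x9F, 0xBF]));
}
```

So any input that has already been validated as a Rust `&str` cannot contain surrogates, whichever way the spec's wording is read.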

Lucretiel commented 3 years ago

well-formed UTF-8 is an encoded sequence of USV and MUST NOT include surrogates, unpaired nor paired

Oh no kidding? This would actually be news to me, that's interesting.

Lucretiel commented 3 years ago

Thinking more about the duplicate property keys thing. While I like the flexibility offered by leaning into the serde model, I'm somewhat unhappy that this could cause intentionally valid KDL documents to be rejected (for instance, a configuration dumping tool could deliberately make use of the last-key-wins behavior). I've had an idea for how to implement this in the parser (by adding a lookahead to NodeProcessor::next_event), so I'll probably make it runtime configurable (since opting into the conforming behavior incurs a performance penalty due to the lookahead).
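
Roughly what the lookahead amounts to, as a std-only sketch rather than the actual NodeProcessor code: `last_key_wins` is a hypothetical function over an already-parsed slice of properties, whereas the real implementation would presumably work on the event stream.

```rust
/// Hypothetical sketch of spec-conforming filtering: a property is kept only
/// if no later property on the same node uses the same key.
fn last_key_wins<'a>(props: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    let mut kept = Vec::new();
    for (i, &(key, value)) in props.iter().enumerate() {
        // Lookahead: scan everything after this property for the same key.
        let shadowed = props[i + 1..].iter().any(|&(later, _)| later == key);
        if !shadowed {
            kept.push((key, value));
        }
    }
    kept
}

fn main() {
    let props = [("a", "1"), ("b", "2"), ("a", "3")];
    // The first `a` is shadowed by the later one and is dropped.
    println!("{:?}", last_key_wins(&props));
}
```

The scan over the remaining properties is the extra work in this sketch, which is why making it opt-in at runtime seems reasonable.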

Lucretiel / kaydle

Document divergences from the spec #8