Lucretiel / kaydle

An alternative implementation of Kat's Document Language, including serde integration
Mozilla Public License 2.0

Document divergences from the spec #8

Open Lucretiel opened 3 years ago

Lucretiel commented 3 years ago

There are a few cases where we're making a conscious choice to diverge from the KDL spec. These should be documented near the top. Currently this includes:

tbmreza commented 3 years ago

What is the main selling point of the kaydle implementation? (Or does it need one?)

Lucretiel commented 3 years ago

serde is definitely the main selling point.

CAD97 commented 3 years ago

Many entities in KDL are defined in terms of code points ... Rust strings and char are sequences of Unicode Scalar Values

See also https://github.com/kdl-org/kdl/issues/207

I argue that while the spec refers to "code points," the top-level requirement for the document to be UTF-8 encoded eliminates the possibility of surrogates showing up, as well-formed UTF-8 is an encoded sequence of USV and MUST NOT include surrogates, unpaired nor paired. I think the only location where surrogate code points may actually show up per the spec is in \u{...} escapes, and the requirement that "Strings MUST be represented as UTF-8 values" may prevent code points which are not USVs in that location as well.

Or IOW, I think this one may formally be a non-issue.
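
For illustration, here is a minimal sketch of how this plays out in Rust's standard library. The helper names `is_scalar_value` and `is_well_formed_utf8` are made up for this example and are not part of kaydle or the KDL spec; the standard library calls they wrap are `char::from_u32` and `std::str::from_utf8`.

```rust
/// Returns true if `cp` is a Unicode scalar value, i.e. a code point
/// no greater than U+10FFFF that is not in the surrogate range
/// U+D800..=U+DFFF. (Hypothetical helper, for illustration only.)
fn is_scalar_value(cp: u32) -> bool {
    // `char` can only hold scalar values, so this conversion fails
    // for surrogates and for values above U+10FFFF.
    char::from_u32(cp).is_some()
}

/// Returns true if `bytes` is well-formed UTF-8.
/// (Hypothetical helper, for illustration only.)
fn is_well_formed_utf8(bytes: &[u8]) -> bool {
    // The standard decoder rejects byte sequences that would decode
    // to a surrogate, such as ED A0 80 (the would-be form of U+D800).
    std::str::from_utf8(bytes).is_ok()
}
```

With these, `is_scalar_value(0xD800)` is false while `is_scalar_value(0xD7FF)` and `is_scalar_value(0xE000)` are true, and `is_well_formed_utf8(&[0xED, 0xA0, 0x80])` is false. So a document that has already been accepted as a Rust `&str` cannot contain a surrogate outside of an escape sequence.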

Lucretiel commented 3 years ago

well-formed UTF-8 is an encoded sequence of USV and MUST NOT include surrogates, unpaired nor paired

Oh no kidding? This would actually be news to me, that's interesting.

Lucretiel commented 3 years ago

Thinking more about the duplicate property keys thing. While I like the flexibility offered by leaning into the serde model, I'm somewhat unhappy that this could cause intentionally valid KDL documents to be rejected (for instance, a configuration dumping tool could deliberately make use of the last-key-wins behavior). I've had an idea for how to implement this in the parser (by adding a lookahead to NodeProcessor::next_event), so I'll probably make it runtime configurable (since opting into the conforming behavior incurs a performance penalty due to the lookahead).
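
To make the two behaviors concrete, here is a rough sketch of the difference in semantics only. It is not how kaydle handles properties and says nothing about the lookahead in NodeProcessor::next_event; the `DuplicateKeys` enum and `collect_properties` function are hypothetical names invented for this example, and properties are simplified to string pairs.

```rust
use std::collections::HashMap;

/// Hypothetical runtime option for repeated property keys on a node.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DuplicateKeys {
    /// Treat a repeated key as an error.
    Reject,
    /// Spec-conforming behavior: the last occurrence wins.
    LastWins,
}

/// Collects `key=value` pairs into a map under the given policy.
/// (Illustrative only; real KDL values are not just strings.)
fn collect_properties<'a>(
    pairs: &[(&'a str, &'a str)],
    policy: DuplicateKeys,
) -> Result<HashMap<&'a str, &'a str>, String> {
    let mut props = HashMap::new();
    for &(key, value) in pairs {
        // `insert` overwrites and returns the previous value, if any,
        // which is exactly the last-key-wins behavior.
        let was_present = props.insert(key, value).is_some();
        if was_present && policy == DuplicateKeys::Reject {
            return Err(format!("duplicate property key: {}", key));
        }
    }
    Ok(props)
}
```

Given the pairs `[("a", "1"), ("b", "2"), ("a", "3")]`, the `LastWins` policy yields a map where `a` is `"3"`, while `Reject` returns an error. Note that this sketch gets last-key-wins cheaply because it buffers every property in a map first; a streaming parser that emits events as it goes does not have that luxury, which is where a lookahead cost would come from.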