Open Lucretiel opened 3 years ago
What is the main selling point of kaydle implementation? (or does it need one)
`serde` is definitely the main selling point.
> Many entities in KDL are defined in terms of code points ... Rust strings and `char` are sequences of Unicode Scalar Values
See also https://github.com/kdl-org/kdl/issues/207
I argue that while the spec refers to "code points," the top-level requirement for the document to be UTF-8 encoded eliminates the possibility for surrogates to show up, as well-formed UTF-8 is an encoded sequence of USV and MUST NOT include surrogates, unpaired nor paired. I think the only location surrogate code points may actually show up per the spec is in `\u{...}` escapes, and the requirement that "Strings MUST be represented as UTF-8 values" may also prevent codepoints which are not USV in that location as well.
Or IOW, I think this one may formally be a non-issue.
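For anyone who wants to check the surrogate claim from Rust, here is a minimal sketch using only the standard library. The helper names `is_scalar_value` and `is_well_formed_utf8` are made up for illustration and are not part of kaydle or the KDL spec.

```rust
/// Hypothetical helper: true if `cp` is a Unicode Scalar Value.
/// `char::from_u32` returns `None` for surrogates (U+D800..=U+DFFF)
/// and for anything above U+10FFFF.
fn is_scalar_value(cp: u32) -> bool {
    char::from_u32(cp).is_some()
}

/// Hypothetical helper: true if `bytes` is well-formed UTF-8
/// according to the standard library's validator.
fn is_well_formed_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

The bytes `[0xED, 0xA0, 0x80]` are what U+D800 would look like if it were run through the UTF-8 encoding scheme. `is_well_formed_utf8` rejects them, so a document that passes UTF-8 validation cannot smuggle in a surrogate outside of an escape sequence.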
> well-formed UTF-8 is an encoded sequence of USV and MUST NOT include surrogates, unpaired nor paired
Oh no kidding? This would actually be news to me, that's interesting.
Thinking more about the duplicate property keys thing. While I like the flexibility offered by leaning into the serde model, I'm somewhat unhappy that this could cause intentionally valid KDL documents to be rejected (for instance, a configuration dumping tool could deliberately make use of the last-key-wins behavior). I've had an idea for how to implement this in the parser (by adding a lookahead to `NodeProcessor::next_event`), so I'll probably make it runtime configurable, since opting into the conforming behavior incurs a performance penalty due to the lookahead.
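To make the lookahead idea concrete, here is a rough sketch over a plain slice of properties. It is not how `NodeProcessor::next_event` works; `last_key_wins` and the `(&str, i64)` property shape are assumptions chosen to keep the example std-only.

```rust
/// Hypothetical helper: keep a property only if no later property
/// in the same node reuses its key, so the last occurrence wins.
fn last_key_wins<'a>(props: &[(&'a str, i64)]) -> Vec<(&'a str, i64)> {
    let mut kept = Vec::new();
    for (i, &(key, value)) in props.iter().enumerate() {
        // Look ahead past the current property for a duplicate key.
        let overridden = props[i + 1..].iter().any(|&(later, _)| later == key);
        if !overridden {
            kept.push((key, value));
        }
    }
    kept
}
```

Each property triggers a scan of everything after it, which is the kind of extra work that makes the conforming behavior a reasonable thing to put behind a runtime option.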
There are a few cases where we're making a conscious choice to diverge from the KDL spec. These should be documented near the top. Currently this includes:
- Rust strings and `char` are sequences of Unicode Scalar Values, rather than Code Points. Scalar Values are a subset of Code Points that excludes only the high and low surrogates. In practice we don't expect this will cause any issues.
- Duplicate property keys get no special `map` handling (i.e., `next_key_seed` will always return the next key, without any consideration for duplicates). We prefer the flexibility offered by this, since `Deserialize` types have the opportunity to define their own behavior when receiving duplicate keys. `HashMap`, for instance, uses the last-key-wins strategy, while structs with `derive(Deserialize)` will fail with an error on a duplicate key.
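The two duplicate-key behaviors can be mimicked without pulling in serde. In this sketch, `collect_last_wins` stands in for what a `HashMap` target ends up with, and `collect_strict` stands in for a derived struct that rejects duplicates. Both function names and the error message are illustrative assumptions, not serde or kaydle APIs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a `HashMap` target: `insert` overwrites
/// the previous value, so the last occurrence of a key wins.
fn collect_last_wins(pairs: &[(&str, i64)]) -> HashMap<String, i64> {
    let mut map = HashMap::new();
    for &(key, value) in pairs {
        map.insert(key.to_string(), value);
    }
    map
}

/// Hypothetical stand-in for a strict target: the first repeated key
/// aborts with an error instead of overwriting.
fn collect_strict(pairs: &[(&str, i64)]) -> Result<HashMap<String, i64>, String> {
    let mut map = HashMap::new();
    for &(key, value) in pairs {
        if map.insert(key.to_string(), value).is_some() {
            return Err(format!("duplicate key `{}`", key));
        }
    }
    Ok(map)
}
```

Given the same input with a repeated key, the first function keeps the later value and the second returns an error, which is why leaving the decision to the `Deserialize` type allows both policies.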