com-lihaoyi / upickle

uPickle: a simple, fast, dependency-free JSON & Binary (MessagePack) serialization library for Scala
https://com-lihaoyi.github.io/upickle
MIT License
721 stars 165 forks source link

Add `sortKeys: Boolean` flag to ujson/upickle APIs #553

Closed lihaoyi closed 9 months ago

lihaoyi commented 9 months ago

This PR allows anyone writing JSON to pass in sortKeys = true to make the object keys get output in sorted order.

Major changes:

  1. Introduced a new upickle.core.BufferedValue type. This is a sealed trait hierarchy that can be used to capture all the method calls received by a upickle.core.Visitor, and re-play them later. It replaces IndexedValue, which has JSON-specific idiosyncracies (e.g. converting MsgPack visitor calls to their JSON equivalents) whereas BufferedValue precisely captures the Visitor method calls unchanged. IndexedValue is left for binary compatibility but no longer used internally, even its ordering use case of buffering up things until a $type tag is found has been replaced by BufferedValue

  2. As part of the generalization from IndexedValue to BufferedValue, we now need to handle non-JSON usage patterns such as non-string-keyed objects (e.g. found in msgpack). This makes sorting these objects non-trivial, as their keys can now be arbitrary BufferedValues. BufferedValue.valueToSortKey is a rough hack to traverse any structured keys and make them sortable.

  3. Intercept all transform calls in the primary upickle/ujson APIs with BufferedValue.maybeSortKeysTransform. This checks the new sortKeys boolean flag, and if true it intercepts the visitor calls, captures them in a BufferedValue, recursively traverses and sorts the BufferedValue's object keys, then re-plays it back out

  4. Binary compatibility stubs have been added to all methods who received a new sortKeys: Boolean = true parameter. These stubs were not marked as @Deprecated, because in some cases the Scala compiler would preferentially resolve the stub over the method-with-default-value, causing spurious deprecation warnings.

Testing

  1. Sorting functionality is covered by new sortKeys unit tests in uJson and uPickle, and has been added to the TestUtil.scala round-trip tests to ensure that all existing tests exercise the a subset of the sortKeys = true codepaths for JSON and MsgPack and verify that the round-tripping is still successful

Notes

  1. It took a surprising amount of code to make this work well, as it goes against uPickle's "streaming-no-intermediate-data-structures" way of doing things. sortKeys = true has to buffer up some kind of intermediate JSON AST somewhere. I would expect performance to be worse than the default zero-intermediate-datastructures visitor approach, and so optimizing the sortKeys = true code paths was not a priority in this PR. If anyone wants maximum performance, they can leave it as the default sortKeys = false

  2. The whole Transformer/transform abstraction in the codebase is kind of half-baked and unsatisfactory, but I left it mostly as is for now with only small tweaks to make things work (e.g. allowing the implementation of BufferedValue.maybeSortKeysTransform to live in upickle.core). Realistically, the whole name Transformer and the concept it represents probably needs to be overhauled, but that's out of scope for this PR

  3. TestUtil.scala is getting big and messy, and I'm adding to the mess. It's probably OK to leave for now, but it could definitely use a cleanup

julienrf commented 9 months ago

Hey @lihaoyi, thank you for improving upickle. However such a change should lead to a new minor release as recommended here.

This is unfortunate because your library claims to follow the versioning scheme semver-spec but in practice it does not follow it, which could lead to bad surprises.

lihaoyi commented 9 months ago

This change is meant to be source and binary compatible. Your link states that such a change should be a patch version

@ckipp01 has raised some issues with our mima config that caused some breakage to slip through. That's a seprate issue: it's just a bug and I will fix it and cut a new issue

julienrf commented 9 months ago

In general, adding a new method can break the source compatibility. However, since in your case you added an overload maybe the situation is different, I am not 100% sure.

lefou commented 9 months ago

sortkeys is new API, hence we need to release it as a new minor version. The binary compatibility shims added here are just technical details. But if they would be not there, we would need to bump the major version, as it results in some API removals.

In a patch release, no API changes are allowed, only internal changes.