Closed lihaoyi closed 1 year ago
Review by @lefou @lolgab @plokhotnyuk if anyone wants to have a look through before I merge it.
It's a bit of a messy PR, but a lot of it is pretty boilerplatey code, so hopefully not too dense. Scala 3's inline def
s would help a bit to alleviate the boilerplate a bit, some day in the far future when we drop Scala 2 support.
The test suite is green, the MIMA bin-compat checks pass, and the benchmarks seem to indicate good improvements.
A good improvement would be switching to JMH for benchmarking. It will prevent mistakes in measurements when following best practices from JMH samples and will give a good tool for measurement of secondary metrics like other CPU events or JVM allocations (using built-in perfnorm
and gc
profilers).
@lihaoyi Do you consider an integration of visitor API to work with jsoniter-scala-core? I'm ready to add a missing parsing or serialization methods in jsoniter-scala-core API as it was done recently for hasRemaining
.
Yeah using JMH would be nice. The current benchmark suite is pretty old.
I haven't thought seriously about integrating with jsoniter. I've poked around (e.g. when looking for a double serializer to use), but it's a pretty intimidating codebase so i didn't get as far as figuring out where the possible integration points are
Yeah using JMH would be nice. The current benchmark suite is pretty old.
As a workaround, for the first time you can publish locally a snapshot version and use JMH benchmarks from
jsoniter-scala-benchmarkJS
andjsoniter-scala-benchmarkJVM
sub-projects.I haven't thought seriously about integrating with jsoniter. I've poked around (e.g. when looking for a double serializer to use), but it's a pretty intimidating codebase so i didn't get as far as figuring out where the possible integration points are.
Here is an attempt to do that in weePickle's codebase. Probably with cooperation from both sides, we can achieve quite competitive solutions like those jsoniter-scala-circe and play-json-jsoniter boosters.
@plokhotnyuk what's the story for JMH and Scala.js/Scala-Native? IIRC one of the reasons I had my own benchmarking code in Scala is so I can easily re-use the benchmarks across platforms. Is JMH only for Scala-JVM? Or is there some cross-platform support too?
@plokhotnyuk what's the story for JMH and Scala.js/Scala-Native? IIRC one of the reasons I had my own benchmarking code in Scala is so I can easily re-use the benchmarks across platforms. Is JMH only for Scala-JVM? Or is there some cross-platform support too?
In jsoniter-scala-benchmarkJS
sub-project I use scalajs-benchmark (a Scala.js port of JMH for browsers).
For Scala Native I have only a rough measurements on validation of huge JSON files using an application from examples.
AFAIK @armanbilge have started porting of JMH to Scala for cross-platform support in his scala-jmh repo. I think the Scala community could sponsor his work on that.
This PR attempts to optimize the uPickle read/write logic while preserving binary compatibility
Code Changes
Generally avoid calling
BaseElemRenderer.getElemSafe
andElemBuilder.appendUnsafe
in hot paths. Instead, in both cases we try to callrequestUntilGetSafeIndex
/ensureLength
and then read/write as many characters as we can directly on the byte/char array.ElemBuilder.arr
andElemBuilder.length
public so that they can be used directly outside ofupickle.core
Adjust
requestUntil0
to properly handle cases wherereadDataIntoBuffer
returnsn = -1
Write integers directly to output without needing to
.toString
a temporary stringRead (most) integers directly from input without needing to
.toString
a temporary stringIn
ArrayWriter
andSeqLikeWriter
, liftctx.subVisitor
out of the loop since it never changesSeparated out "fast-path" simple-acii-string loop from "slow-path" escape/unicode loop during rendering and parsing
Remove redundant
flushBuffer
andflushElemBuilder
calls insidevisitFloat32
/visitFloat64
Fixed a bug in
WrapElemArrayCharSeq.charAt
which was not correct whenElem=Byte
Added new
visitFloat64{Char,Byte}Parts
visitor methods which are optimized versions ofvisitFloat64StringParts
.Fix a bug in
StringWriter
where it was not acceptingvisitFloat64StringParts
calls, even though it accepts all the othervisitFloat*
andvisitInt*
calls.Directly write unicode characters to UTF-8 bytes without going through an intermediate
makeString: String
and.getBytes: Array[Byte]
Vendored/patched the fast Schubfach Double/Float serialization code from Jackson-Core. This is similar to the version used in OpenJDK, but OpenJDK's is GPLed (rather than permissive) and thus cannot be used. Furthermore, even if a future version of Java includes the fast double serializer, vendoring/patching it allows us to write the output directly to the output char/byte array, saving us an intermediate string and byte array. For now, this is just for Scala-JVM, to allow us to re-use the Java sources with minimal changes while still providing perf on the platform where it matters the most
Testing Changes
Added some micro-benchmarks to the
bench/
folder, to better tease out differences that only affect specific code pathsMoved slow unit tests into dedicated
.testSlow
module, to get them out of the fast pathAdded some fuzz tests reading/writing a variety of Strings/Numbers of different lengths, to try and catch any bugs that the trivial unit tests don't
Benchmarks
git checkout optimize && ./mill bench.jvm.run && git checkout main && git checkout optimize bench && ./mill bench.jvm.run
on my M1 Macbook pro, Scala 2.13.10, Java 11"macro" benchmarks
upickleDefault Read/Write
show 16-33% speedups when handling JSON"micro" benchmarks show large speedups in
integers Write
, and the string handling operations in generalThe micro-benchmarks show a minor across-the-board speedup, across all benchmarks, that seems evident despite the random variation.
Main -> PR (higher is better)
As of 66d9af8634fd3d837fb051ecc41f7cf9669b5b72
Notes
A lot of function signatures changed or were moved around. I left
@deprecated
forwarder methods where necessary to keep binary compatibilityFor now, some possible performance gains are still left on the table: custom double parsing, vectorized parsing, etc.
For now, most changes are pretty localized. There's probably some optimization possibilities that would involve larger changes to the core interfaces
The performance effects of changes can be pretty surprising. A number of things that I thought would speed things up actually slowed things down (e.g. making
visitFloat64StringParts
/visitString
takeArray[Byte/Char]
rather thanCharSequence
, or making theescapeByte
's String -> UTF8 Bytes conversion direct rather than through an intermediateCharBuilder
)Per-commit rough benchmarks
5-second-long rough benchmarks on each significant commit, to see the numbers changing over time