fabienrenaud / java-json-benchmark

Performance testing of serialization and deserialization of Java JSON libraries
MIT License

Upgrade to JDK 17 + Some Qs on a new implementation. #59

Closed bowbahdoe closed 11 months ago

bowbahdoe commented 1 year ago

Hi all, I'm working on a PR to upgrade the minimum version to JDK 17 for the sole reason that I wrote a JSON library and that is its minimum version.

The library I wrote also doesn't do automatic databind or streaming; rather, it provides helpers for manual serde:

    @io.avaje.jsonb.Json
    @JsonObject
    public static final class Partner implements ToJson {

        // ...

        @Override
        public Json toJson() {
            return Json.objectBuilder()
                    .put("id", id)
                    .put("name", name)
                    .put("since", since.toString())
                    .build();
        }

        public static Partner fromJson(Json json) {
            return Partner.create(
                    Decoder.field(json, "id", Decoder::long_),
                    Decoder.field(json, "name", Decoder::string),
                    Decoder.field(json, "since", since -> OffsetDateTime.parse(Decoder.string(since)))
            );
        }
    }

I'm wiring it up to the "databind" section, though that feels a tad strange. Just hoping this can be optimized:

    @Benchmark
    @Override
    public Object devmccuejson() throws Exception {
        var string = JSON_SOURCE().nextString();
        var type = JSON_SOURCE().pojoType();
        if (type == Users.class) {
            return Users.fromJson(Json.readString(string));
        }
        else if (type == Clients.class) {
            return Clients.fromJson(Json.read(string));
        }
        throw new RuntimeException("unhandled");
    }

So... just digging for general advice, and making sure folks are conceptually okay with upgrading the suite.

fabienrenaud commented 1 year ago

Upgrading the min version to JDK 17 is fine. If you want to do so, please send a PR that does just that.

As for adding “devmccuejson”, a few things:

  1. If the lib performs poorly and is too new/experimental/unknown, it will likely not make it into the final cut of benchmarking results I’ll publish next year (these graphs are getting pretty crowded). Results will still be available in the raw data.
  2. It seems like a lib meant to do partial decoding and may not be optimized for full object ser/deser. If so, this benchmark is likely a poor fit for it.
  3. Feel free to integrate your lib though. And use benchmarks to improve perf.

Hope this helps.

-- Fabien

bowbahdoe commented 1 year ago

> if the lib performs poorly and is too new/experimental/unknown, it will likely not make it in the final cut of benchmarking results I’ll publish next year (these graphs are getting pretty crowded). Results will still be available in the raw data.

lol, I won't argue that

> It seems like a lib meant to do partial decoding and may not be optimized for full object ser/deser. If so, this benchmark is likely a poor fit for it.

I am curious, just from a terminology perspective, what "partial decoding" means. I've been conceptualizing it like described here, and in that model it's way number 3, while the other benchmarks are ways 1 and 2.

> Feel free to integrate your lib though. And use benchmarks to improve perf.

Yep, that's what I'm doodling on now. I'm only 5x as slow as Jackson right now on my M1, which is better than I imagined I would be. Still working on it.

    Benchmark                                              Mode  Cnt        Score        Error  Units
    c.g.f.j.databind.Deserialization.devmccuejson         thrpt   20   691550.289 ±  20517.628  ops/s
    c.g.f.j.databind.Deserialization.jackson              thrpt   20  2934405.079 ± 141809.396  ops/s
    c.g.f.j.databind.Deserialization.jackson_afterburner  thrpt   20  3428251.662 ± 208667.301  ops/s
    c.g.f.j.databind.Deserialization.jackson_blackbird    thrpt   20  3420256.700 ± 121323.185  ops/s
    c.g.f.j.stream.Deserialization.jackson                thrpt   20  3383085.117 ± 116841.161  ops/s

    Benchmark                                            Mode  Cnt        Score        Error  Units
    c.g.f.j.databind.Serialization.devmccuejson         thrpt   20  1091794.130 ±  20064.302  ops/s
    c.g.f.j.databind.Serialization.jackson              thrpt   20  5282597.172 ± 175987.525  ops/s
    c.g.f.j.databind.Serialization.jackson_afterburner  thrpt   20  5310019.939 ±  81742.711  ops/s
    c.g.f.j.databind.Serialization.jackson_blackbird    thrpt   20  5193569.312 ± 137925.642  ops/s
    c.g.f.j.stream.Serialization.jackson                thrpt   20  4962130.482 ± 202865.033  ops/s

Just from a tooling perspective, would you happen to know how I could get a "heatmap" or something similar to figure out where I'm getting slowed down?

rbygrave commented 1 year ago

> only 5x as slow as jackson right now ... where I'm getting slowed down

Not sure how fast you are wanting to go, but in case it's useful I can point to a couple of things you might look at. NB: I'm the author of avaje-jsonb, which I believe is the 2nd-fastest json lib here.

  1. Buffer recycling
  2. Pre-encoded keys

In terms of buffer recycling: all the fastest libs currently use thread-local buffer recycling. One thing to note is that this may become moot on Java 19+, in that from Java 19 onwards the GC can be smart enough to out-perform thread-local buffers. My recommendation is to also run your benchmarks on Java 19+, both with and without buffer recycling.
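To make the buffer-recycling idea concrete, here is a minimal, self-contained sketch of a thread-local buffer pool. The class and method names are hypothetical, not taken from any of the libraries mentioned above; real implementations (e.g. Jackson's `BufferRecycler`) are considerably more elaborate.

```java
// Minimal sketch of thread-local buffer recycling (hypothetical names).
// Each thread reuses one char[] across calls instead of allocating per call.
final class BufferRecycler {
    private static final ThreadLocal<char[]> CHAR_BUFFER =
            ThreadLocal.withInitial(() -> new char[4096]);

    // Borrow the current thread's reusable buffer, growing it if a
    // larger one is needed. The grown buffer is kept for future calls.
    static char[] borrowCharBuffer(int minSize) {
        char[] buf = CHAR_BUFFER.get();
        if (buf.length < minSize) {
            buf = new char[minSize];
            CHAR_BUFFER.set(buf);
        }
        return buf;
    }
}
```

The point of the pattern is that repeated parse/write calls on the same thread hit the same array, avoiding per-call allocation; as the comment above notes, newer GCs may make this win smaller or negative, which is why measuring with and without it is worthwhile.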

In terms of pre-encoded keys: avaje-jsonb, Jackson and dsl-json make use of pre-encoded keys during serialisation. This means that writing a key is effectively a byte[] copy (skipping escaping and encoding). For deserialisation, avaje-jsonb and dsl-json also make use of pre-encoded keys, which means that the vast majority of the time they don't need to decode the UTF-8 bytes to strings to determine the key, and can instead apply a hash function to the bytes and look up known keys (with protection for the rare case that the hash isn't unique in the current context, and a fallback for unknown keys).
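A toy illustration of both halves of this idea, with entirely hypothetical names (this is not the avaje-jsonb, Jackson or dsl-json API): keys are encoded to bytes once, writing them is a plain array copy, and matching them on input is a byte-range comparison with no String allocation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of pre-encoded keys (hypothetical names, not a real library's API).
final class PartnerWriter {
    // Key bytes, including quotes and separators, encoded once at class load.
    static final byte[] KEY_ID   = "\"id\":".getBytes(StandardCharsets.UTF_8);
    static final byte[] KEY_NAME = ",\"name\":".getBytes(StandardCharsets.UTF_8);

    // Serialisation: writing a key is just a byte copy, with no escaping
    // or UTF-8 encoding work at write time.
    static int writeKey(byte[] out, int pos, byte[] key) {
        System.arraycopy(key, 0, out, pos, key.length);
        return pos + key.length;
    }

    // Deserialisation: raw input bytes are compared against the same
    // pre-encoded key, so no String is built just to identify the field.
    static boolean matchesKey(byte[] input, int from, int to, byte[] key) {
        return Arrays.equals(input, from, to, key, 0, key.length);
    }
}
```

Real implementations layer a byte-level hash table over `matchesKey` so the common case is one hash plus one confirming compare, with a fallback path for unknown keys.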

Cheers, Rob.

bowbahdoe commented 1 year ago

> how fast you are wanting to go

That's actually a pretty solid question, and I don't know the answer to it yet.

Broadly speaking (I think) I have three goals for my API:

  1. uses sealed interfaces and is explainable in terms of "json is one of null, a string, an array of json, ..."
  2. has mechanisms for turning Json into user defined classes that does not require databind to be ergonomically acceptable.
  3. might be considered for the "immutable tree" part of the long-comatose JEP 198 (which explicitly rules out adding a standardized "databind" API). Specifically, I think the Decoder approach is "declarative enough" and "good enough" to make "we added json to the JDK but no databind" not make everyone equally unhappy.

So performance isn't my personal biggest concern, though I'd imagine (3) would be a pretty hard sell if what I come up with is orders of magnitude slower than every alternative.
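Goal 1 above ("json is one of null, a boolean, a number, a string, an array of json, or an object") maps naturally onto a sealed hierarchy. The following is a toy sketch under hypothetical names, not the actual dev.mccue.json types:

```java
import java.util.List;
import java.util.Map;

// Toy sealed model of JSON (hypothetical names, not dev.mccue.json).
// Nesting the records in the interface keeps the hierarchy closed:
// these six cases are the only possible JsonValue implementations.
sealed interface JsonValue {
    record JsonNull() implements JsonValue {}
    record JsonBool(boolean value) implements JsonValue {}
    record JsonNumber(double value) implements JsonValue {}
    record JsonString(String value) implements JsonValue {}
    record JsonArray(List<JsonValue> items) implements JsonValue {}
    record JsonObject(Map<String, JsonValue> fields) implements JsonValue {}

    // Exhaustive dispatch over the closed set of cases.
    static String kind(JsonValue v) {
        if (v instanceof JsonNull)   return "null";
        if (v instanceof JsonBool)   return "boolean";
        if (v instanceof JsonNumber) return "number";
        if (v instanceof JsonString) return "string";
        if (v instanceof JsonArray)  return "array";
        return "object";
    }
}
```

On JDK 21+ the `instanceof` chain could become an exhaustive pattern-matching `switch`, which is exactly the "explainable as one-of" property the goal describes.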


This is my code as-is. It's ugly, undocumented, and half the commit messages are "..." but it is available on Maven Central.

https://github.com/bowbahdoe/json
https://github.com/bowbahdoe/json.decode.alpha

        <dependency>
            <groupId>dev.mccue</groupId>
            <artifactId>json</artifactId>
            <version>0.0.9</version>
        </dependency>
        <dependency>
            <groupId>dev.mccue</groupId>
            <artifactId>json.decode.alpha</artifactId>
            <version>0.0.9</version>
        </dependency>

All of the parser code was translated from this Clojure library, and I'm not sure of the source of any of it. It wasn't chosen for any reason other than that I didn't want to make an artifact which depended on jackson, didn't feel like figuring out shading, and felt the translation was a more enjoyable experience.


I will attempt to grok buffer recycling and pre-encoded keys in the morning when I'm fresh

Thank you for your time, Ethan

bowbahdoe commented 1 year ago

Upgrade PR is #60

bowbahdoe commented 1 year ago

Encouraging news for anyone who cares: the approach of Decoder.field(json, "name", Decoder::string) seems to have a negligible performance difference compared to raw checks and casts on a tree structure.

    684615.461 ± 16828.333 ops/s  (raw tree traversal and casts, like the underscore example)
    657081.560 ± 16956.363 ops/s  (the decoder API)

underscore_java absolutely smokes my code using the same raw tree traversal approach

    Deserialization.underscore_java  thrpt   20  2182809.408 ± 16939.066  ops/s

So the ~4% hit could reasonably go down to ~1.2%, at least with a more efficient tree (or more efficient generation thereof).
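For readers unfamiliar with the two styles being compared above, here is a toy contrast over a plain Map-based tree. The helper only mimics the shape of `Decoder.field`; the names are hypothetical and this is not the real dev.mccue.json API:

```java
import java.util.Map;
import java.util.function.Function;

// Toy contrast of the two decoding styles benchmarked above
// (hypothetical names, plain Map standing in for a JSON tree).
final class DecodeStyles {
    // Style 1: raw checks and casts on the tree.
    static String nameByCasting(Object json) {
        Map<?, ?> obj = (Map<?, ?>) json;
        return (String) obj.get("name");
    }

    // Style 2: a small declarative helper that performs the same
    // lookup, delegating the final conversion to a decoder function.
    static <T> T field(Object json, String key, Function<Object, T> decoder) {
        return decoder.apply(((Map<?, ?>) json).get(key));
    }
}
```

Both end up doing the same lookup and cast, which is consistent with the small measured gap: the helper mostly adds a lambda dispatch on top of work the cast-based version does anyway.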

fabienrenaud commented 1 year ago

The project now requires Java 17 since PR #64