go-json-experiment / json

Experimental implementation of a proposed v2 encoding/json package
BSD 3-Clause "New" or "Revised" License

Wonky performance numbers when encoding the same thing through different interfaces #45

Open karalabe opened 1 week ago

karalabe commented 1 week ago

I've tried a few different combinations of types and interfaces to encode the same thing. Interestingly, there's a 30% speed variation depending on what I call, which seems extreme. I'd expect the same performance, independent of where the data enters the encoder.

BenchmarkMarshal2String-12           232       5220942 ns/op
BenchmarkMarshal2RawJSON-12          283       4093803 ns/op
BenchmarkMarshal2Texter-12           222       5399327 ns/op
BenchmarkMarshal2Jsoner-12           265       4748703 ns/op
BenchmarkMarshal2Jsoner2-12          271       4422361 ns/op
package test 

import (
    "bytes"
    "encoding/hex"
    "encoding/json"
    "testing"

    json2 "github.com/go-json-experiment/json"
    "github.com/go-json-experiment/json/jsontext"
)

func BenchmarkMarshalString(b *testing.B) {
    src := bytes.Repeat([]byte{'0'}, 4194304)
    str := hex.EncodeToString(src)

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        json.Marshal(str)
    }
}

func BenchmarkMarshal2String(b *testing.B) {
    src := bytes.Repeat([]byte{'0'}, 4194304)
    str := hex.EncodeToString(src)

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        json2.Marshal(str)
    }
}

func BenchmarkMarshalRawJSON(b *testing.B) {
    src := bytes.Repeat([]byte{'0'}, 4194304)
    msg := json.RawMessage(`"` + hex.EncodeToString(src) + `"`)

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        json.Marshal(msg)
    }
}

func BenchmarkMarshal2RawJSON(b *testing.B) {
    src := bytes.Repeat([]byte{'0'}, 4194304)
    msg := json.RawMessage(`"` + hex.EncodeToString(src) + `"`)

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        json2.Marshal(msg)
    }
}

func BenchmarkMarshalTexter(b *testing.B) {
    src := bytes.Repeat([]byte{'0'}, 4194304)
    txt := &Texter{str: hex.EncodeToString(src)}

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        json.Marshal(txt)
    }
}

func BenchmarkMarshal2Texter(b *testing.B) {
    src := bytes.Repeat([]byte{'0'}, 4194304)
    txt := &Texter{str: hex.EncodeToString(src)}

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        json2.Marshal(txt)
    }
}

func BenchmarkMarshalJsoner(b *testing.B) {
    src := bytes.Repeat([]byte{'0'}, 4194304)
    jsn := &Jsoner{str: hex.EncodeToString(src)}

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        json.Marshal(jsn)
    }
}

func BenchmarkMarshal2Jsoner(b *testing.B) {
    src := bytes.Repeat([]byte{'0'}, 4194304)
    jsn := &Jsoner{str: hex.EncodeToString(src)}

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        json2.Marshal(jsn)
    }
}

func BenchmarkMarshal2Jsoner2(b *testing.B) {
    src := bytes.Repeat([]byte{'0'}, 4194304)
    jsn := &Jsoner2{str: hex.EncodeToString(src)}

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        json2.Marshal(jsn)
    }
}

func BenchmarkMarshalCopyString(b *testing.B) {
    src := bytes.Repeat([]byte{'0'}, 4194304)
    str := hex.EncodeToString(src)

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        buf := make([]byte, len(str)+2)
        buf[0] = '"'
        copy(buf[1:], str)
        buf[len(buf)-1] = '"'
    }
}

type Texter struct {
    str string
}

func (t Texter) MarshalText() ([]byte, error) {
    return []byte(t.str), nil
}

type Jsoner struct {
    str string
}

func (j Jsoner) MarshalJSON() ([]byte, error) {
    return []byte(`"` + j.str + `"`), nil
}

type Jsoner2 struct {
    str string
}

func (j Jsoner2) MarshalJSONV2(enc *jsontext.Encoder, opts json2.Options) error {
    return enc.WriteValue([]byte(`"` + j.str + `"`))
}
dsnet commented 1 week ago

Presently, I'm unable to reproduce those results on my Ryzen 5900x. I get:

BenchmarkMarshal2String              192       5595562 ns/op     8485392 B/op          3 allocs/op
BenchmarkMarshal2RawJSON             224       5672734 ns/op     8585393 B/op          3 allocs/op
BenchmarkMarshal2Texter              122       9177080 ns/op    16857143 B/op          3 allocs/op
BenchmarkMarshal2Jsoner              133       7760851 ns/op    25256777 B/op          5 allocs/op
BenchmarkMarshal2Jsoner2             181       7379420 ns/op    16841790 B/op          3 allocs/op

Texter, Jsoner, and Jsoner2 are notably slower because they allocate one (or more) intermediate copies of the string (~8MiB). In the case of String and RawJSON, the allocated amount approximately matches the string length needed for the output buffer.

dsnet commented 1 week ago

Out of curiosity, what's the relationship between the lifetime of these strings and how they're marshaled?

Do you create the strings once, but marshal them multiple times? Or is it a 1:1 relationship where the creation of a string exactly correlates with a single marshal call?

The relevance of this is an idea that @mvdan once had of having jsontext.String precompute properties about the string, allowing future marshaling of the string to bypass certain checks (e.g., whether escaping is necessary). However, this only helps your situation if these large blobs are constructed once and marshaled multiple times.

karalabe commented 1 week ago

My specific use case is a small control HTTP RPC API between 2 local processes (same machine or same LAN). Originally this API was specced to use JSON because it was simple, and it represents binary blobs as hex strings (for legacy reasons). The simplifications all came from the necessity to support 9 different implementations of the different sides of this API in different languages, so we've tried to keep it simple.

The purpose of the API is nonetheless "control", so latency is very relevant (we're expecting LAN-style millisecond latencies, not internet-style 50+ms latencies). Our packets were on the order of 50KB, so we hadn't bothered much about how performant the json package is.

Fast forward a year, however, and our small control API sometimes needs to send over 1-2MB blobs of data. 2MB within the OS or a LAN is still very much acceptable, but the json overhead starts to be felt. It's not yet a problem, but it's not irrelevant either.

We have yet another proposal in the works which would introduce another message type that can grow to 10-20MB. That's where the latency starts to bite us unacceptably, and it seems to originate from Go's hex encoding for one smaller part and, apparently, from Go's json package for the main part. That was very surprising to me, so I started looking into why it's doing what it's doing.

Now, I completely agree that in an internet latency/bandwidth scenario the package overhead is not relevant. In a local network or within-OS setting, however, it is, so it would be nice to address if possible.

As for JSON being the wrong format for low-latency apps: yes, I agree, and I will probably push for replacing it. But still, it would be nice to fix json if it's in the works again :)