encoding/json: excessive allocations when using the Token() api

dop251 commented 1 year ago

What version of Go are you using (`go version`)?

$ go version
go version
go version go1.19 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GO111MODULE="auto"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/dop/Library/Caches/go-build"
GOENV="/Users/dop/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/dop/Documents/workspace/pkg/mod"
GOOS="darwin"
GOPATH="/Users/dop/Documents/workspace"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.19/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.19/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.19"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/dop/Documents/workspace/go/test/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/26/dlsc0mm136n0sdrsvb_p90y80000gn/T/go-build87975269=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I have been trying to investigate the difference in performance between parsing the same schemaless JSON using Decoder.Decode() and using Decoder.Token() with a simple handler. The code can be found here: https://go.dev/play/p/_QvFQQSTB0R

What did you expect to see?

A comparable performance.

What did you see instead?

name          old time/op    new time/op    delta
DecodeJson-4    21.7µs ± 2%     5.5µs ± 2%  -74.66%  (p=0.000 n=10+10)

name          old alloc/op   new alloc/op   delta
DecodeJson-4    6.09kB ± 0%    2.29kB ± 0%  -62.40%  (p=0.000 n=9+9)

name          old allocs/op  new allocs/op  delta
DecodeJson-4       211 ± 0%        33 ± 0%  -84.36%  (p=0.000 n=10+10)

"Old" here is the version which uses Token()

What is especially striking is the difference in the number of allocations and the allocation size. The only allocations that I make are for maps and slices, but Decode() does them too. I thought something was off and decided to investigate.

So far I have found one reason:

Token() uses readValue() to read primitive values and map keys. readValue() resets the state of the scanner at the beginning (https://github.com/golang/go/blob/7cf06f070e56dfb6507122704bc75d697ccc350f/src/encoding/json/stream.go#L90)
After the value is read the scanner ends up in stateEndValue(), which thinks it has read the top-level value (because parseState is empty at this point): https://github.com/golang/go/blob/7cf06f070e56dfb6507122704bc75d697ccc350f/src/encoding/json/scanner.go#L281
It then calls stateEndTop() which checks the current character for being space (which it won't be because we're actually in the middle of a value) and then goes ahead an allocates and sets scanner.err https://github.com/golang/go/blob/7cf06f070e56dfb6507122704bc75d697ccc350f/src/encoding/json/scanner.go#L331

This error (and therefore the allocation) is completely unnecessary because the error gets dropped when Token() calls readValue() again.

I tried to fix it by introducing a new flag in the scanner called 'inStream', then setting the flag at the beginning of Token() (and resetting it on defer) and then checking the flag in stateEndValue() to avoid allocating the error. It's not the most elegant solution, but it appears to be working:

name          old time/op    new time/op    delta
DecodeJson-4    21.7µs ± 2%    12.4µs ± 2%  -42.83%  (p=0.000 n=10+9)

name          old alloc/op   new alloc/op   delta
DecodeJson-4    6.09kB ± 0%    3.68kB ± 0%  -39.54%  (p=0.000 n=9+9)

name          old allocs/op  new allocs/op  delta
DecodeJson-4       211 ± 0%        76 ± 0%  -63.98%  (p=0.000 n=10+10)

Note it's still somewhat off the Decode() performance but it's a significant improvement. I can submit a PR with my changes.

dr2chase commented 1 year ago

Thanks for looking into this. I'm not sure if submitted PRs to Go before or not, if not, please have a look at https://go.dev/doc/contribute#sending_a_change_github. (Gopherbot will catch any missed steps.)

@dsnet and/or @mvdan , can you have a look at this?

mvdan commented 1 year ago

The current Token API is slow due to its nature. See https://github.com/golang/go/issues/40128 for a proposal to change the API.

That said, I don't oppose improvements to the existing API - as long as they don't break existing correctness guarantees. If you can send a change as described in https://go.dev/doc/contribute, we can discuss there. It's hard to judge whether or not this change is reasonable without seeing it in full.

gopherbot commented 1 year ago

Change https://go.dev/cl/443778 mentions this issue: encoding/json: reduce the number of allocations when decoding in streaming mode (Token API)

dsnet commented 1 year ago

What is the benchmark being run? It's not one of the standard ones in the package.

dop251 commented 1 year ago

The benchmark is run on this: https://go.dev/play/p/_QvFQQSTB0R

I could add it to the PR if necessary.

dsnet commented 1 year ago

I figured it out eventually. It would help to explicitly say that:

func BenchmarkDecodeJson(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        DecodeStd() // or DecodeWithDecoder
    }
}

that's the piece of information that was missing.

dop251 commented 1 year ago

My bad, sorry.

golang / go