Open dop251 opened 1 year ago
Thanks for looking into this. I'm not sure whether you have submitted PRs to Go before; if not, please have a look at https://go.dev/doc/contribute#sending_a_change_github. (Gopherbot will catch any missed steps.)
@dsnet and/or @mvdan, can you have a look at this?
The current Token API is slow due to its nature. See https://github.com/golang/go/issues/40128 for a proposal to change the API.
That said, I don't oppose improvements to the existing API - as long as they don't break existing correctness guarantees. If you can send a change as described in https://go.dev/doc/contribute, we can discuss there. It's hard to judge whether or not this change is reasonable without seeing it in full.
Change https://go.dev/cl/443778 mentions this issue: encoding/json: reduce the number of allocations when decoding in streaming mode (Token API)
What is the benchmark being run? It's not one of the standard ones in the package.
The benchmark is run on this: https://go.dev/play/p/_QvFQQSTB0R
I could add it to the PR if necessary.
I figured it out eventually. It would help to explicitly say that:
func BenchmarkDecodeJson(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		DecodeStd() // or DecodeWithDecoder
	}
}
That's the piece of information that was missing.
My bad, sorry.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
What did you do?
I have been trying to investigate the difference in performance between parsing the same schemaless JSON using Decoder.Decode() and using Decoder.Token() with a simple handler. The code can be found here: https://go.dev/play/p/_QvFQQSTB0R
What did you expect to see?
A comparable performance.
What did you see instead?
"Old" here is the version which uses Token().
What is especially striking is the difference in the number of allocations and the allocation size. The only allocations that I make are for maps and slices, but Decode() does them too. I thought something was off and decided to investigate.
So far I have found one reason:
Token() uses readValue() to read primitive values and map keys.
readValue() resets the state of the scanner at the beginning (https://github.com/golang/go/blob/7cf06f070e56dfb6507122704bc75d697ccc350f/src/encoding/json/stream.go#L90).
At the end of the value the scanner reaches stateEndValue(), which thinks it has read the top-level value (because parseState is empty at this point): https://github.com/golang/go/blob/7cf06f070e56dfb6507122704bc75d697ccc350f/src/encoding/json/scanner.go#L281
That in turn calls stateEndTop(), which checks whether the current character is a space (which it won't be, because we're actually in the middle of a value) and then goes ahead and allocates and sets scanner.err: https://github.com/golang/go/blob/7cf06f070e56dfb6507122704bc75d697ccc350f/src/encoding/json/scanner.go#L331
This error (and therefore the allocation) is completely unnecessary because the error gets dropped when Token() calls readValue() again.
I tried to fix it by introducing a new flag in the scanner called 'inStream', then setting the flag at the beginning of Token() (and resetting it on defer) and then checking the flag in stateEndValue() to avoid allocating the error. It's not the most elegant solution, but it appears to be working.
Note it's still somewhat off the Decode() performance, but it's a significant improvement. I can submit a PR with my changes.