tonybase opened this issue 3 years ago
The protojson package uses its own JSON implementation because it must comply with a stricter JSON profile (RFC 7493) than the one encoding/json supports (RFC 7159). The internal implementation does not benefit from the years of optimization that have gone into the standard library. There is ongoing work investigating major changes to the encoding/json package that may give us native RFC 7493 support, in which case we will delete the internal library and use the standard one instead (and benefit from its optimizations).
The main reason encoding/json is not used is that google/protobuf/duration.proto, int64-as-string, and oneof marshaling are inconsistent with standard library JSON, and there is no good solution. See the proto3 JSON mapping:
https://developers.google.com/protocol-buffers/docs/proto3#json
I want to use encoding/json. Can I implement the json.Marshaler and json.Unmarshaler interfaces for the well-known types?
https://github.com/protocolbuffers/protobuf-go/tree/master/types/known
> The main reason that encoding/json is not used is because google/protobuf/duration.proto, int64(string) and oneof marshal are inconsistent with the standard library JSON and there is no good solution.

That's not true. There are many reasons why encoding/json is not used, including the representation of oneofs, the well-known types, and so forth.
related: protocolbuffers/protobuf#8331
Let's keep this topic about improving performance. Trying to serialize int64 as a JSON number is an unrelated concern.
Another potential performance optimization would be to allow disabling field sorting. Arbitrary order is fine in most situations.
Are there any current plans for this optimization?
Hi, AFAIK this issue tracker remains the right one for both this and the newer protobuf-go repo; let me know if that's not the case.
I've submitted a PR to improve marshaling performance at https://go-review.googlesource.com/c/protobuf/+/373356
name                        old time/op    new time/op    delta
Marshal_ScalarsUnordered-8  3.49µs ± 8%    2.06µs ± 1%    -40.84%  (p=0.000 n=29+27)
Marshal_Scalars-8           4.18µs ± 7%    2.80µs ± 1%    -32.94%  (p=0.000 n=29+30)

name                        old alloc/op   new alloc/op   delta
Marshal_ScalarsUnordered-8  1.47kB ± 0%    0.14kB ± 0%    -90.76%  (p=0.000 n=30+30)
Marshal_Scalars-8           1.63kB ± 0%    0.30kB ± 0%    -81.87%  (p=0.000 n=30+30)

name                        old allocs/op  new allocs/op  delta
Marshal_ScalarsUnordered-8  29.0 ± 0%      4.0 ± 0%       -86.21%  (p=0.000 n=30+30)
Marshal_Scalars-8           34.0 ± 0%      9.0 ± 0%       -73.53%  (p=0.000 n=30+30)
Boring details: I've added an 'Unordered' marshal option to the old implementation so I could sample both unordered and ordered benchmarks.
With benchstat, try running with -count=10 to get ten samples each; the delta column will then tell you how much better or worse each metric is, along with the confidence of that value. You'll notice all of these have a p-value of 1, which is not particularly convincing on its own.
updated
going back to this in a bit, on vacation rn
https://github.com/tonybase/benchmark/tree/main/benchmark-protobuf-json:
It doesn't seem to optimize performance very well, and the Encoder doesn't use anything like the standard library's encodeStatePool to reduce object allocations.