francoispqt / gojay

high performance JSON encoder/decoder with stream API for Golang
MIT License
2.11k stars 113 forks source link

encoding ints to strings #137

Open marten-seemann opened 4 years ago

marten-seemann commented 4 years ago

First of all, thank you a lot for this awesome library. I just ran a few benchmarks on my marshalling code, and I'm getting a speedup of 6x (!) and a reduction of 25x (!) on allocated memory compared to the Go standard library. That's a crazy difference!

Trying to optimize things even further (just trying to push things to the limit ;)), I've encountered the following problem: In the JSON format that I'm supposed to output, integers need to be encoded as strings (because many JSON implementations only support numbers up to 2^53 or so). With the Go standard library this is pretty straightforward. I'd define a value like this

type frame struct {
    Offset int64 `json:"offset,string"`
}

This would automatically output the int64 (in decimal representation) as a string.

As far as I've found out so far, the way to do this using gojay would be the following:

enc.StringKey("offset", strconv.FormatInt(int64(f.Offset)))

The problem is that strconv.FormatInt has to allocate the string it is returning, just to have gojay copy the string contents to the JSON output being written, and garbage collected afterwards.

I can see two ways how to optimize this:

  1. I could allocate a byte slice (or string) as a buffer, encode my int64 into that string, and then use StringKey with this buffer, reusing it every time I need to write a number as a string. This solution would not require any changes to gojay.
  2. gojay could offer an API to do exactly this, e.g. a function Encoder.Int64AsString. Considering that my use case is apparently pretty common and even supported by the standard library, maybe this is the cleaner solution?

Does this make sense? Or am I missing something here?

marcsantiago commented 4 years ago

So I took a look at the standard libraries implementation In decode.go line 867 the function called literalStore handles the string to numeric conversion.

In around 879 you can that fromQuoted variable is whats used to determine if the value has the string tag to conversions.

Later on on line 1013 you can see that within the switch block that a type conversion is being done

n, err := strconv.ParseInt(s, 10, 64)
if err != nil || v.OverflowInt(n) {
    d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
    break
}
v.SetInt(n)

The encoding.go package in the std lib does not does not use the string tag. It uses the field type directly not the tag itself unless the tag is omitempty and in that case it skips the field is empty.

In that sense

enc.StringKey("offset", strconv.FormatInt(int64(f.Offset)))

it is still better as it does not rely on reflection to cast that value :-)