bytedance / sonic

A blazingly fast JSON serializing & deserializing library
Apache License 2.0

optimize: extra string/[]byte copies when calling Unmarshal #611

Closed. mmosky closed this issue 4 months ago

mmosky commented 4 months ago

When using sonic.Unmarshal, unnecessary string and []byte conversions occur, leading to additional copies.

Initially, the byte slice passed to sonic.Unmarshal is converted to a string, resulting in the first copy.

Subsequently, UnmarshalFromString is invoked, internally employing bytes.NewBufferString, which again converts the string back to a byte slice, resulting in the second copy.
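To make the two copies concrete, here is a minimal sketch that uses only the standard library. It is not sonic's actual code; `roundTrip` is a hypothetical helper that mimics the two conversions described above. Mutating the input afterwards shows that both the string and the buffer own separate copies of the data.

```go
package main

import (
	"bytes"
	"fmt"
)

// roundTrip is a hypothetical helper, not part of sonic. It performs the
// same two conversions the issue describes, using only the standard library.
func roundTrip(data []byte) (string, *bytes.Buffer) {
	asString := string(data)               // copy 1: []byte -> string
	buf := bytes.NewBufferString(asString) // copy 2: string -> []byte inside the buffer
	return asString, buf
}

func main() {
	data := []byte("123")
	s, buf := roundTrip(data)

	// Mutate the caller's slice; neither copy observes the change.
	data[0] = '9'

	fmt.Println(string(data)) // 923
	fmt.Println(s)            // 123
	fmt.Println(buf.String()) // 123
}
```

Both conversions allocate and copy because a Go string must never alias memory that can still be written through a slice.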

A minimal case:

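package bench // hypothetical package name

import (
    "testing"

    "github.com/bytedance/sonic"
)

// BenchmarkSonicUnmarshal decodes a one-byte JSON number on every iteration.
func BenchmarkSonicUnmarshal(b *testing.B) {
    for i := 0; i < b.N; i++ {
        data := []byte{'1'}
        var x int64
        // err := json.Unmarshal(data, &x) // 3 allocs/op
        err := sonic.Unmarshal(data, &x) // 7 allocs/op
        if err != nil {
            b.Fatal(err.Error())
        }
    }
}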
AsterDY commented 4 months ago

The argument to Unmarshal is a []byte, which is MUTABLE in Go, so it must be converted to an IMMUTABLE string first. This is the standard way to comply with Go's spec. As for performance, you can use UnmarshalFromString or StreamDecoder from the start. It all depends on you, as long as YOU KNOW WHAT YOUR BUSINESS IS DOING.
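The mutability concern can be illustrated with a small standard-library sketch. It is an assumption-laden illustration, not how sonic is implemented: `snapshotAndAlias` is a hypothetical helper, and the zero-copy path uses `unsafe.String` (Go 1.20+), whose documentation requires that the bytes are not modified afterwards. The sketch breaks that rule on purpose to show what the defensive copy protects against.

```go
package main

import (
	"fmt"
	"unsafe"
)

// snapshotAndAlias is a hypothetical helper. It returns a safe copy of data
// and a zero-copy string that shares memory with data.
func snapshotAndAlias(data []byte) (copied, aliased string) {
	copied = string(data) // allocates and copies; safe if data changes later
	if len(data) > 0 {
		// No copy: the string header points at the slice's backing array.
		aliased = unsafe.String(&data[0], len(data))
	}
	return copied, aliased
}

func main() {
	data := []byte("123")
	copied, aliased := snapshotAndAlias(data)

	// Writing to the slice now also changes what the aliased string sees,
	// which violates the immutability that Go strings are supposed to have.
	data[0] = '9'

	fmt.Println("copied: ", copied)
	fmt.Println("aliased:", aliased)
}
```

The copied string keeps its original contents, while the aliased one follows the slice. A caller that already holds the JSON as a string, and knows nothing else will mutate it, avoids the first copy by passing that string to UnmarshalFromString directly.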