goccy / go-json

Fast JSON encoder/decoder compatible with encoding/json for Go
MIT License

Implement omitnil json tag - 100€ bounty #436

Open ivanjaros opened 1 year ago

ivanjaros commented 1 year ago

Example:

package main

import (
    "github.com/goccy/go-json"
    "os"
)

type Foo struct {
    Bar []string          `json:"bar,omitempty"`
    Baz map[string]string `json:"baz,omitempty"`
}

func main() {
    var a, b Foo
    b.Bar = []string{}          // <- empty, not nil
    b.Baz = map[string]string{} // <- empty, not nil

    e := json.NewEncoder(os.Stdout)
    _ = e.Encode(a)
    println("")
    _ = e.Encode(b)
}

The result is that both a and b print {}, instead of {} and {"bar":[],"baz":{}} respectively. This is blatantly wrong behavior because it discards information. Just because an array/slice/map is empty does not mean it does not exist (which is the case for nil).
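
The information loss is easy to see by dropping the tag. Without omitempty, the standard encoding/json package encodes a nil slice or map as null and an empty one as [] or {}. go-json aims to be compatible with encoding/json, so it should behave the same way. The following is a minimal sketch. FooNoOmit and encode are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FooNoOmit has the same fields as Foo, but without omitempty,
// so nil and empty values stay distinguishable in the output.
type FooNoOmit struct {
	Bar []string          `json:"bar"`
	Baz map[string]string `json:"baz"`
}

// encode marshals v and panics on error (fine for a demo).
func encode(v any) string {
	out, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	var a, b FooNoOmit
	b.Bar = []string{}          // empty, not nil
	b.Baz = map[string]string{} // empty, not nil
	fmt.Println(encode(a))      // {"bar":null,"baz":null}
	fmt.Println(encode(b))      // {"bar":[],"baz":{}}
}
```

With omitempty, both of these cases collapse to {}.
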

ivanjaros commented 1 year ago

If this won't be fixed, in order to stay in line with the native encoding/json behavior (which is completely wrong, and which the Go team refuses to fix), can you point me to the code that implements it, so that I can fork the library and fix it for myself?

ivanjaros commented 1 year ago

Or possibly introduce a new tag option whose sole purpose is to act like omitempty, but only for nil values. In other words, if a field has the "omitnil" tag (or a configuration flag is set), the field is not printed when its value is nil. Otherwise the value is printed as is (empty map, empty slice, ...).

I could simply marshal the data without the omitempty tag, so that nil fields come out as "null" values, and then, with a bit of work, cut those out of the resulting byte slice. BUT this won't work for streaming. Hence the need for built-in functionality.

PS: I'd pay 100€ for that omitnil functionality, since I run into this literally on a daily basis.

ivanjaros commented 1 year ago

This code in vm.go looks like the part that skips empty but non-nil slices: [image]

AbenezerKb commented 1 year ago

This worked:

package main

import (
    "github.com/goccy/go-json"
    "os"
)

type Bar []string
type Baz map[string]string
type Foo struct {
    Bar *Bar `json:"bar,omitempty"`
    Baz *Baz `json:"baz,omitempty"`
}

func main() {
    var a, b Foo
    bb := Bar{}
    bz := Baz{}
    b.Bar = &bb // <- empty, not nil
    b.Baz = &bz // <- empty, not nil

    e := json.NewEncoder(os.Stdout)
    _ = e.Encode(a)
    println("")
    _ = e.Encode(b)
}

Output

{}
{"bar":[],"baz":{}}
ivanjaros commented 1 year ago

@goccy, any interest in that 100€ bounty for implementing omitnil?

ivanjaros commented 1 year ago

Max has already opened a pull request: https://github.com/goccy/go-json/pull/437. The code looks good, except that the test is not quite in line with the rest of the tests (cosmetic).

ivanjaros commented 1 year ago

🤨

ianling commented 1 year ago

@ivanjaros are you still interested in this and is the bounty still active?

ivanjaros commented 1 year ago

> @ivanjaros are you still interested in this and is the bounty still active?

Interested, yes; bounty, no (it has been 7 months and it no longer makes sense).

ivanjaros commented 9 months ago

In the end, I wrote a cleanup function that removes null values from the marshalled output, rather than patching this or any other JSON marshaller.

The benchmark shows that the goccy marshaller takes 329 ns/op and the native encoding/json marshaller takes 642 ns/op. When I run the goccy output through my function, I get 514 ns/op with no allocations. That is about 25% faster than native json, but still about 55% slower than goccy alone. I have spent a lot of time on this, fixing it and tweaking its performance to get it this far, and I cannot find anything else to do. Profiling shows that the entire performance hit comes from bytes.Index, and I do not see any way to improve it. @goccy, do you have any recommendations for making this faster, if possible?

package foo

import (
    "bytes"
)

var nullPattern = []byte("null")

// denil removes `"key": null` object members from marshalled JSON.
// It modifies src in place and allocates no new memory.
func denil(src []byte) []byte {
    var offset int

    for {
        closing := 4 // len("null")

        idx := bytes.Index(src[offset:], nullPattern)
        if idx < 0 || len(src) <= offset+idx+closing {
            // no match left (or "null" is the very last token): we're done
            break
        }
        nullAt := offset + idx // absolute position of the match

        // the byte right after "null" decides whether this is a real null token;
        // closing becomes the absolute end of the span to remove
        switch src[nullAt+closing] {
        case ',', '\n':
            // remove the trailing comma or newline along with the value
            closing += nullAt + 1
        case '}', ']':
            closing += nullAt
        default:
            // not an actual null token (e.g. part of a string), skip past it
            offset = nullAt + closing
            continue
        }

        // walk back to the colon, then to the closing and opening quotes of the key
        idx = findColon(src, nullAt)
        if idx >= 0 {
            idx = findQuote(src, idx)
        }
        if idx >= 0 {
            idx = findQuote(src, idx)
        }
        if idx < 0 {
            // not an object member (e.g. an array element):
            // resume right after this match instead of finding it again
            offset = nullAt + len(nullPattern)
            continue
        }

        idx, closing = findEdges(src, idx, closing)

        src = append(src[:idx], src[closing:]...)

        offset = idx
    }

    return src
}

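// findColon walks back from just before idx, skipping spaces, and returns the
// position of a ':' (or -1 if any other byte comes first).
func findColon(src []byte, idx int) int {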
    idx--

    if len(src)-1 < idx {
        return -1
    }

    for idx >= 0 {
        switch src[idx] {
        case ' ':
            idx--
        case ':':
            return idx
        default:
            return -1
        }
    }

    return -1
}

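// findQuote walks back from just before idx and returns the position of the
// nearest '"' that is not preceded by a backslash (or -1 if there is none).
func findQuote(src []byte, idx int) int {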
    idx--

    if len(src)-1 < idx {
        return -1
    }

    for idx >= 0 {
        if src[idx] == '"' {
            if idx > 0 && src[idx-1] == '\\' {
                idx--
            } else {
                return idx
            }
        } else {
            idx--
        }
    }

    return -1
}

// findEdges widens [start, finish) over surrounding whitespace and commas,
// keeping a single comma when the removed member sat between two others.
func findEdges(src []byte, start, finish int) (int, int) {
    for i := start - 1; i >= 0; i-- {
        if !isSep(src[i]) {
            start = i + 1
            break
        }
    }

    for i := finish; i < len(src); i++ {
        if !isSep(src[i]) {
            finish = i
            break
        }
    }

    // finish is exclusive, so the last byte to be removed is src[finish-1].
    // If commas were swallowed on both sides, keep the leading one.
    if src[start] == ',' && src[finish-1] == ',' {
        start++
    }

    return start, finish
}

// isSep reports whether b is whitespace or a comma between JSON tokens.
func isSep(b byte) bool {
    return b == ' ' || b == ',' || b == '\n' || b == '\t'
}
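
One possible direction for the performance question, not something suggested in this thread, is to change the shape of the loop. denil calls bytes.Index once per "null" occurrence, scans backwards after each match, and shifts the rest of the buffer with append on every removal. A single forward pass that copies bytes into the same buffer, tracks string literals, and drops `"key":null` members as it meets them avoids all three. The following is a minimal sketch under stated assumptions. The input is compact JSON, with no whitespace between tokens, as produced by Marshal or a non-indenting Encoder. A trailing newline is simply copied. Only object members are removed, and array nulls are kept, as in denil. stripNullMembers is a hypothetical name. Whether this is actually faster on real payloads would need benchmarking.

```go
package main

import (
	"bytes"
	"fmt"
)

var nullLiteral = []byte("null")

// stripNullMembers drops `"key":null` members from compact JSON in a single
// forward pass. It writes into src's own backing array (the write position
// never passes the read position), so it allocates nothing.
func stripNullMembers(src []byte) []byte {
	dst := src[:0]
	r := 0
	for r < len(src) {
		if src[r] != '"' {
			// structural byte, number, literal or whitespace: copy as is
			dst = append(dst, src[r])
			r++
			continue
		}

		// find the end of this string literal, skipping escaped bytes
		end := r + 1
		for end < len(src) && src[end] != '"' {
			if src[end] == '\\' {
				end++
			}
			end++
		}
		end++ // step past the closing quote
		if end > len(src) {
			end = len(src) // unterminated string: copy what is left
		}

		// a string directly followed by ':' is an object key
		if end < len(src) && src[end] == ':' && bytes.HasPrefix(src[end+1:], nullLiteral) {
			after := end + 1 + len(nullLiteral)
			if after < len(src) && src[after] == ',' {
				r = after + 1 // drop key, colon, null and the trailing comma
				continue
			}
			if after < len(src) && src[after] == '}' {
				// last member: drop the comma that preceded it, if any
				if n := len(dst); n > 0 && dst[n-1] == ',' {
					dst = dst[:n-1]
				}
				r = after // the '}' is copied on the next iteration
				continue
			}
		}

		// not a null member: copy the whole string literal
		dst = append(dst, src[r:end]...)
		r = end
	}
	return dst
}

func main() {
	in := []byte(`{"a":null,"b":1,"c":"null","d":[null],"e":{"f":null},"g":null}` + "\n")
	fmt.Printf("%s", stripNullMembers(in)) // {"b":1,"c":"null","d":[null],"e":{}}
}
```

The write cursor can never overtake the read cursor, because bytes are only ever copied or skipped. That is what makes reusing src safe here, in the same way as the append-based removal in denil.
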