go-json-experiment / json

Experimental implementation of a proposed v2 encoding/json package
BSD 3-Clause "New" or "Revised" License
341 stars 11 forks source link

with Multiline option still get \n at the end of json text #40

Open ethanvc opened 1 month ago

ethanvc commented 1 month ago
    var buf bytes.Buffer
    encoder := jsontext.NewEncoder(&buf, jsontext.Multiline(false))
    encoder.WriteToken(jsontext.ObjectStart)
    encoder.WriteToken(jsontext.ObjectEnd)
    s := buf.String()
    if s == "{}\n" {
                 // will go here, which is not consistent with Multiline option
        fmt.Println("here")
    }

This behavior is not expected by library users. Expected output: "{}" without the tail \n.

dsnet commented 1 month ago

The Multiline option used to be called Expand, which somewhat avoided this oddity. \cc @veqryn

The documentation for Multiline says:

the JSON output should expand to multiple lines, where every JSON object member or JSON array element appears on a new, indented line according to the nesting depth.

In other words, it affects the emission of newlines within top-level JSON values, and does not affect the emission of newlines between top-level JSON values.

It is a valid use-case to have output where each top-level JSON value be non-multiline, but separated by newlines:

{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

Such a format is usually called "newline delimited JSON".

Furthermore, the current semantic matches the behavior of the v1 "json" package.

Thus, the current semantic is reasonable, though the naming of the option may need to be adjusted to be clear that it only takes effect within a top-level value.

The Marshal and MarshalWrite functions internally avoid the trailing newline because they use an unexported OmitTopLevelNewline option. Perhaps exposing this option is what you really want?

dsnet commented 1 month ago

One reason we did not expose OmitTopLevelNewline is because of the following edge case:

func main() {
    var buf bytes.Buffer
    encoder := jsontext.NewEncoder(&buf, jsontext.OmitTopLevelNewline(true))
    encoder.WriteToken(jsontext.Uint(98765))
    encoder.WriteToken(jsontext.Uint(43210))
    fmt.Println(buf.String())
}

What should this do?

dsnet commented 1 month ago

An alternative API is an option that emits newlines between top-level values, instead of after every top-level value. The distinction is subtle, but accomplishes what you would want:

between after
"{}" "{}\n"
"{}\n{}" "{}\n{}\n"
"0" "0\n"
"0\n1" "0\n1\n"
ethanvc commented 1 month ago

One reason we did not expose OmitTopLevelNewline is because of the following edge case:

func main() {
    var buf bytes.Buffer
    encoder := jsontext.NewEncoder(&buf, jsontext.OmitTopLevelNewline(true))
    encoder.WriteToken(jsontext.Uint(98765))
    encoder.WriteToken(jsontext.Uint(43210))
    fmt.Println(buf.String())
}

What should this do?

  • Report an error when writing 43210? (i.e., we only permit encoding a single top-level token)
  • Emit 9876543210 in which case it is impossible to parse back the original input?
  • Emit 98765 43210, where we special-case putting a whitespace in-between JSON numbers to avoid any ambiguities?

I think the first option is good and if we want write multi top json value in one buffer, we should do it explicitly.

ethanvc commented 1 month ago

An alternative API is an option that emits newlines between top-level values, instead of after every top-level value. The distinction is subtle, but accomplishes what you would want:

between after "{}" "{}\n" "{}\n{}" "{}\n{}\n" "0" "0\n" "0\n1" "0\n1\n"

Yes, this is what i want, and i think is more reasonable.

ethanvc commented 1 month ago

so, what's the plan for this topic? @dsnet

dsnet commented 1 month ago

I agree this is something to address, but I think we'll defer addressing it until the v2 work lands in stdlib. There's already enough other work to do in preparation for that.