golang / go

encoding/json: misleading error parsing invalid json with multibyte characters #57405

Open adamroyjones opened 1 year ago

adamroyjones commented 1 year ago

What version of Go are you using (go version)?

$ go version
go version go1.19.4 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

Linux, amd64.

What did you do?

Here is a playground link.

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // This uses typographic quotes, not straight quotes.
    bad := `{ “hello”: “world” }`
    var j json.RawMessage
    err := json.Unmarshal([]byte(bad), &j)
    fmt.Printf("err: %v\n", err)
    // prints: err: invalid character 'â' looking for beginning of object key string
}

What did you expect to see?

I'd expect the error message to identify the grapheme.

What did you see instead?

The error message prints â. This corresponds to the first byte of the multibyte UTF-8 sequence that encodes the grapheme “ (U+201C, encoded as the bytes E2 80 9C).

Specifically, in Ruby:

"“".bytes.first == 226 # true

Byte value 226 (0xE2), read on its own as a code point, is â (U+00E2) in both Latin-1 and Unicode.

I'm sure this is all well-known, but I didn't see a crisp example of this in the issue log. I may have missed it. If so, I'm sorry.

mvdan commented 1 year ago

This is somewhat similar to https://github.com/golang/go/issues/56332, but in this case there is a better error message we could reasonably provide: one that uses the full rune.
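
To illustrate what "using the full rune" could look like, here is a rough sketch. It assumes the byte offset of the offending character is already known, and `describeAt` is a hypothetical helper made up for this example; it is not how encoding/json is implemented, nor a proposed patch.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// describeAt is a hypothetical helper: it decodes the full UTF-8 rune
// that starts at offset, rather than reporting only the first byte.
func describeAt(data []byte, offset int) string {
	if offset < 0 || offset >= len(data) {
		return "end of input"
	}
	r, size := utf8.DecodeRune(data[offset:])
	if r == utf8.RuneError && size == 1 {
		// Not valid UTF-8 at this position: fall back to the raw byte.
		return fmt.Sprintf("byte 0x%02x", data[offset])
	}
	return strconv.QuoteRune(r)
}

func main() {
	// Same input as in the report; the typographic quote starts at index 2.
	bad := []byte(`{ “hello”: “world” }`)
	fmt.Printf("invalid character %s looking for beginning of object key string\n",
		describeAt(bad, 2))
}
```

The fallback branch keeps the per-byte behaviour for input that is not valid UTF-8, so only well-formed multibyte characters would be reported differently.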