golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

encoding/json: clearer error message for boolean like prefix in invalid json strings #56332

Open gansvv opened 1 year ago

gansvv commented 1 year ago

What version of Go are you using (go version)?

$ go version
1.19

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

$ go env
Using go playground. 

https://go.dev/play/p/-AeaDZbeRj1

What did you do?

json.Unmarshal returns a different error for an invalid string that starts with the first character of a boolean literal ('t' or 'f') than for an invalid string that starts with any other character. This makes the JSON validation error confusing. Can the error be made the same in both cases (first character invalid), or can the error for attempted boolean parsing be made more distinct ("invalid boolean prefix detected", etc.)?

Reproduced here: https://go.dev/play/p/-AeaDZbeRj1

Code:

package main

import (
    "encoding/json"
    "fmt"
)

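func main() {
    // Valid input: unmarshals into the map without error.
    myJsonString := `{"some":"json"}`

    // Invalid inputs: the first starts with 't' (like the literal true), the second does not.
    myJsonString1 := `test`
    myJsonString2 := `random`
    // Valid JSON, but a boolean cannot be stored in a map.
    myJsonString3 := `true`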
    var s map[string]interface{}
    fmt.Println(json.Unmarshal([]byte(myJsonString), &s))
    fmt.Println(json.Unmarshal([]byte(myJsonString1), &s))
    fmt.Println(json.Unmarshal([]byte(myJsonString2), &s))
    fmt.Println(json.Unmarshal([]byte(myJsonString3), &s))
}

Output:

<nil>
invalid character 'e' in literal true (expecting 'r')
invalid character 'r' looking for beginning of value
json: cannot unmarshal bool into Go value of type map[string]interface {}

Program exited.

In the output above, the error "invalid character 'e' in literal true" could instead say "invalid literal..." for clarity.
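The errors in the output above fall into two groups: the first two are syntax errors, and the last is a type mismatch on otherwise valid JSON. A minimal sketch of how a caller could tell them apart with errors.As is shown here. The helper name kind is hypothetical and not part of encoding/json; json.SyntaxError and json.UnmarshalTypeError are the exported error types of that package.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// kind is a hypothetical helper (not part of encoding/json). It reports
// which category of error json.Unmarshal returns when decoding input
// into a map[string]interface{}.
func kind(input string) string {
	var v map[string]interface{}
	err := json.Unmarshal([]byte(input), &v)

	var syntaxErr *json.SyntaxError
	var typeErr *json.UnmarshalTypeError
	switch {
	case err == nil:
		return "ok"
	case errors.As(err, &syntaxErr):
		// The input is not valid JSON at all.
		return "syntax"
	case errors.As(err, &typeErr):
		// The input is valid JSON but does not fit the Go target type.
		return "type"
	default:
		return "other"
	}
}

func main() {
	for _, in := range []string{`{"some":"json"}`, `test`, `random`, `true`} {
		fmt.Printf("%q: %s\n", in, kind(in))
	}
}
```

Both `test` and `random` are reported as syntax errors, so the difference the issue describes is only in the wording of the message, not in the error type.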

What did you expect to see?

"invalid literal true, error in character 'e'"

What did you see instead?

invalid character 'e' in literal true (expecting 'r')

dr2chase commented 1 year ago

@dsnet @mvdan

mvdan commented 1 year ago

I see what you're saying. There's no "true" in the first input, so the error can be a bit confusing.

"invalid literal true, error in character 'e'"

I don't think this error is particularly better, for what it's worth.

All in all, note that the decoder tokenizes one byte at a time. So when it sees a t, it knows it can only be the beginning of true; same for f and false, or " and a quoted string. That's why the error messages are the way they are.
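To illustrate the point about byte-at-a-time tokenizing, here is a simplified sketch. It is not the encoding/json scanner: the function scanLiteral is hypothetical, it handles only the bare literals true, false and null, and its messages are modeled on the ones quoted in this issue.

```go
package main

import "fmt"

// scanLiteral is a hypothetical, simplified scanner. It is an illustration
// only and not the encoding/json implementation. It reads one byte at a
// time and accepts only the bare literals true, false and null.
func scanLiteral(input string) error {
	if input == "" {
		return fmt.Errorf("unexpected end of input")
	}
	// The first byte alone decides which literal the scanner commits to.
	literals := map[byte]string{'t': "true", 'f': "false", 'n': "null"}
	want, ok := literals[input[0]]
	if !ok {
		return fmt.Errorf("invalid character %q looking for beginning of value", input[0])
	}
	// Every following byte must match the literal chosen above.
	for i := 1; i < len(want); i++ {
		if i >= len(input) {
			return fmt.Errorf("unexpected end of input")
		}
		if input[i] != want[i] {
			return fmt.Errorf("invalid character %q in literal %s (expecting %q)",
				input[i], want, want[i])
		}
	}
	if len(input) > len(want) {
		return fmt.Errorf("invalid character %q after top-level value", input[len(want)])
	}
	return nil
}

func main() {
	for _, in := range []string{"test", "random", "true"} {
		fmt.Printf("%s: %v\n", in, scanLiteral(in))
	}
}
```

Because the sketch commits to true as soon as it reads 't', the input test fails on its second byte with a message that mentions true, even though the input never contained that word.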

I think we could do better in terms of human-friendly messages, but I can't think of an obviously better choice right now.