golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

encoding/json: does not recognise semicolon as a valid field name #39189

Closed kolatat closed 4 years ago

kolatat commented 4 years ago

What version of Go are you using (go version)?

go1.14 windows/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

windows amd64

What did you do?

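package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // JSON object whose only key is a single semicolon.
    encoded := []byte(`{";": "World!"}`)
    type MyObject struct {
        // The struct tag asks encoding/json to map the ";" key to Hello.
        Hello string `json:";"`
    }
    var decoded MyObject
    if err := json.Unmarshal(encoded, &decoded); err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("%+v\n", decoded)
}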

What did you expect to see?

{Hello:World!}

What did you see instead?

{Hello:}

@natebwangsut

gopherbot commented 4 years ago

Change https://golang.org/cl/234818 mentions this issue: encoding/json: allow semicolon in field key / struct tag

cagedmantis commented 4 years ago

/cc @rsc @dsnet @bradfitz @mvdan

mvdan commented 4 years ago

I don't see a reason why not, but where do we draw the line on which characters are OK and which are not? And why is the semicolon part of the first group?

I'm not familiar with the history of that piece of code, so I really am asking. I think that needs an answer before we review a change.

natebwangsut commented 4 years ago

We checked the JSON spec (RFC 7159) to validate our "bug", and as far as we can tell the spec treats a semicolon as an ordinary character in a member name.

https://tools.ietf.org/html/rfc7159
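As a quick check of the point above, the following sketch decodes the same input into a map instead of a struct. Map keys are taken verbatim from the JSON text, so no struct tag parsing is involved. The helper name decodeToMap is made up for this illustration and is not part of encoding/json.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeToMap is a hypothetical helper: it decodes a JSON object into a
// map, which accepts any key string without consulting struct tags.
func decodeToMap(data []byte) (map[string]string, error) {
	var m map[string]string
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	m, err := decodeToMap([]byte(`{";": "World!"}`))
	if err != nil {
		fmt.Println(err)
		return
	}
	// The ";" key is present in the decoded map.
	fmt.Printf("%q\n", m[";"])
}
```

If this decodes cleanly, the input itself is acceptable JSON as far as the decoder is concerned, which suggests the limitation reported here sits in how struct tags are interpreted rather than in the parser.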

dsnet commented 4 years ago

> I don't see a reason why not, but where do we draw the line on which characters are OK and which are not?

Back in February of 2011, the entirety of the struct tag was used as the JSON key name. It seems that the name syntax was restricted (https://golang.org/cl/4173061) so that the tags could in theory be used for other purposes like protocol buffers (#1520).

Later in June of 2011, a well-defined grammar for application-specific struct tags was defined and formally implemented in the reflect package (https://golang.org/cl/4645069).
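To illustrate the reflect-level grammar mentioned above, here is a minimal sketch that reads the json key of a struct tag through reflect.StructTag.Lookup. The type MyObject mirrors the one in the report; the helper jsonTag is a name invented for this example.

```go
package main

import (
	"fmt"
	"reflect"
)

// MyObject mirrors the struct from the original report.
type MyObject struct {
	Hello string `json:";"`
}

// jsonTag is a hypothetical helper: it returns the raw value stored under
// the "json" key of the named field's struct tag. v must be a struct value.
func jsonTag(v interface{}, field string) (string, bool) {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return "", false
	}
	return f.Tag.Lookup("json")
}

func main() {
	tag, ok := jsonTag(MyObject{}, "Hello")
	fmt.Printf("%q %v\n", tag, ok)
}
```

The reflect package hands back the quoted value as-is, so any restriction on which characters may appear in the name is applied later, by the package that consumes the tag.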

It seems to me that the restricted set of valid characters is an artifact from a previous era to work around a limitation that no longer applies today.

The only restriction I can imagine for the character set would be a comma (,), since it is used to delimit the set of extra tag attributes that come after the name. In theory we could define a more complex grammar where someone could put a quoted string so as to encode any arbitrary name that is valid UTF-8.
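The role of the comma can be sketched as follows. This is not the encoding/json implementation; splitTag is a hypothetical helper that only shows why a comma cannot appear in the name without a richer grammar: everything after the first comma is read as options.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTag is a hypothetical helper: it splits a json struct tag value
// into the field name (before the first comma) and the list of options
// (everything after it).
func splitTag(tag string) (name string, opts []string) {
	i := strings.Index(tag, ",")
	if i < 0 {
		return tag, nil
	}
	return tag[:i], strings.Split(tag[i+1:], ",")
}

func main() {
	for _, tag := range []string{";", "name,omitempty", "a,b,string"} {
		name, opts := splitTag(tag)
		fmt.Printf("%q -> name=%q opts=%q\n", tag, name, opts)
	}
}
```

With the tag value "a,b,string", the name comes out as "a" and "b" is treated as an option, so a key that is literally "a,b" cannot be expressed this way.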

If the grammar is opened up to other characters, we'll need to consider how the equalFold logic is supposed to operate.
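For context on why folding matters: when decoding into a struct, encoding/json prefers an exact key match but also accepts a case-insensitive one. The sketch below shows only that observable behaviour, not the internal equalFold code; the type folded and the helper decodeHello are names invented for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// folded is a hypothetical type whose tag uses a lower-case name.
type folded struct {
	Hello string `json:"hello"`
}

// decodeHello is a hypothetical helper: it decodes data into folded and
// returns the Hello field.
func decodeHello(data []byte) (string, error) {
	var v folded
	if err := json.Unmarshal(data, &v); err != nil {
		return "", err
	}
	return v.Hello, nil
}

func main() {
	// The key differs from the tag only by case and is still matched.
	got, err := decodeHello([]byte(`{"HELLO": "World!"}`))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(got)
}
```

Any wider character set for names would need a decision on how such case-insensitive matching treats the newly allowed characters.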

seankhliao commented 4 years ago

Relaxing the restrictions would also fix #22518 and #35287.

gopherbot commented 4 years ago

Change https://golang.org/cl/247059 mentions this issue: encoding/json: allow add quotes in field key / struct tag

zaneChou1 commented 4 years ago

> Change https://golang.org/cl/247059 mentions this issue: encoding/json: allow add quotes in field key / struct tag

I read the JSON standard (RFC 7159) and found that the single quote ('), an ASCII punctuation mark, should be allowed as part of a valid JSON field name.
I submitted a change: https://go-review.googlesource.com/c/go/+/247059/

mvdan commented 4 years ago

Thanks @dsnet. It seems like incremental steps like allowing semicolon characters should be safe and trivial, so I'm approving that CL.

If anyone wants to work on other characters, please file a separate issue. But a better solution would be a generic one, not to keep adding more exceptions. I think we should use https://github.com/golang/go/issues/22518 for the generic solution. If anyone wants to work on that, just beware Joe's comment in https://github.com/golang/go/issues/39189#issuecomment-632467837.