golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
121.18k stars 17.37k forks source link

encoding/json: add Decoder.DisallowDuplicateFields #48298

Open dsnet opened 2 years ago

dsnet commented 2 years ago

The presence of duplicate fields in JSON input is almost always a bug from the sender and the behavior across various implementations is highly inconsistent. It's too late to switch the current behavior to always reject duplicate fields in the current package, but we can provide an option to enforce stricter checks. As such, I propose adding a Decoder.DisallowDuplicateFields option.


Background

Per RFC 8259, section 4, the handling of duplicate names is left as undefined behavior. Rejecting such inputs is within the realm of valid behavior. Tim Bray, the author of RFC 8259, actually recommends going beyond RFC 8259 and that implementations should instead target compliance with RFC 7493. RFC 7493 is a fully compatible subset of RFC 8259, which makes strict decisions about behavior that RFC 8259 leaves undefined (including the rejection of duplicate names).

The lack of duplicate name rejection has correctness implications where roundtrip unmarshal/marshal does not result in semantically equivalent JSON, and surprising behavior for users when they accidentally send JSON objects with duplicate names. In such a case, the current behavior is actually somewhat inconsistent and difficult to explain.

The lack of duplicate name rejection may have security implications since it becomes difficult for a security tool to validate the semantic meaning of a JSON object since meaning is inherently undefined in the presence of duplicate names.


Implementation

A naive implementation can remember all seen names in a Go map. A more clever implementation can take advantage of the fact that we are almost always unmarshaling into a Go map or Go struct. In the case of a Go map, we can use the Go map itself as a means to detect duplicate names. In the case of a Go struct, we can convert a JSON name into an index (i.e., the field index in the Go struct), and then use a an efficient bitmap to detect whether we saw the name before.

In the common case, there would be no performance slow downs to enabling checks for duplicate names.


Aside: I'm not fond of the name Fields since JSON terminology calls this either a "name" or "member" (per RFC 8259, section 4). However, it is consistent with the existing DisallowUnknownFields option.

\cc @bradfitz @crawshaw @mvdan

rsc commented 2 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc commented 2 years ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc commented 2 years ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

zigo101 commented 2 years ago

Is this for https://github.com/golang/go/issues/14750?

dsnet commented 2 years ago

Is this for #14750?

No, but they are related. Enabling DisallowDuplicateFields in a sense rejects case-insensitive names (e.g., foo and FoO) that map to the same Go struct field.

For #14750, we probably want an option for StrictNameMatching or something.

earthboundkid commented 2 years ago

Since duplicate names are UB, even Hyrum's Rule aside, can we treat duplicate names as blank unless equal? ISTM, if you're relying on getting the first name or the last name, you're in for trouble, so we should just return blank (but not an error unless DisallowDuplicateFields is set, due to the Hyrum's Law issue).

dsnet commented 2 years ago

Since duplicate names are UB ... we should just return blank (but not an error unless DisallowDuplicateFields is set, due to the Hyrum's Law issue).

Precisely due to Hyrum's Law, I don't see how we can change the default behavior without breaking existing usages.

zamicol commented 1 year ago

Looks like it's been a few months since there was any update. Is there anything I can do to help move a fix along?

mvdan commented 1 year ago

I would ask for patience; I know encoding/json is moving very slowly, but we are busy and there will be more to share soon. The blocker in this thread is certainly not a patch :)

disconnect3d commented 1 year ago

Would be happy to see this added finally. If you need an example of duplicate JSON fields causing troubles, e.g. to priority this more, here is one:

[Update 2017-11-18] A RCE vulnerability was found in CouchDB because two JSON parsers handle duplicate key differently. The same JSON object, when parsed in JavaScript, contains "roles": []', but when parsed in Erlang it contains "roles": ["_admin"].

From http://seriot.ch/projects/parsing_json.html

tzvatot commented 1 year ago

Would be happy to see this feature implemented.

dsnet commented 9 months ago

Hi all, we kicked off a discussion for a possible "encoding/json/v2" package that addresses the spirit of this proposal. In the proposed v2 package, duplicate names are rejected by default because it is a security risk (see CVE-2017-12635). We provide a jsontext.AllowDisallowNames option to allow duplicate names.