golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

encoding/json: decoding into existing map pointer values unexpectedly reallocates them #31924

Open as opened 5 years ago

as commented 5 years ago

When decoding JSON data into an existing map object, an existing key's value is reallocated. I can't determine conclusively whether or not this is expected behavior, but the doc seems to be unclear on this case.

The example below should explain this in a better way:

https://play.golang.org/p/y_VMAgevTNg

It is unexpected that B and A (as variables) lose coherence after the call to json.Unmarshal. The call reallocates "A.B", creating a copy of it. After that call, the objects are independent, which may be unexpected behavior.
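Since the playground source is not reproduced inline, here is a minimal sketch of the behavior described above. The type name `Inner` and the helper `decodeExistingKey` are assumptions made for illustration; only `json.Unmarshal` is the real API under discussion.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Inner is a hypothetical stand-in for the struct in the playground example.
type Inner struct {
	B int
}

// decodeExistingKey decodes JSON for key "A" into a map that already holds a
// pointer under "A". It returns the original pointer and the map afterwards.
func decodeExistingKey() (*Inner, map[string]*Inner, error) {
	a := &Inner{B: 6}
	c := map[string]*Inner{"A": a}
	err := json.Unmarshal([]byte(`{"A":{"B":7}}`), &c)
	return a, c, err
}

func main() {
	a, c, err := decodeExistingKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(a.B)         // 6: the original value was not written to
	fmt.Println(c["A"].B)    // 7: the map now holds a newly allocated value
	fmt.Println(a == c["A"]) // false: the two pointers have diverged
}
```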

What did you expect to see?

1
{"B":5}
{"A":{"B":5}}
2
{"B":6}
{"A":{"B":6}}
3
2009/11/10 23:00:00 addr(A=0x414030 C=0x43e260 C[A]=0x414030)
2009/11/10 23:00:00 addr(A=0x414030 C=0x43e260 C[A]=0x414030)
{"B":7}
{"A":{"B":7}}
4
{"B":16}
{"A":{"B":16}}

What did you see instead?

1
{"B":5}
{"A":{"B":5}}
2
{"B":6}
{"A":{"B":6}}
3
2009/11/10 23:00:00 addr(A=0x414030 C=0x43e260 C[A]=0x414030)
2009/11/10 23:00:00 addr(A=0x414030 C=0x43e260 C[A]=0x414140)
{"B":6}
{"A":{"B":7}}
4
{"B":16}
{"A":{"B":7}}

Possibly relevant godoc from encoding/json

Specifically, this part:

Unmarshal unmarshals the JSON into the value pointed at by the pointer.

To me, this implies that Unmarshal does not allocate a new value and set the pointer to point to it, but instead uses the existing value pointed at.

    Unmarshal uses the inverse of the encodings that Marshal uses,
    allocating maps, slices, and pointers as necessary, with the following
    additional rules:

    To unmarshal JSON into a pointer, Unmarshal first handles the case of
    the JSON being the JSON literal null. In that case, Unmarshal sets the
    pointer to nil. Otherwise, Unmarshal unmarshals the JSON into the value
    pointed at by the pointer. If the pointer is nil, Unmarshal allocates a
    new value for it to point to.

    To unmarshal a JSON object into a map, Unmarshal first establishes a map
    to use. If the map is nil, Unmarshal allocates a new map. Otherwise
    Unmarshal reuses the existing map, keeping existing entries. Unmarshal
    then stores key-value pairs from the JSON object into the map. The map's
    key type must either be a string, an integer, or implement
    encoding.TextUnmarshaler.

josharian commented 5 years ago

cc @mvdan

cuonglm commented 5 years ago

If you try:

json.Unmarshal([]byte(`{"F":{"B":7}}`), &C)

You will see that C["A"] was kept.

So writing a new value in this case seems OK to me. The spec is not clear about it.
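A small sketch of that case, assuming a hypothetical `Inner` type and helper name: when the JSON only mentions a key that is not yet in the map, the existing entry and its pointer are left alone.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Inner is a hypothetical element type used only for this illustration.
type Inner struct {
	B int
}

// decodeNewKey decodes JSON that only mentions key "F" into a map that
// already holds a pointer under "A".
func decodeNewKey() (*Inner, map[string]*Inner, error) {
	a := &Inner{B: 6}
	c := map[string]*Inner{"A": a}
	err := json.Unmarshal([]byte(`{"F":{"B":7}}`), &c)
	return a, c, err
}

func main() {
	a, c, err := decodeNewKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(c))      // 2: "A" was kept and "F" was added
	fmt.Println(c["A"] == a) // true: the existing pointer is unchanged
	fmt.Println(c["F"].B)    // 7
}
```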

seebs commented 5 years ago

If a map is already present, we expect it to populate provided keys but not remove existing keys that weren't overwritten.

If a struct is already present, we sometimes expect it to populate provided keys but not zero out existing keys.

But if a map contains a struct (or pointer to a struct), it appears that the outcome is to replace the key entirely, rather than to populate recursively. Contrast with what you might expect for type A struct { X, Y int }; type B struct { A1, A2 A }. In that case, I'd expect {"A1": {"X": 1}} not to overwrite A1.Y...

And it sort of makes sense to me that, since map values aren't addressable, a map[string]struct would just replace the struct with a new struct, because trying to behave otherwise is actually pretty hard. But when it's a map[string]*struct, it's not insane to think it should act like populating a struct would normally.

Seems to be the same with map[string]map[string]int, etc., which is to say, the recursion of decoding is not the same as you'd get by recursing yourself. If you have a map key-value pair "A": ..., and the map already has a pointer in A, the behavior you get from calling decode with that is not the same as you'd get from calling decode with the ... on the pointer stored in ["A"]. Which is different from how structs behave.
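To make the contrast concrete, here is a sketch using the A and B types from above plus a map[string]map[string]int. The helper names and sample values are assumptions for illustration. The third helper shows what "recursing yourself" looks like: decoding directly into the inner map keeps its other entries.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type A struct{ X, Y int }
type B struct{ A1, A2 A }

// decodeNestedStruct decodes into a struct whose nested field is already set.
func decodeNestedStruct() (B, error) {
	b := B{A1: A{X: 10, Y: 20}}
	err := json.Unmarshal([]byte(`{"A1":{"X":1}}`), &b)
	return b, err
}

// decodeNestedMap decodes into a map whose inner map is already populated.
func decodeNestedMap() (map[string]map[string]int, error) {
	m := map[string]map[string]int{"A": {"X": 10, "Y": 20}}
	err := json.Unmarshal([]byte(`{"A":{"X":1}}`), &m)
	return m, err
}

// decodeInnerMapDirectly recurses by hand: it decodes the inner JSON object
// into the inner map that is already stored under "A".
func decodeInnerMapDirectly() (map[string]map[string]int, error) {
	m := map[string]map[string]int{"A": {"X": 10, "Y": 20}}
	inner := m["A"]
	err := json.Unmarshal([]byte(`{"X":1}`), &inner)
	return m, err
}

func main() {
	b, _ := decodeNestedStruct()
	fmt.Printf("%+v\n", b.A1) // {X:1 Y:20}: Y survives in the struct case

	replaced, _ := decodeNestedMap()
	fmt.Println(replaced["A"]) // map[X:1]: the inner map was replaced

	merged, _ := decodeInnerMapDirectly()
	fmt.Println(merged["A"]) // map[X:1 Y:20]: manual recursion keeps Y
}
```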

mvdan commented 5 years ago

Here is what I think is a simpler example: https://play.golang.org/p/gnyc9t1kleB

It shows that a map value is replaced, and thus the original value is left untouched, and the pointers differ. Edit: it's replaced with a zero value, so all other fields are lost.

When a struct is used, the same pointer is reused, so the original value changes.
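The linked playground is not reproduced here, so the following is only a sketch of the two cases, with hypothetical types `Value` and `Holder`. It also shows the "other fields are lost" effect: the map element is decoded into a zero value, so field C is reset.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Value and Holder are hypothetical types used only for this illustration.
type Value struct{ B, C int }
type Holder struct{ A *Value }

// decodeIntoStructField decodes into a struct whose pointer field is set.
func decodeIntoStructField() (*Value, Holder, error) {
	orig := &Value{B: 6, C: 1}
	h := Holder{A: orig}
	err := json.Unmarshal([]byte(`{"A":{"B":7}}`), &h)
	return orig, h, err
}

// decodeIntoMapValue decodes the same JSON into a map holding the pointer.
func decodeIntoMapValue() (*Value, map[string]*Value, error) {
	orig := &Value{B: 6, C: 1}
	m := map[string]*Value{"A": orig}
	err := json.Unmarshal([]byte(`{"A":{"B":7}}`), &m)
	return orig, m, err
}

func main() {
	orig, h, _ := decodeIntoStructField()
	fmt.Println(h.A == orig, *orig) // true {7 1}: same pointer, C kept

	orig, m, _ := decodeIntoMapValue()
	fmt.Println(m["A"] == orig, *orig, *m["A"]) // false {6 1} {7 0}: new value, C lost
}
```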

I tend to agree that both cases should behave the same here, but it's hard to tell if that's a good idea before digging into the code.

mvdan commented 5 years ago

I've convinced myself that the current behavior is simply by accident. I've worked on a fix, and no existing json tests fail. I also like the new logic much better, as it's more consistent with structs and the general package behavior.

I've also made non-pointer elements keep their existing values too, by making a copy of the value and assigning it back to its key.
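The copy-and-assign-back idea can also be approximated in user code. The sketch below is not the code from the CL. `mergeInto` and `Point` are hypothetical names, and the helper only handles string-keyed maps by first splitting the input into `json.RawMessage` values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Point is a hypothetical element type used only for this illustration.
type Point struct{ X, Y int }

// mergeInto is a hypothetical helper: it decodes each element of a JSON
// object into a copy of the existing map value and assigns the copy back.
// The map m must be non-nil.
func mergeInto[V any](data []byte, m map[string]V) error {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	for k, msg := range raw {
		v := m[k] // copy of the existing element, or the zero value if absent
		if err := json.Unmarshal(msg, &v); err != nil {
			return err
		}
		m[k] = v // assign the updated copy back to its key
	}
	return nil
}

func main() {
	byValue := map[string]Point{"A": {X: 10, Y: 20}}
	if err := mergeInto([]byte(`{"A":{"X":1}}`), byValue); err != nil {
		panic(err)
	}
	fmt.Println(byValue["A"]) // {1 20}: Y is kept

	p := &Point{X: 10, Y: 20}
	byPointer := map[string]*Point{"A": p}
	if err := mergeInto([]byte(`{"A":{"X":1}}`), byPointer); err != nil {
		panic(err)
	}
	fmt.Println(byPointer["A"] == p, *p) // true {1 20}: pointer reused
}
```

For pointer elements the copy is a copy of the pointer, so Unmarshal writes through it into the existing value rather than allocating a new one.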

gopherbot commented 5 years ago

Change https://golang.org/cl/179337 mentions this issue: encoding/json: reuse values when decoding map elements

mvdan commented 5 years ago

This has a CL ready for review, and I think there's consensus that we should do this, so I'm moving it back to the 1.14 milestone.

dsnet commented 4 years ago

FYI, the changes made for this issue are possibly going to be reverted. See #39149.

ianlancetaylor commented 4 years ago

The fix was rolled back, so reopening this issue.

mvdan commented 4 years ago

I'm not sure if there's much else to do here for the current encoding/json. As the revert issue points out, too much existing code depends on the current behavior, so we can't possibly change it at this point without breaking the compatibility promise.

I would definitely add this issue to the bag of design issues to consider in a future json API redesign, though. How should we keep track of those? I've seen some others in the form of proposals in "held" status, but I don't think this issue should be a proposal. It's just a design inconsistency bug.

stupidjohn commented 4 years ago

How about using a special function to indicate that nothing should be allocated when a value is already allocated, just like UseNumber?

mvdan commented 4 years ago

I don't think adding options for historical design tech debt is a good idea. The problem with that kind of debt is that the API is less consistent and a bit harder to use, but one can usually still write code around the issue. If we add an option, to fix the debt and avoid breaking backwards compatibility, the package is still not consistent (by default) and, arguably, it's even harder to use for someone new to Go.