golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.79k stars 1.58k forks source link

protojson: fail to unmarshal JSON with reserved fields #1606

Closed timofurrer closed 6 months ago

timofurrer commented 7 months ago

What version of protobuf and what language are you using? Version: Go, v1.5.4

What did you do?

We have a message with a reserved field and we want to unmarshal some JSON that has this field.

message Foo {
  string bar = 1;
  reserved "should_be_ignored";
}
package main

import (
    "fmt"

    "google.golang.org/protobuf/encoding/protojson"
)

func main() {
    input := []byte("{\"bar\": \"teststring\", \"should_be_ignored\": 42}")
    foo := &Foo{};
    err := protojson.Unmarshal(input, foo)
    if err != nil {
        panic(err)
    }

    fmt.Printf("Foo: %v\n", foo)
    return
}

Running this results in an unmarsahling error like unknown field "should_be_ignored".

What did you expect to see?

The JSON unmarshals correctly, but ignores the reserved fields

What did you see instead?

The decoder fails to unmarshal complaining about not knowing the reserved field.

Anything else we should know about your project / environment?

puellanivis commented 7 months ago

At least adding a basic protobuf definition with input that repros would be helpful for understanding.

timofurrer commented 7 months ago

@puellanivis I've updated the description.

puellanivis commented 7 months ago

Hm, it works if you set protojson.UnmarshalOptions{ DiscardUnknown: true }

I suppose the question is, should reserved field names be discarded as “known”. From the text of https://protobuf.dev/programming-guides/proto3/#fieldreserved:

To make sure JSON and TextFormat instances of your message can still be parsed, also add the deleted field name to a reserved list.

This suggests that the JSON should continue to parse with the reserved field name.

timofurrer commented 7 months ago

Yes, I'm aware that I can discard unknown fields, but that would discard "all" unknown fields - which is not what I want.

This suggests that the JSON should continue to parse with the reserved field name.

I agree - so that means that protojson doesn't behave according to this, right?

puellanivis commented 7 months ago

This suggests that the JSON should continue to parse with the reserved field name.

I agree - so that means that protojson doesn't behave according to this, right?

We’ll need more input from contributors, but I think from my reading that this is a misbehavior. I’ve also advised someone based on this guidance that I suspected reserving a field name would be that it would be recognized as known but ignored, not breaking like this.

timofurrer commented 7 months ago

We’ll need more input from contributors

@puellanivis anything we (or I) can do to get that "input" ?

puellanivis commented 7 months ago

You could work on a gerrit change request yourself. I think the in-house Google devs are otherwise busy with editions support.

lfolger commented 7 months ago

Thanks for reporting this. I think this should be fixed. The behavior is documented in the spec and the go implementation is not following it. I double checked and the C++ implementation for example follows the spec.

I'll try to prepare a change for this and test is in Google first to see if anyone relies on the old behavior (I don't expect anyone to but you never know).

timofurrer commented 7 months ago

@lfolger that's good news that you'll have a look at this - it's highly appreciated, thank you 🤝

lfolger commented 7 months ago

While looking into this I found out that this is already the behavior today. Could you clarify the steps to reproduce?

These are the tests that I added and they succeeded without errors: https://go-review.googlesource.com/c/protobuf/+/581095

lfolger commented 7 months ago

I just realized this is about protojson. Let me try to add a test for that as well.

lfolger commented 7 months ago

Unfortunately, the C++ implementation has the same issue. Usually the C++ implementation is right and diverging from it means in most cases diverging from the spec. I reached out to the core protobuf team to clarify. I'll report back once I know more.

Sorry for the back and forth.

lfolger commented 7 months ago

Short update. There is still an ongoing discussion internally about what the behavior should be. I will come back to this once I know more.

lfolger commented 6 months ago

AFter discussion it turned out that the current behavior is a "bug" in the Text proto parser and intended behavior in the JSON format. I call this "bug" because the implementation is more permissive that it has to be which most users don't mind.

A documentation update to the official documentation will follow shortly.

If you like the behavior to be changed, you would need to file a feature request against the protobuf.

Just to manage expexctations, I'm not very hopeful that they will accept this but you should give it a try nonetheless. Maybe there will be enough users and use cases that need this and they will accept it. If that happens, we are happy to add this to the Go implementation.

I'm closing this issue for now but feel free to reopen it if you have more questions.

ash2k commented 6 months ago

Hi all! I've been following this issue. I don't think this has been mentioned, I'd like to point out an inconsistency between binary proto unmarshaling and JSON->proto message unmarshling. Reserved fields are ignored in the binary format and you don't get an error. I think this makes sense. I wonder why is it not the same for any other format that can be unmarshaled into a proto message?

If it is how it is and the behavior is not going to change, then:

ash2k commented 6 months ago

Yet another way to think about it: currently reserved is a no-op for JSON unmarshaling. To me it is a signal that something is not right - either the spec defines something useless (if we forget about the binary format, with which JSON is not consistent) or the implementation doesn't follow the spec.

p.s. Would be great to hear some of the counter-arguments from the internal discussion. To be honest, it's a bit annoying that this is being discussed behind closed doors. There are lots and lots of users of the format and libraries and there is an open issue tracker and this project is open source in the first place. Why all that if not for transparency and collaboration? So, I'm sure a bit more transparency would be greatly appreciated by the community. Thank you.

puellanivis commented 6 months ago

The semantics of it has pretty much always been documentation of fields to not re-use because they’ve been removed. I believe it is an error during the compilation if a field attempts to reuse either a field number or field name, so it is not strictly a fully no-op.

timofurrer commented 6 months ago

The spec literally mentions that reserved can be used so that JSON can still be parsed (from last sentence in second paragraph here):

If you update a message type by entirely deleting a field, or commenting it out, future developers can reuse the field number when making their own updates to the type. This can cause severe issues, as described in Consequences of Reusing Field Numbers.

To make sure this doesn’t happen, add your deleted field number to the reserved list. To make sure JSON and TextFormat instances of your message can still be parsed, also add the deleted field name to a reserved list.

I agree with @ash2k here that it's kinda frustrating to fix a bug by changing the spec.

Assuming there is NO way that this gets fixed (which would be really sad), would you be open to add an opt-in config to the json marshaller to enable this behavior? .. and make json parsing align to the behavior to binary parsing?

lfolger commented 6 months ago

"Text proto parser has a bug" - perhaps this issue can be used to fix it

First of all, when I said the Go implementation has a bug in the text parser, I meant that it accepts messages with unkown but reserved fields. To be clear only C++ and Go do this. All other implementation we know don't do this. This means JSON and text parsers behave the same (except for C++/Go).

Would be great to hear some of the counter-arguments from the internal discussion. To be honest, it's a bit annoying that this is being discussed behind closed doors.

About the discussion not being public, I was not even part of that discussion. I think if you want a public discussion, you need to file a feature request against the protobuf main repository which handles changes to the spec. It was partially me fault in that I reached out to them internally rather than just asking you to file a bug against the proto repo directly. Sorry about that.

I'd like to point out an inconsistency between binary proto unmarshaling and JSON->proto message unmarshling. Reserved fields are ignored in the binary format and you don't get an error

The accepting of unknown field in the binary format works very different. It accepts all unknown fields. It is completely independent of the reserved fields. You can get the same behavior by using the appropriate unmarshaloptions for protojson and prototext.

Again, there is no point in discussion this here. We, the Go protobuf maintainers, are not responsible for the spec. If you would like to see the spec changed, file a feature request against the proto buf repository. All we can do is to follow the spec.