golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.7k stars 1.58k forks source link

reflect/protodesc: some descriptors cannot be round-tripped to protos and back #1197

Open jhump opened 3 years ago

jhump commented 3 years ago

What version of protobuf and what language are you using? v1.25.0

What did you do? Example proto file tmp.proto

syntax = "proto2";

option go_package = "main";

message Foo {
  option message_set_wire_format = true;

  extensions 100 to max;
}

We can successfully generate Go code for this:

protoc --go_out=. tmp.proto

And the resulting code can successfully retrieve the descriptor:

package main

import "github.com/golang/protobuf/proto"
import "fmt"

func main() {
  r := proto.MessageReflect(&Foo{})
  fmt.Println(r.Descriptor().FullName())
}

However, if we try to convert the descriptor to a proto and then create a descriptor from that (which would also happen if, for example, we loaded the descriptor from a protoset produced by protoc or from a gRPC server via server reflection), we get an error:

package main

import "github.com/golang/protobuf/proto"
import "google.golang.org/protobuf/reflect/protodesc"
import "google.golang.org/protobuf/reflect/protoregistry"
import "fmt"

func main() {
  r := proto.MessageReflect(&Foo{})
  fmt.Println(r.Descriptor().FullName())
  pr := protodesc.ToFileDescriptorProto(r.Descriptor().ParentFile())
  var reg protoregistry.Files
  fd, err := protodesc.NewFile(pr, &reg)
  if err != nil {
    panic(err)
  }
  fmt.Println(fd.Path())
}

What did you expect to see? The second, longer program succeeds and prints "tmp.proto".

What did you see instead? Running the new program actually fails:

panic: proto: message "Foo" is a MessageSet, which is a legacy proto1 feature that is no longer supported

I can understand an error like that if we tried to use such a message, to serialize or de-serialize. But it seems really weird that we can't even create the descriptor, especially since we can successfully create the descriptor if it is packaged in the protoc-gen-go-generated code.

dsnet commented 3 years ago

Is there any real code outside Google that actually uses MessageSet? It's an artifact of proto1 and it's a source of significant technical debt. In open-source, we ideally should treat it as if it doesn't exist.

jhump commented 3 years ago

I received several bugs against protoparse related to handling of message set wire format. So someone is using it:

jhump commented 3 years ago

FWIW, I believe it is possible (likely even?) that teams inside of Google use protoparse.

Also, as mentioned in the description, this is strangely inconsistent. I can generate such code to Go using protoc-gen-go and load the descriptor just fine.

dsnet commented 3 years ago

Also, as mentioned in the description, this is strangely inconsistent. I can generate such code to Go using protoc-gen-go and load the descriptor just fine.

I agree it's inconsistent, and I'm inclined to make protoc-gen-go not be able to generate MessageSet code in open-source.

I should note that legacy features like MessageSets and weak fields are guarded by the protolegacy build tag. We have an implementation for them, but there's no guarantee that we'll keep them around once all the technical debt relying on them is cleared out.

jhump commented 3 years ago

I have a repo that attempts to be a pure Go replacement for protoc. It only deals with descriptors. It would be a rather huge shame if I were unable to use the Go protobuf runtime because of these decisions.

In my case, the code I care about is just dealing with descriptors -- not other aspects of runtime behavior. Not even being able to create a descriptor seems incredibly draconian.

I actually don't have a problem if protoc-gen-go refused to generate code for the MessageSet, but I think it's a reasonable request that I at least be able to construct and inspect such a descriptor. After all, it's a valid descriptor; why would the runtime not allow me to create it?

Like I originally said, I think the errors belong elsewhere -- like actually trying to marshal/unmarshal such a message.

As I mentioned above, I had to make changes to my own protoreflect for these to work, because there is some need in the community. It's not like I'm asking for any extra surface area or new API, so the actual carrying cost of such a change would be de minimis.

jhump commented 3 years ago

I should note that legacy features like MessageSets and weak fields are guarded by the protolegacy build tag.

Are you suggesting that I should just build with this tag, in lieu of any other kind of change being made?

neild commented 3 years ago

I think it's reasonable to say that protodesc should accept any valid descriptors, including ones where MessageOptions.message_set_wire_format is set. It doesn't need to do anything with the value aside from preserving the contents of the MessageOptions message.

Trying to marshal/unmarshal such a message would still produce an error, of course.

jhump commented 3 years ago

@neild, FWIW, I took a look at things guarded by flags.ProtoLegacy, and marshaling/unmarshaling is already guarded. So I think a useful change here would be to just remove this one check. There is one other flag check in that file for weak fields, too.