golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.64k stars 1.58k forks source link

Add option to JSON marshal / unmarshal: StripEnumPrefix / AddEnumPrefix #1598

Open dorner opened 3 months ago

dorner commented 3 months ago

Is your feature request related to a problem? Please describe.
Currently, all enums are prefixed in order to avoid name collisions:

enum PurchaseMethod {
  PURCHASE_METHOD_UNSPECIFIED = 0;
  PURCHASE_METHOD_ONLINE = 1;
  PURCHASE_METHOD_IN_STORE = 2;
}

When using a project like grpc-gateway and using string enums, this results in additional bytes being sent around when the messages are translated to JSON, as well as "stuttering" since you have the messages littered with patterns like purchaseMethod: "PURCHASE_METHOD_IN_STORE".

Describe the solution you'd like
An option to protojson encoding that will strip the prefix from the enum, and to decoding that will add the prefix back. In this case we can calculate the prefix as the smallest string that is common to all enum value names.

Describe alternatives you've considered
I can create my own marshaller for grpc-gateway, but the only way to do that right now is to copy/paste the majority of the code. Since this isn't really related to gRPC but to Protobuf JSON marshalling, this seems like a better place to implement it long-term.

I'm willing to give it a stab as a PR, but wanted to make sure the approach made sense.

dorner commented 3 months ago

...I thought protojson was in this project, but it might not be? Should it go here? https://github.com/protocolbuffers/protobuf-go/

Looks like this is the issue tracker for that project I guess?

puellanivis commented 3 months ago

This is the Issue board for both github.com/golang/proto but also google.golang.org/protobuf as the people maintaining both are the same.

This seems to be the same issue as https://github.com/golang/protobuf/issues/513 which is still open.

dorner commented 3 months ago

513 would definitely solve the underlying problem entirely. But given that it's been sitting for around six years without any progress, and solving it would result in a larger change (such as generated Go code), I was hoping to use this to bypass the meatiness and limit it just to marshalling and unmarshalling JSON rather than changing the Protobuf semantics / limitations.

neild commented 3 months ago

513 is about the generated code. If I follow, this is about the JSON serialization.

The protojson package implements the standard proto/JSON mapping defined at: https://protobuf.dev/programming-guides/proto3/#json

Any changes to that mapping need to be coordinated across all protobuf implementations. We do not want the Go implementation to generate JSON that can't be consumed by C++, Java, etc. A primary purpose of protobufs is to enable cross-system transfer of data, and implementation-specific serializations work directly against that goal.

The place to raise cross-implementation issues is https://github.com/protocolbuffers/protobuf/issues, but I doubt a change like this would happen; it requires too much coordination for too little benefit. The usual way to decrease the wire size of JSON data is to compress it. (Or, of course, you can use the protobuf wire format, which can generally represent a small enum field value as ~3 bytes.)

dorner commented 3 months ago

Ah interesting - didn't know this was a general standard. Thanks for the link.

Obviously if we had the option of using gRPC we wouldn't need to use Gateway :) Not everything can talk gRPC. Generally speaking, if I unmarshal my Protobuf file using this option, the reader isn't going to then marshal it back into Protobuf - it most likely can only speak JSON.

I'm OK if this is unlikely to happen - like I said, I can focus more on the gateway side and look at the marshaller there. I can try getting it to work and see if it makes sense to contribute as an option specifically on the grpc-gateway project. Does that seem more likely to work?

puellanivis commented 3 months ago

OH, I get what you’re saying now. Sorry for misunderstanding you at first.

Yeah, we run into language interop problems then, as mentioned.

dsnet commented 3 months ago

A little bit of history:

Note that #513 is related problem, but one that's specific to just the generated Go code. The generated behavior of having enum values be prefixed with the enum type came about because the convention of always manually prepending enum values with the enum type name had not yet become common practice when the Go generator was first written. Ironically, not having enums being in a unique namespace under the type itself has led to two layers of pragmatic solutions that interacts poorly with each other:

dorner commented 3 months ago

Yeah. I worry less about the generated code because autocomplete handles that for you - it's not pretty, but it's not a pain. But on the JSON side, especially when interacting with projects that aren't in Go and don't understand Protobuf, it looks really weird.

Thanks very much for the detailed history lesson - I'd known some of it but not all. Appreciate your taking the time.

dorner commented 3 months ago

Oof... looks like grpc-gateway just flat out proxies to protojson. So I wouldn't be able to achieve this without forking the project, or doing a second, fairly in-depth pass to tease out the enums and values after the fact. :(

It might take a while, but I think adding an option to the protojson spec to change how it should marshal/unmarshal this might be worth doing. I probably have no idea what I'm getting myself into though. :)

puellanivis commented 3 months ago

The first step would probably be opening up an issue on the main protobuf issue board, leading at least some discussion.