golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.78k stars 1.58k forks source link

internal/impl: implement support for non-pointer message fields #1142

Closed ydnar closed 4 years ago

ydnar commented 4 years ago

We’re in the process of migrating from github.com/golang/protobuf to the v2 API (google.golang.org/protobuf) and hit a panic when using proto.Equal to compare protos.

We traced the issue to a use of (reflect.Value).IsNil() here: https://github.com/protocolbuffers/protobuf-go/blob/1f5b6fe64530cac2061a3d315b7e44966b1a200b/internal/impl/message_reflect_field.go#L386-L392

(reflect.Value).IsNil() panics when the underlying value is not a pointer type (e.g. struct pointer, channel, slice, map, etc.).

It looks like v2’s internal API assumes that all non-scalar values are pointers, which in this case is not true—we’re using GoGo Protobuf’s feature of non-nullable fields in most of our messages, primarily time.Time values and occasional struct value where we don’t want the additional GC load.

We forked this and replaced the IsNil call with IsZero, which in this use case works correctly. Suggestion: swap out calls to IsNil with calls to IsZero to prevent this runtime panic and work correctly with legacy v1 protos.

Additionally, it looks like this package reimplements IsZero to some extent here: https://github.com/protocolbuffers/protobuf-go/blob/1f5b6fe64530cac2061a3d315b7e44966b1a200b/internal/descfmt/stringer.go#L224-L243

Thanks!

dsnet commented 4 years ago

This project only provides backwards compatibility for messages generated by protoc-gen-go. Any semblance of things "working" with older versions of this module and messages not generated by protoc-gen-go is purely accidental and our compatibility policy does not require us to keep such undefined behavior working.

we’re using GoGo Protobuf’s feature of non-nullable fields in most of our messages, primarily time.Time values and occasional struct value where we don’t want the additional GC load.

This project does not have support for the "non-nullable" feature in gogo/protobuf and we do not intend to support it (it is not as trivial as simply changing IsNil to IsZero; that change may work for your limited use-case, but there a number of other edge-cases to address).

On the user end, the problem should be addressed by calling gogo/protobuf''s equivalent functionality which knows how to handle the variations in generated API that their generator produces (it is not feasible for golang/protobuf to support every feature that gogo/protobuf provides).

In the distant future, if the maintainers of gogo/protobuf implement protobuf reflection support for their generated messages, then all of it would work properly with the APIs provided by this module.

ydnar commented 4 years ago

We’re currently using a number of Google Cloud APIs, which are migrating to the new v2 API, and currently have a hack in our go.mod file to force the old Protobuf module:

replace github.com/golang/protobuf => github.com/golang/protobuf v1.3.5

The fact that this module (github.com/golang/protobuf) didn’t adopt the Go module v2 package path means we can’t have both the old and the new (v2) versions in our service. This means we have to choose between:

  1. Rewrite everything that uses GoGo Protobuf
  2. Continue using older versions of Google API clients, and accept the drift/breakage that comes with it
  3. Move infrastructure from GCP to AWS
  4. Abandon protos/gRPC entirely

This project does have support for the "non-nullable" feature in gogo/protobuf and we do not intend to support it (it is not as trivial as simply changing IsNil to IsZero; that change may work for your limited use-case, but there a number of other edge-cases to address).

That’s actually…not true. The panic here proves that.

Option 1 above would be significantly easier if the Go protobuf library added support for these features, most of which are already present in other languages’ protobuf libraries:

ydnar commented 4 years ago

In the distant future, if the maintainers of gogo/protobuf implement protobuf reflection support for their generated messages, than all of it would work properly with the APIs provided by this module.

In this case, distant == indefinite. The maintainers of GoGo Protobuf are no longer able to maintain the project: https://github.com/gogo/protobuf/issues/691

dsnet commented 4 years ago

That’s actually…not true. The panic here proves that.

It was a typo. I meant to say it does not have support.

dsnet commented 4 years ago

if the Go protobuf library added support for these features ...

It is unlikely that we provide support for those features. See https://developers.google.com/protocol-buffers/docs/reference/go/faq#new-feature.

Philosophically, this project prioritizes adherence to the wider protobuf ecosystem and deliberately does not treat Go as a special snowflake that gets to have many special features that the other language implementation does not have.

I understand this stance is contrary to what some Go users would prefer, but that is how it is.

cee-dub commented 4 years ago

many special features

Fair enough. However, #52 and #555 seem like a bare minimum of interacting well in the Go ecosystem, not "special" features.

Please reconsider your stance here. It appears extremely hard-line and closed-minded to state that the project philosophically disagrees with adding anything specific to Go.

dsnet commented 4 years ago

Both #52 and #555 have been discussed multiple times over the years with the protobuf project owners and the answer has consistently been no.

Please reconsider your stance here.

This is not something this project has the power to decide. Sorry.

ydnar commented 4 years ago

I would strongly encourage you reconsider your choice of language (“special snowflake”) right now.

It does more harm than good, and actively discourages folks from wanting to contribute, much less participate here.

dsnet commented 4 years ago

It does more harm than good, and actively discourages folks from wanting to contribute, much less participate here.

I understand, but there are other factors at play. If protobufs were a technology invented by the Go project, we would have more freedom in choosing our stance on things. However, that is not the case. Go protobufs happens to be implemented by members of members of the Go team, but the overall direction of the project fundamentally is determined by the primary protobuf project. No amount of convincing will change this since it is not something we have power to decide.

cee-dub commented 4 years ago

I understand you feel you have no power to change the decision of the protobuf team. I think you missed the point about your language (not Go) choices in the discussion above.

dsnet commented 4 years ago

I think you missed the point about your language (not Go) choices in the discussion above.

I don't really understand what this sentence means.

ydnar commented 4 years ago

I understand, but there are other factors at play. If protobufs were a technology invented by the Go project, we would have more freedom in choosing our stance on things. However, that is not the case. Go protobufs happens to be implemented by members of members of the Go team, but the overall direction of the project fundamentally is determined by the primary protobuf project. No amount of convincing will change this since it is not something we have power to decide.

One thing, however, is entirely within your power: add hooks to protoc-gen-go to allow folks (like us, like the GoGo Protobuf contributors), to intercept and modify the output before being written.

I’ve already forked protoc-gen-go in another project to generate golint valid output.

The argument put forth here is specious. Language-specific implementation details have nothing to do with the protocol interop, but rather conforming to the language ecosystem a developer chooses to work in.

By all means, don’t add go_tags or go_name to the core Google protobuf types. At least don’t prevent others from easily doing so.

cee-dub commented 4 years ago

I think you missed the point about your language (not Go) choices in the discussion above.

I don't really understand what this sentence means.

Didn't think I'd have to point out why, but OK. The term is, unfortunately, now politicized and often used in derogatory sense, which seems inconsistent with the Go code of conduct.

dsnet commented 4 years ago

The argument put forth here is specious. Language-specific implementation details have nothing to do with the protocol interop, but rather conforming to the language ecosystem a developer chooses to work in.

That is not the stance of the protobuf project. They care about consistency even in the generated API that most target languages use. You are free to disagree.

I should also note that features like #52 and #555 require something to be specified in the .proto file, which is pretty clearly in the territory that the protobuf project cares about.

add hooks to protoc-gen-go to allow folks (like us, like the GoGo Protobuf contributors)

Two thoughts:

https://www.urbandictionary.com/define.php?term=Special%20Snowflake

Thanks for pointing that out. I was not aware of the etymology of the term (hence my confusion about what your sentence meant) which are very unfortunate and sad. My intention with the phrase is to simply indicate something "unique and specialized" (not in a derogatory sense) based on the popular science fact that every snowflake is unique.

ydnar commented 4 years ago

That is not the stance of the protobuf project. They care about consistency even in the generated API that most target languages use. You are free to disagree.

As a contributor to Swift Protobuf, I will happily disagree. This implementation has several ergonomic features that improve its usability by doing more than the least-common-denominator API.

add hooks to protoc-gen-go to allow folks (like us, like the GoGo Protobuf contributors)

Two thoughts:

  • Hooks eventually become a major maintenance problem since it means that the internal implementation of protoc-gen-go cannot easily change without potentially breaking the hook users.
  • While it isn't exactly a "hook", we provided the compiler/protogen package which aids building your own protoc plugins in Go. If you want to customize the API, then we've provided the foundations for building your own generator.

Can you promote the quasi-internal google.golang.org/protobuf/cmd/protoc-gen-go/internal_gengo package?

dsnet commented 4 years ago

Can you promote the quasi-internal google.golang.org/protobuf/cmd/protoc-gen-go/internal_gengo package?

Can you explain how that would help users implement things like #52, #555, and "generate golint valid output"?

dsnet commented 4 years ago

This implementation has several ergonomic features that improve its usability by doing more than the least-common-denominator API.

Examples? A glance over descriptor.proto only shows the swift_prefix option.

ydnar commented 4 years ago

Can you promote the quasi-internal google.golang.org/protobuf/cmd/protoc-gen-go/internal_gengo package?

Can you explain how that would help users implement things like #52, #555, and "generate golint valid output"?

Here’s a quick example that cribs some internal code from golint to generate idiomatic Go identifiers (e.g. XMLRequest instead of XmlRequest): https://gist.github.com/ydnar/327d33e4c9fa54f08b6d7f0e31880bf7

The body of the file is copied from the main protoc-gen-go and modifies the output prior to writing to the filesystem.

Adding a go_name (rather, go.name) or go.tags would be via an extension. This fork of protoc-gen-go could honor those struct tags, but we’d need a hook in the serialization step to write the struct tags.

I’m trying to avoid folks needing to fork this repo to get egonomic Go output from protoc-gen-go, which is what GoGo Protobuf is. I’m advocating for an API that lets others modify the output of protoc-gen-go, even if that API is subject to no compatibility promise.

dsnet commented 4 years ago

The approach taken in https://gist.github.com/ydnar/327d33e4c9fa54f08b6d7f0e31880bf7 seems functionally equivalent to hooks in that it imposes maintainability problems down the line. The protogen data structures weren't intended to be mutated as they were passed internal_gengo as doing so may break invariants that internal_gengo is assuming relative to what protogen normally provides.

Also, I think an example that implements #52 would have been more compelling since golint has just recently been marked as deprecated (golang/go#38968), so modifying generated output for golint purposes seems like a misplaced effort. Even if you could accomplish #52 with internal_gengo, I still don't think it changes my primary argument that it needs to make assumptions on how the internal operations of the generator works, and thus imposing significant maintainability costs.

ydnar commented 4 years ago

This implementation has several ergonomic features that improve its usability by doing more than the least-common-denominator API.

Examples? A glance over descriptor.proto only shows the swift_prefix option.

ydnar commented 4 years ago

The approach taken in https://gist.github.com/ydnar/327d33e4c9fa54f08b6d7f0e31880bf7 seems functionally equivalent to hooks in that it imposes maintainability problems down the line. The protogen data structures weren't intended to be mutated as they were passed internal_gengo as doing so may break invariants that internal_gengo is assuming relative to what protogen normally provides.

Also, I think an example that implements #52 would have been more compelling since golint has just recently been marked as deprecated (golang/go#38968), so modifying generated output for golint purposes seems like a misplaced effort. Even if you could accomplish #52 with internal_gengo, I still don't think it changes my primary argument that it needs to make assumptions on how the internal operations of the generator works, and thus imposing significant maintainability costs.

Heck, just fixing the GoName generation would be helpful. The identifiers protobuf-go generates are terrible.

But to your core argument of “maintainability costs:” the choice of this team to not use the Go module v2 package path for the breaking changes has externalized those costs on every downstream user of this library.

cee-dub commented 4 years ago

Also, I think an example that implements #52 would have been more compelling

Let's think this through. Users want to avoid forking protoc-gen-go so we're requesting a feature in protoc-gen-go, which I'm led to believe is within this team's control. We understand that first-class options or extensions to the protobuf language itself are out of scope.

Above, @ydnar's gist forks protoc-gen-go to modify the generated file before writing because, at present, there's no other mechanism available to affect the generated output.

Consider promoting at least portions of google.golang.org/protobuf/cmd/protoc-gen-go/internal_gengo such that Go users needing to add struct tags #52, or change Go identifiers #555 could do so in a way that the rest of the machinery is able to just use.

Take as an example flag.Usage, an exported func-typed var that's initially set to a reasonable implementation. If the same mechanism was used to expose a func analogous to fieldJSONTagValue(field *protogen.Field) string to add struct tags, where the default implementation returns "", users could create their own protoc-gen-* binary that was a minimal wrapper around protoc-gen-go by importing the package and setting the func to inspect extensions that declare extra struct tags, e.g.: string with_tag = 1 [(go.addstructtags) = 'my_tag:"tag_value,opt"'];.

I'll stop here before trying to construct an example for #555 or writing code for the above suggestion to get your response.

cee-dub commented 4 years ago

And just for clarity, affect the generated output is necessary to allow for implementation details (e.g. struct tags or Go Identifiers) that cannot be changed by simply adding another file, as protoc-gen-go-grpc can do.

neild commented 4 years ago

This issue has ranged around a bit, but as I understand it, the feature request is to support generated messages produced by the gogoprotobuf fork of the code generator.

The compatibility guarantee for the Go protobuf implementation states:

Users should use generated code produced by a version of protoc-gen-go that is identical to the runtime version provided by the protobuf module. This project promises that the runtime remains compatible with code produced by a version of the generator that is no older than 1 year from the version of the runtime used, according to the release dates of the minor version. Generated code is expected to use a runtime version that is at least as new as the generator used to produce it. Generated code contains references to protoimpl.EnforceVersion to statically ensure that the generated code and runtime do not drift sufficiently far apart.

While we only guarantee support for generated code produced by a protoc-gen-go up to one year old, in practice we are significantly more permissive. (We currently test interoperability with code produced by a four-year-old generator.) Maintaining support for older generated code comes at a cost, but we feel that it is important to do so.

Unfortunately, we are unable to provide support for generated code not produced by a version of protoc-gen-go released by this project.

We do support any type that implements the protoreflect.Message interface. If gogoprotobuf's generator is updated to produce messages that implement that interface, they will work with all public functions in the "google.golang.org/protobuf" module.

The gogoprotobuf project includes forks of both protoc-gen-go and the proto package. I believe you should be able to use generated code produced by that project's code generator with its proto package.

I realize this isn't going to make gogoprotobuf users happy, but we simply don't have the ability to provide any reasonable level of support for non-standard generated code.

cee-dub commented 4 years ago

I agree that this issue has meandered, but the current title internal/impl: implement support for nullable messages represents a valuable feature on its own and the discussion points out several other valid issues.

I realize this isn't going to make gogoprotobuf users happy, but we simply don't have the ability to provide any reasonable level of support for non-standard generated code.

Completely understand this stance. In fact, I don't believe those discussing this issue are requesting compatibility with gogoprotobuf, rather they are asking for this generator to support several specific ergonomic features that cannot be implemented by creating a supplementary generator (a la protoc-gen-go-grpc). Here, we are also not asking for additional top-level Protocol Buffers (the language) support, though that is, unfortunately, the wording of the proposals in #52 and #555.

Speaking for myself, I would be happy to no longer be a gogoprotobuf user if certain ergonomic needs were able to be met by using google.golang.org/protobuf instead.

Regarding other topics covered in the discussion here, perhaps new issues should be created to address the unacceptable parts of #52 and #555 and offer modified proposals that don't introduce changes to descriptor.proto, or those issues should be updated to address two mutually exclusive constraints:

  1. "Users should use generated code produced by a version of protoc-gen-go that is identical to the runtime version provided by the protobuf module."
  2. "Unfortunately, we are unable to provide support for generated code not produced by a version of protoc-gen-go released by this project."
ydnar commented 4 years ago

To further clarify what we’re asking for:

We want to be able to implement a subset of what GoGo Protobuf provides without having to fork the entire protobuf-go repo and protoc-gen-go.

The protogen package is nice, but the serialization parts are semi-internal, so any protoc plugin that uses it will end up having to reimplement or copy/paste huge chunks of protoc-gen-go[-grpc].

neild commented 4 years ago

I agree that this issue has meandered, but the current title internal/impl: implement support for nullable messages represents a valuable feature on its own and the discussion points out several other valid issues.

The OP of this issue is about supporting generated messages produced by gogoproto's fork of the code generator. I think that rather than trying to repurpose this issue, it'll be clearer to create new ones for any specific generator feature requests.

For the specific case of representing proto2 optional fields as a non-pointer type, I should note that any approach which loses field presence information is a non-starter. Generated messages must represent the full range of values of the protobuf message type they represent, and unmarshaling/remarshaling a message must not lose information.

ydnar commented 4 years ago

For the specific case of representing proto2 optional fields as a non-pointer type, I should note that any approach which loses field presence information is a non-starter. Generated messages must represent the full range of values of the protobuf message type they represent, and unmarshaling/remarshaling a message must not lose information.

The original request was to remove a panic stemming from an assumption that a non-basic typed field is a pointer. The message in question is a proto3, not proto2, and this was for an internal API where the value was represented as a time.Time, not a WKT timestamp, and the zero value is acceptable.

The Go reflect package provides a function for this (IsZero) so this package (or google.golang.org/protobuf) doesn’t need to reimplement it using IsNil.

Given that idiomatic Go suggests that the zero value of a struct should be useful where possible, and that Go is garbage-collected, reducing the number of pointers (and allocs) is a reasonable goal. Especially if the use case doesn’t need to discriminate between the (non-)presence of a field and the zero value.

neild commented 4 years ago

The message in question is a proto3, not proto2, and this was for an internal API where the value was represented as a time.Time, not a WKT timestamp, and the zero value is acceptable.

I see. The protobuf code generator has never created time.Time field values, so this is about supporting code generated by (I presume) the gogo fork of the generator.

Unfortunately, as I said above, we are unable to provide support for generated code not produced by a version of protoc-gen-go released by this project.

ydnar commented 4 years ago

Unfortunately, as I said above, we are unable to provide support for generated code not produced by a version of protoc-gen-go released by this project.

Okay—how about a hook in protogen so the tree can be modified before serialization? That’d at least provide a mechanism for others to implement #52 and #555.

Ultimately, if your answer is just “fork the repo if you want those features,” then just say so, close #52 and #555, and get rid of the TODO in protogen.

dsnet commented 4 years ago

Ultimately, if your answer is just “fork the repo if you want those features,” then just say so,

We are heavily leaning towards that option. There's not really a good way to expose hooks in a way that will not be regularly broken by changes to the internal implementation of the generator.

close #52 and #555, and

I should note that I was the one who proposed #555, and I personally have some interest in seeing it provided. However, I also understand the reasons not to provide it. We're not holding back on features to be rude or deliberately cause users pain. Engineering is about trade-offs. Every feature has benefits and costs; they must be evaluated.

get rid of the TODO in protogen.

If we decide that we definitely will not do #52 and #555, we will remove that TODO.

cee-dub commented 4 years ago

I note that the first question on #555 that appears not to have been answered is from @neild:

Open question: Can we add a go_name field to FieldDescriptor et al., or does this need to be an extension option?

Are extension options more generally palatable? We've established that the protobuf team are not interested in additions to FieldDescriptor for language-specific features beyond those absolutely necessary.

dsnet commented 4 years ago

Are extension options more generally palatable?

In some ways yes, and in other ways no. Consider the following .proto source file:

message MyMessage {
    // With options declared in descriptor.proto.
    optional int32 my_field1 [cpp_name = "custom_name", java_name = "custom_name", ruby_name = "custom_name", go_name = "custom_name", python_name = "custom_name"];

    // With options declared as extensions to descriptor.proto.
    optional int32 my_field2 [(cpp.name) = "custom_name", (java.name) = "custom_name", (ruby.name) = "custom_name", (go.name) = "custom_name", (python.name) = "custom_name"];
}

The concern about allowing name overrides is the proliferation of similar options in all languages and the readability detriments it brings to the .proto source files. It seems that the same concern occurs regardless of whether the option is a declared option in descriptor.proto or one achieved through extensions.

ydnar commented 4 years ago

The concern about allowing name overrides is the proliferation of similar options in all languages and the readability detriments it brings to the .proto source files. It seems that the same concern occurs regardless of whether the option is a declared option in descriptor.proto or one achieved through extensions.

I think the usability costs for engineers working with generated types outweighs minor complexity in the proto files.

The lack of go_tags alone requires any user of this library to build parallel adapter structs if struct tags are needed.

cee-dub commented 4 years ago

Generating better names will in many cases eliminate the need to use an option, if one were available, to change the struct field name. That would reduce the problem of .proto file readability.

neild commented 4 years ago

Closing: The initial issue here is about supporting generated code not produced by protoc-gen-go. Unfortunately, as discussed above, we are only able to provide support for generated code produced by a version of protoc-gen-go released by this project.