golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.74k stars 1.58k forks source link

Editions language feature: (pb.go).struct_tag #1619

Closed bmon closed 2 months ago

bmon commented 4 months ago

This request relates to the previous discussion on https://github.com/golang/protobuf/issues/52

Some of the software I maintain would significantly benefit from being able to set custom struct tags for some proto message types when compiled to go. Similar to the many voices in #52, we dream of being able to do this with protoc-gen-go. I noted one of the problems this feature posed was that the proto field meta would only be meaningful only for generated go code.

Lately I found out about protobuf editions and saw that there is explicit support for language-specific features. For example the string_type option for c++. https://protobuf.dev/editions/features/#string_type

I was wondering if the advent of this support for language specific features could possibly allow for a reconsideration of support for setting golang struct tags in protobuf files?

Describe the solution you'd like Perhaps something like

edition = "2023";

import "google/protobuf/go_features.proto";

message Foo {
  string bar = 6;
  string baz = 7 [features.(pb.go).tags = 'my_tag_name:"my_tag_value"'];
}
lfolger commented 3 months ago

Could you explain your use case in more detail?

bmon commented 3 months ago

Sure. I have two use-cases:

We have a gRPC interceptor which will automatically log request content under certain conditions (request failure, latency, etc). We very much like this approach because it's on by default for all our applications (a developer can't forget logging), and it saves us time. However, it's crucial for us that PII or otherwise sensitive information does not get logged. We use struct-tags added to the generated request messages to flag specific fields for masking or to be omitted entirely from logging.

For example, for a given method and request message:

service UserService {
    rpc ProvisionNewUser(ProvisionUserRequest) returns (User);
}

message ProvisionUserRequest {
    string email = 1 [features.(pb.go).tags = 'log:"mask=email"'];
    string org_id = 2;
    string name = 3 [features.(pb.go).tags = 'log:"mask=all"'];
    string password = 5 [features.(pb.go).tags = 'log:"-"'];
}

May produce a log looking something like:

{"msg":"UNKNOWN error while handling request", "grpc.request.content":{"email":"f**@b**.com", "org_id": "1234-5678", "name": "**********"}}

We have another, use case which is pretty similar, but stems from an internal auditing component. The system will automatically parse and record request data along with other metadata about the request, in order to produce a searchable audit-trail of actions users take while using our system.

Similar to the way I described the logging interceptor, it is on by default, and is intended to be highly automated so it is reliable but also does not add extra development burden. It has similar PII concerns to the example above, but also has additional requirements regarding flagging particular fields as identifiers, so different requests which are relevant to each other can be associated.


For both of the examples I gave above there isn't really a way to achieve the developer experience we want without changing the generated code. Our current best solution is to generate the protos, then have another application manipulate the generated code to add the struct tags.

I think it's worth noting that these are just the use-cases I have - and there are many, many more that others have raised in #52 including simplifing data access layers (db/sqlx/datastore tags), implementing custom request validation, and more.

lfolger commented 3 months ago

For both of the examples I gave above there isn't really a way to achieve the developer experience we want without changing the generated code.

I don't think is true. You can use custom field/message option for this. This also allows you to annotate your proto once and extract the information in all languages not only Go and you can be sure that the information is consistent across languages. Please see https://protobuf.dev/programming-guides/proto3/#options and let me know if this works for you.

bmon commented 3 months ago

Thanks for the reply. I had a stab at implementing what you described, but I ran into some issues. We implement a wrapper around a logging implementation to support the masking, and I guess it would be possible to change our logger implementation to both scan for struct tags and proto reflect. However it had the consequences of our logger now having to type assert everything we log in order to attempt to protoreflect, in addition to scanning struct tags.

Unfortunately I can't think of a way to get the solution you described working for external packages that I don't implement, for example sqlx's db tags and cloud datastore's datastore tags.

Do you know if there's some way to workaround these issues? RE: the original request, would it be possible or appropriate for protoc-gen-go to add a language feature like the ability to set struct tags? If the maintainers are at all open to the idea I would be happy to try and implement it.

lfolger commented 3 months ago

In general we moved away from struct tags in favor of protoreflect API. In the deprecated protobuf module we used struct tags to build limited reflection mechanisms. However, this turned out to be insufficient (complex options cannot be represented in struct tags) which is why we moved to the protoreflect API which we have now.

I think if you have other third-party packages that operate on your data, you should either make sure they understand protocol buffers or use an abstraction that you control (e.g. a struct type which you convert your generated protobuf type from and to). The problem is if you don't have an abstraction and these third-party libraries don't understand protobuf, we would need to add more and more features to protobuf to generated struct that are understood by these third-party tools. Today this might only be struct tags in the future these tools might require additional fields or additional interfaces to implement. This is not sustainable.

I want to emphasis that protocol buffers are a language interoperability tools and language specific features should be used sparingly or they might conflict with features that are added to the protobuf spec in the future.

In other words, for now the maintainers are not open to accept such a feature request.

bmon commented 3 months ago

In general we moved away from struct tags in favor of protoreflect API.

Thanks for elaborating on this. I was wondering if it would have been possible to implement the reflection via struct tags instead of methods, but I suspected the problems you described may have cropped up.

Today this might only be struct tags in the future these tools might require additional fields or additional interfaces to implement.

I think we could probably argue that struct tags in go are quite stable and cover almost all use-cases. I understand, though, that what you're describing extends beyond go. It would be unfortunate for proto definitions to be filled with varying language specific definitions of the "same" metadata.

At the same time, I'm left with this desire for protoc-gen-go to support more flexible interoperability with the rest of the go ecosystem. I had hoped editions maybe provided a path to providing something along those lines. Thank you for taking the time to highlight the issues that remain.

lfolger commented 3 months ago

Also I realized that edition features are not designed for this. While they can certainly be (ab)used for this it is not the intention to use them this way.

From the documentation:

Features are designed to provide a mechanism for ratcheting down bad behavior over time, on edition boundaries. While the timeline for actually removing a feature may be years (or decades) in the future, the desired goal of any feature should be eventual removal....

bmon commented 3 months ago

Unfortunately I can't open your link - it appears to be google internal. The excerpt you posted makes sense though. Would love to be able to read more of that documentation, if it is able to be published.

lfolger commented 3 months ago

Oops I thought it was the public documentation. Sorry about that. I expect this to reach the public docs eventually. I'll keep the posted snippet for now but removed the link.

lfolger commented 2 months ago

FYI: the documentation is now available publicly: https://protobuf.dev/editions/implementation/

aktau commented 2 months ago

Will close this now as it seems to have reached a conclusion. If anything is left to do or say, please re-open.