protoc-gen-go: support go_tag option to specify custom struct tags

jedborovik commented 9 years ago

Is it possible to define a message with custom tags? For example defining a json name that isn't just the lowercase field name or adding a bson tag.

dsymonds commented 9 years ago

No, there's no support for that, nor any intention to support that.

mwmahlberg commented 8 years ago

I'd like to add that this feature would be extremely useful for validation purposes.

jswierad commented 8 years ago

+1

Use case: Using protobuf generated structs along with sqlx package. Would be awesome

mikebell-org commented 8 years ago

Likewise the go library for cloud datastore. Basically, increasing numbers of things make use of struct tags, and without support for them you're left manually duplicating structs? If there are no plans to add this functionality, what is the recommended way to handle it?

iain17 commented 7 years ago

I would love this feature! This seems to somehow mitigate the need

https://github.com/favadi/protoc-go-inject-tag

sefasenturk95 commented 7 years ago

I would love this feature aswell!

willks commented 7 years ago

This would be handy, as I need to validate JSON payloads for my REST API portion of the service.

awalterschulze commented 7 years ago

Since golang/protobuf will probably never support this. Here are some solutions.

https://github.com/mwitkow/go-proto-validators is useful for validation on proto structs.

https://github.com/gogo/protobuf can be used in two ways to have custom tags:

1) You can use the moretags extension https://github.com/gogo/protobuf/blob/master/test/tags/tags.proto

2) Or you can use the typedecl extension if you don't want to generate the golang struct. This allows you declare your own golang struct with all the tags you want and more.

ekiyanov commented 7 years ago

Need it as well to automate insert/update into the sql database

qianlnk commented 7 years ago

I've add a plugin retag for protoc-gen-go.

https://github.com/qianlnk/protobuf/tree/master/protoc-gen-go/retag

protoc --go_out=plugins=grpc+retag:. yourproto.proto

RajeshKumar1990 commented 7 years ago

@qianlnk .thanks.I tried your plugin when i use only message then ther is no problem but when the message have sub message like this

message SendData {
    message MetaData{
        string Name = 1;//`json:"name;omitempty"`
        int64 Length = 2;//`json:"length;omitempty"`
    }
    message Chunck{
    bytes Data = 1;//`json:"data;omitempty"`
    int64 Position = 2;//`json:"position;omitempty"`
    }
}

This throws the error.Any how the retag can support this??? Thanks...

qianlnk commented 7 years ago

@RajeshKumar1990 I fixed it.

atamgp commented 6 years ago

@qianlnk I am having a problem making this work.... does it support proto3?

syntax = "proto3";
package contracts;

import "Common.proto";
//import "google/protobuf/timestamp.proto";

message ZZZ {
  string id = 1;  // `db:"id"`
  string clientId = 2;
  string datetime = 3;
}

Converting with: protoc -I=xyz.eu/contracts/proto --go_out=plugins=grpc+retag:xyz.eu/contracts/ xyz.eu/contracts/proto/*.proto

Result:

type ZZZ struct {
    Id       string  `protobuf:"bytes,1,opt,name=id" json:"id,omitempty"`
    ClientId string  `protobuf:"bytes,2,opt,name=clientId" json:"clientId,omitempty"`
    Datetime string  `protobuf:"bytes,5,opt,name=datetime" json:"datetime,omitempty"`
}

The db custom tag on id is missing.... Like everyone using proto, I need this in order to work with (any) orm:) Thanks for your time

qianlnk commented 6 years ago

@atamgp the first letter in field must be capitalized.

atamgp commented 6 years ago

@qianlnk thank you for the response. I changed the proto to:

string ID = 1; // db:"id,omitempty"

In above and below example the ` before db and at the end of line are omitted by github... but they are there. Unlike your example, only I am using a non existing tag like (no json or xml). Still no db output:

ID string protobuf:"bytes,1,opt,name=ID" json:"ID,omitempty"

If you have another idea please let me know. I am looking into the plugin code right now. I tested that the code is called with a print statement in de init. So the environment seems ok.

qianlnk commented 6 years ago

@atamgp have you install my retag plugin?

git clone https://github.com/qianlnk/protobuf.git $GOPATH/src/github.com/golang/protobuf
go install $GOPATH/src/github.com/golang/protobuf/protoc-gen-go

this is my try:

syntax = "proto3";
package contracts;

//import "Common.proto";
//import "google/protobuf/timestamp.proto";

message ZZZ {
  string ID = 1;        //`db:"id"`
  string ClientID = 2;  //`db:"client_id"`
  string Datetime = 3;  //`db:"datetime"`
}

run:

protoc -I/usr/local/include -I. \
   -I$GOPATH/src \
   --go_out=plugins=grpc+retag:. \
test.proto

and result:

type ZZZ struct {
    ID       string ` protobuf:"bytes,1,opt,name=ID"         json:"ID,omitempty"  db:"id"`
    ClientID string ` protobuf:"bytes,2,opt,name=ClientID"   json:"ClientID,omitempty"  db:"client_id"`
    Datetime string ` protobuf:"bytes,3,opt,name=Datetime"   json:"Datetime,omitempty"  db:"datetime"`
}

atamgp commented 6 years ago

@qianlnk OK, I have it figured out a bit more :)

message ZZZZ {
string ID = 1; //db:"id" string clientID = 2; //db:"id" double amount = 3; }

Things I learned: 1) I have to put the db tag on all fields of a message. Otherwise I get an exception: panic: runtime error: index out of range (panic: runtime error: index out of range -> indexing [1]) For some fields it is not wanted to be mapped by an ORM. Is it possible to make this optional? 2) Like you said earlier, field name has to begin with a capital otherwise its ignored. Ofcourse its possible to misuse this aspect to realize optional fields (1) but is less clean. 3) Command path used. I have this folder structure: $GOPATH/src/X/Y/Z/zzz.proto I was using a protoc command from src folder which did not work:

Working: When calling from within Z: protoc --go_out=plugins=grpc+retag:. zzz.proto Not working: protoc -I:X/Y/Z --go_out=plugins=grpc+retag:. X/Y/Z/zzz.proto Working: protoc -I:. -I:X/Y/Z --go_out=plugins=grpc+retag:. X/Y/Z/zzz.proto 4) importing from other proto: I have a common.proto and a zzz.proto. zzz is importing an enum and a message from common. working: if a use a working command from point 3) above for 1 file at a time e.g.: protoc -I:. -I:X/Y/Z --go_out=plugins=grpc+retag:. X/Y/Z/zzz.proto protoc -I:. -I:X/Y/Z --go_out=plugins=grpc+retag:. X/Y/Z/common.proto

not working: same commands which are working for 1 file but in order to process both proto files /*.proto instead of a specific proto. E.g.: protoc -I:. -I:X/Y/Z --go_out=plugins=grpc+retag:. X/Y/Z/*.proto errors: .... is already defined in file .... It is really nice to process all proto files in 1 time, can you have a look at this?

Last: It is not clean to have all proto files and generated go files in the same folder. In my case Z is actually named proto and Y is contracts. I want to have my generated go files put in folder Y instead of Z. Still looking for a working combination command for this...

EwanValentine commented 6 years ago

Has anyone found a solution for converting between type: string and type: bson.ObjectId as well?

mavle commented 6 years ago

Please support this it is extremely useful and cuts down on needless code bloat.

puellanivis commented 6 years ago

While this idea is super useful in Go, unfortunately no other language that proto supports can really make use of it. As such, changing/supporting this would be outside of the scope of the official Go protobuf package.

maguro commented 6 years ago

@puellanivis, that is not true. Supporting custom tags would open protobufs to a plethora of possible tooling.

dsnet commented 6 years ago

I'm going to re-open this issue, but it does not mean that we're going to support this. I'd like to see more discussion on the feasibility of this. Some thoughts:

I can see the usefulness of this feature for Go-heavy projects, but protobufs were designed a language-agnostic way to interchange data. Baking more Go-specific details in the proto files seems to be antithetical to the philosophy of protobufs.
This features makes the assumptions that generated Go protobufs are always structs with fields. What does it mean if we want the ability to generate opaque protobuf types that are interacted via methods. Performance testing has shown that techniques like lazy unmarshaling can bring significant speed benefits, but is only feasible as an opaque type.
The proper way to support this seems to be using protobuf options, but I don't see any proposal here suggesting a syntax for how that would work.

qianlnk commented 6 years ago

why not add a plugin like

https://github.com/qianlnk/protobuf/tree/master/protoc-gen-go/retag

This plugin is my original idea, don't take too much effort to attain it.But I believe that many people need it, so I hope more people can join in.

meling commented 6 years ago

@qianlnk Looked at your retag plugin as one of several alternatives. I don't really like the idea of cloning it into my golang/protobuf folder, especially since This branch is 8 commits ahead, 28 commits behind golang:master. Would you consider making it a standalone plugin instead?

meling commented 6 years ago

@dsnet See tags.proto which was referred to by @awalterschulze as an example of using the protobuf options approach.

It could perhaps look something like this:

message Enrollment {
   uint64 CourseID = 1 [(go_tags) = "gorm:\"unique_index:uid_user_course\""];
   uint64 UserID = 2 [(go_tags) = "gorm:\"unique_index:uid_user_course\""];
   int32 foo = 3 [(go_tags) = "sql:\"type:text\""];
   string bar = 4;
}

This would be a very useful addition to avoid a lot of boilerplate, and IMO it would fit nicely in with other language-specific options, such as e.g. java_package to name just one. Just like java_package is ignored when generating Go files, go_tags would be ignored when generating non-Go code.

The only issue that I don't like with the above is the need for the \". It would be nice if that can be avoided, but it is a compromise I don't mind if we can get this feature into the main project.

meling commented 6 years ago

@dsnet Regarding your point about opaque types. I think there are two main use cases that have come up in this issue.

Boilerplate avoidance for database/storage backends.
Validation of input fields.

I think the first is best dealt with using something like my proposed go_tags, because the tools that process the tags are their own beasts and it seems difficult to interact via lazy functions.

Validation of input fields seems different though. I can imagine that validation could be done as part of lazy unmarshaling, but that would require another option such as in go-proto-validators. However, such lazy unmarshaling with validation is not Go-specific, and so I don't think it makes sense to put such validation data into struct tags in the generated code. Indeed, go-proto-validators does not put validation data in struct tags.

chideat commented 6 years ago

IMO, it's better to provide a global option, for example go_tags with value like "bson" or "bson,xml".

It could perhaps look something like this:

syntax="proto3";

package user;
option go_package = "user";
option go_tags = "bson";

message User {
    string id   = 1;
    string name = 2;
}

Which generates:

type User struct {
    Id string `protobuf:"varint,1,opt,name=id" json:"id,omitempty" bson:"id,omitempty"`
    Name string `protobuf:"varint,1,opt,name=name" json:"name,omitempty" bson:"name,omitempty"`
}

protoc-gen-go generates json by default, but sometime we need others tags for marshal/unmarshal. As a result, I have to write another struct the same as generated except tags.

meling commented 6 years ago

A global option may work for your use case with bson, but many other use cases require per field options.

chideat commented 6 years ago

@meling I don't thinking it's a good idea to insert more info to every field option. protoc-gen-go bind struct with json, but we use struct in many cases, such as bson and xml and some custom cases. The global option mentioned above is to solve the problem, which is not only in my case.

To database/storage, for example gorm, use tags to define index/unique_index is a crazy action. We use protobuf with many languages, option for every field is useless for python/java. These options make the .proto unreadable for python/java developers. What's more, option for every field makes the .proto file hard to read and modify, for there are many service logics. Simple is better.

meling commented 6 years ago

Why is it crazy to define index fields using standard protobuf options syntax? I didn’t design the option syntax for protobuf, but I learn to adapt. Just because something is unreadable to some people doesn’t mean it is crazy. What is crazy is to do “manual” conversion between a protobuf struct and a database struct when they are otherwise identical, because that is just error prone... and slow.

Of course your approach is simple, but it doesn’t solve the problems of at least half of the people asking for this feature...

awalterschulze commented 6 years ago

I could not have said it better myself 😀 +1 Hein Meling

On Wed, 27 Dec 2017, 08:45 Hein Meling, notifications@github.com wrote:

Why is it crazy to define index fields using standard protobuf options syntax? I didn’t design the option syntax for protobuf, but I learn to adapt. Just because something is unreadable to some people doesn’t mean it is crazy. What is crazy is to do “manual” conversion between a protobuf struct and a database struct when they are otherwise identical, because that is just error prone... and slow.

Of course your approach is simple, but it doesn’t solve the problems of at least half of the people asking for this feature...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/protobuf/issues/52#issuecomment-354069368, or mute the thread https://github.com/notifications/unsubscribe-auth/ABvsLfngLDTEv0R834oLqVAF01xUdQP4ks5tEfWZgaJpZM4Fbmpd .

dsnet commented 6 years ago

A file-level option for tags is off the table. It requires protoc-gen-go to have a semantic understand of all the tag formats that users may want, thus coupling a variety of struct tag formats into the proto repo. Individual field options would be the way to go since they are more extensible and does not bake package-specific logic into the generator.

chideat commented 6 years ago

@dsnet I could not agree more.

huayun321 commented 6 years ago

👎

willks commented 6 years ago

I abandoned my efforts on this since I saw the posts. So now I keep separate models (with JSON mapping to do the boilerplate) for protobuf generated structs ("Remote" structs, as I suffixed them now). A bit more code and another layer of indirection, but it works and has been in production for 4 months now.

meling commented 6 years ago

@willks From your first comment it seems like you need to do validation. I wonder if you could provide some more details about your use case, e.g. an example. In my earlier reasoning I found that adding validation data to struct tags may not be the best approach, since ideally you would want to have such validation data (e.g. upper/lower bound on an integer value) encoded such that the protoc compiler can generate validation code for all supported languages, not just Go. If validation is the main purpose, perhaps you could make the case to add such features to the top-level protobuf compilers (I have no idea what the politics would be or if something like this has already been proposed...)

However, avoiding boilerplate code to translate proto messages that map identically to a database/storage model, that may need some additional go tags for the storage model, I do think making use of a go_tags option makes sense. My interpretation of @dsnet is that the project is open to accepting something along these lines... I have been thinking about writing up a patch myself, but haven't found the time, and the project where I need this is currently on hold.

I would encourage more people to submit examples of how they imagine that they would use such custom options to generate custom Go tags.

To start this off, let me point to our Autograder project where we want to use this: ag_service.proto. We tried using gogoproto.moretags, but there are some compatibility issues with the grpc-web frontend codebase in typescript that remains unresolved.

puellanivis commented 6 years ago

RE validation: I would like to note one of the reasons why the required/optional distinction was eliminated was because of a considered position within (and without?) Google that validation should not be a part of the protobuf process itself, but rather consumers of a protobuf should be validating protobufs themselves.

Boilerplate validation is convenient for quickly throwing validation into something, but in the end, the inflexibility results in difficult to maintain, if not unmaintainable code. (The second one wishes to do anything other than just reject the protobuf outright, a complete rewrite must be performed, and oftentimes, removing the baked-in validation from the protobuf is an impossible, due to infrastructure being built upon the assumption.)

lmapii commented 6 years ago

for me the major benefit of this feature would be model binding: i have a REST service implemented in go (with gin-gonic), the body is encoded in JSON and can be mapped to a go-struct using the json-annotations.

now this REST server is actually just a gateway to embedded devices which use ProtocolBuffers for the de-/encoding of messages. without any possibility to manipulate the annotations generated by protoc i have to bind the full message 1:1 to the (JSON) REST service.

implementing, e.g., a message with some field holding internal data (only used by the gateway and the device but not exposed by REST) is not possible without either manipulating the generated code OR duplicating each and every message this gateway sends through > which defeats the whole point of using ProtocolBuffers.

more likely use-case: the API is defined externally. someone suddenly decides that CamelCase is a good option instead of underscores > now instead of simply modifying the JSON annotations i'd have to either re-generate and thus re-program all devices or introduce boilerplate (mapping)

willks commented 6 years ago

@meling You are right, this is not something that protobufs/gRPC needs to be concerned about. Protobufs/gRPC is the transport mechanism, it's how we convert it to the relevant domain model that we need to be concerned about. Hope that makes sense :) I swapped to Java for my server side since my last post, consolidating of the logic based on generics now. My validation is made on the "mapped" objects in Java to return the relevant errors.

My clients are both iOS/Android. It has been addressed so I take back my previous post :) Thanks for replying.

dsnet commented 6 years ago

Here's a formal proposal.

Background

The protoc generators for Go protobufs have historically generated proto messages as structs with fields since 2009. The major benefit of struct with exported fields is that it produces more idiomatic Go code and is generally nicer to use for users. However, at the same time, users were never given control of struct field tags, which are arguably an important aspect of how Go structs can interact with the rest of the ecosystem (e.g., encoding/xml, gopkg.in/mgo.v2/bson, etc).

Proposal

I propose we add support for a go_tag proto option that enables the user to specify the struct tag value for a field.

Example:

syntax = "proto3";

package example.gotag;

message Foo {
    string Field1 = 1 [go_tag = "xml:\",chardata\""]
    oneof Field2 {
        option go_tag = "xml:\",chardata\""
        string Field2a = 2;
        string Field2b = 3;
    }
}

The specifics:

1) This option only affects FieldOptions and OneOfOptions and is ignored when the option is specified on any other option dimension.

2) Only generators that emit messages as structs with exported fields need to respect go_tag. Those that don't can ignore it.

3) The syntax of go_tag is any arbitrary string. It is recommended (but not required) that the string follows the tag convention used by nearly every Go package. The value of this tag is concatenated to any tags already generated by default. For protoc-gen-go, we already generate protobuf and jsontags, which we should phase out over time (see points 4 and 5).

4) For protoc-gen-go, phase out generation of protobuf struct tags. The information held in the protobuf and protobuf_xxx tags are implementation details that should stop leaking out to the public API. The work to make Message behaviorally complete (#364) will provide the proper API to supply the proto type information that was originally encoded in the protobuf tags.

5) For protoc-gen-go, phase out generation of json struct tags. The json tags were an half-hearted attempt at satisfying the JSON<->Protobuf mapping. However, the encoding/json package is fundamentally insufficient to properly satisfy the mapping for all features of protobufs (e.g., well-known types, required fields, etc). Rather than provide the illusion that we properly satisfy the specification (when we don't), we should drop support for the mapping entirely. If users desires for the encoding/json package to be equivalent to the JSON<->Protobuf mapping (then see #256). If users desire to directly control how encoding/json interacts with generated messages, then they can use the go_tag option to control the tags.

Item 5 is a breaking change. See #526 for discussion on how we can evolve the generator.

Use cases

This feature enables full control of struct tags by the user in a simple way.

This supersedes the jsontag and moretags options in gogo/protobuf.

Design specifics

What happens if the go_tag contains the json tag? For the near future, users should avoid doing this since it will conflict the json tag already generated by default. It is up to each generator implementation whether to explicitly ensure that the json tag is not set to avoid a conflict.

What happens if go_tag is specified on fields belonging to a oneof? A generator may ignore it if the fields of a one-of are not represented as a struct (as is the case for protoc-gen-go). However, if a generator does represent a oneof as a flat struct with nullable fields, they may choose to place the tag on the corresponding field.

What happens if go_tag is specified on an extension? The generator should error in this case. It is impossible for extensions to be added to a struct at proto compile time since extended fields can be added by any proto file.

What about generating "opaque" messages? There have been some proposals to possibly change the layout to support other generated APIs that are more conducive to lazy unmarshaling, where only a small set of fields are accessed. Such APIs would avoid being struct structs with fields, but instead enable access to fields via setter/getter methods. In such a case, struct tags have no semantic meaning.

At this point, it is probably reasonable to assume that the specific API generated by protoc-gen-go will always be struct with fields. Support for different APIs should be performed by an entirely different generator implementation. Such generators that do not emit messages as a structs with fields should just ignore the go_tag option.

\cc @neild @awalterschulze

neild commented 6 years ago

Total bikeshed: go_struct_tag.

awalterschulze commented 6 years ago

Loving this proposal.
This is one of the harder ones because of the json tag already being generated and the backwards compatibility problems that it might cause. And then also adding the confusion of jsonpb vs encoding/json. I am optimistic that this proposal helps in both regards.

A possible solution for number 5 would be to create protoc-gen-gov2 that does not generate any json tags and break backwards compatibility. Then we can add a preprocessing step to protoc-gen-go that ads all the json tags as new go_tag options. That way the generator library code only needs to maintain the newest version.

If we semi controversially live at head then we can also provide a tool to upgrade .proto files to include the jsontags inside go_tag options, for users who want to keep this feature.

But thats just some ideas for number 5. Overall I think this proposal looks awesome.

tikiatua commented 6 years ago

@dsnet Thank you very much for taking the time to phrase this awesome proposal. I completely agree with you that it would probably be better to drop the auto generated json tags.

However, to me, the readability of tags as protobuf options does not seem to be very good. This may lead to errors. Additionally it is actually quite cumbersome to write tags this way – which I would say does not align very well with the go spirit.

Therefore, another approach could be to keep the go struct tag information completely separate from the proto definition – i.e. in a adjacent .go file. This information could then be used to post process the .pb.go files and apply the go tag definitions to the generated structs. This would avoid putting non transport related information into the proto files while still allowing us to define custom go struct tags. We could even go a step further and define custom functions that are merged onto the generated structs.

mytype.proto, mytype-tags.go -> mytype.pb.go (with tags applied from mytype-tags.go)

For now, we wrote a small utility that allows us to use comments to specify tags. We then post process the .pb.go files and apply the comments as go struct tags using the go abstract syntax tree.

A minor inconvenience is that the protoc grpc plugin will not keep same line comments, so the tag comments must be placed above the field definition in the proto file.

The utility and a complete example can be found here: https://github.com/dkfbasel/protobuf

millergarym commented 6 years ago

There was a very nice json library in Java that is worth looking at for some ideas https://github.com/pascallouisperez/jsonmarshaller/blob/wiki/Options.md

Java's annotations are imo (one of the only things) better than Go's tags.

The jsonmarshaller library allowed for different views, eg different names based on a view name.

Potentially issue 5 (PB <> JSON) should be solved by a library that ignores that json tags on the generated protobufs.

taoso commented 6 years ago

Here is another simple solution

taoso commented 6 years ago

@meling @dsnet I don't think it is a good idea to use a option like go_tag. As this option is golang only. Even the struct tag is golang only. Trying make a cross language solution is a waste of time.

taoso commented 6 years ago

@dsnet the go_tag is useless for other language. And actually, this repo can only generate golang code.

dsnet commented 6 years ago

Yes, this repo contains the logic to generate .pb.go files. However, a .proto file is ingested by generators that produce source code for other languages. Magical comments is never going to fly compared to a proto option like go_tag or something else.

meling commented 6 years ago

@lvht Thanks for your input on this. Let me clarify some other concerns I have with your proposal, beyond our lack of enthusiasm for magical comments in proto files.

As I pointed out here, I think its a bad idea to use Go's struct tags to express data for validation functions. This is because you would either need to use reflection in the Validate() function to extract the data from these struct tags (slow), or use another generator to generate the Validate() function (pointless).

Moreover, you would also need to define a "validator language" that can then be translated into code for the Validate() function (as in go-proto-validators). Note that in go-proto-validators, no validation data is stored in Go's struct tags. This seems like something that should be its own effort and discussed elsewhere, and be part of the protobuf language, or a separate language for validation. As a side note, I don't particularly care for notation such as lt and gt etc.

As a simpler solution, I could imagine a command line flag that adds a call to a user-defined Validate() function that gets called after unmarshaling to check message fields. Or you could probably avoid a command line flag by defining a boolean for each message type that is checked before calling Validate() as outlined below.

In .pb.go:

var innerMessageHasValidate bool

func (m *InnerMessage) Unmarshal(dAtA []byte) error {
  // unmarshal message into m's fields
  if innerMessageHasValidate {
    return m.Validate()
  }
  return nil
}

In a user-defined validate.go file:

func init() {
  innerMessageHasValidate = true
}

func (m *InnerMessage) Validate() error {
  // user-defined validation
}

Such Validate() functions must then be defined in a separate .go file and placed in the same folder as the .pb.go file (or similar for other languages). This would allow full language flexibility in how to define the validation, and wouldn't need its own "validator language". Of course, the drawback is that you would need to write the same Validate() logic for each language, which would be error prone. So there is definitely a case to be made for a "validator language", but I don't think this is the place to address that. Either way, I think the above structure with a Validate() function can be made compatible with a future validation generator, based on some "validator language".

taoso commented 6 years ago

@meling you solution is awesom. Thank you. 👍

golang / protobuf