Closed ofpiyush closed 3 years ago
The derived Go names that protoc-gen-go
uses (or more specifically compiler/protogen
produces) has long been known to be problematic and possibly produce conflicts. Unfortunately, there is not much we can do about it since changing the behavior will likely break the world. For that reason, the new google.golang.org/protobuf
module did not fix this behavior since it's deeply ingrained into the wider use of protobufs in Go and we wanted the new module to be reasonably compatible with the older github.com/golang/protobuf
to the best we can.
I should note that naming styles in most programming languages is typically an aesthetic concern. However, for protocol buffers, it's actually a semantic concern since the naming style used makes a difference in whether generated code is even valid. Therefore adherence to the style is of much greater concern beyond just personal preferences. If the protobuf style guide is properly followed, conflicts should not occur. Otherwise, all bets are off, and we can't guarantee conflict-free name generation.
If you want guaranteed conflict-free name derivation, then the following naming styles must be used for the corresponding descriptor types:
Descriptor Type | Naming Style |
---|---|
Enum | CamelCase |
EnumValue | UPPER_SNAKE_CASE |
Message | CamelCase |
Field | lower_snake_case |
Oneof | lower_snake_case |
Extension | lower_snake_case |
Service | CamelCase |
Method | CamelCase |
In particular:
[A-Z][a-zA-Z0-9]*
, where any letters that appear after a number must be uppercase (e.g., Ext4Mount
is okay, but Ext4mount
is not).[A-Z][_A-Z0-9]*
(e.g., GEO_LOCATION
is okay).[a-z][_a-z0-9]*
, where letters must not follow any numbers (e.g., int64_value
is okay, but int64value
is not).Explicit failure is better than a silent one in case there are no plans to support non-style compliant proto files.
Centralised proto-> multiple language automated builds pass right now, only to get broken on import.
Can we add an explicit failure with an error message for cases where conflicts will occur?
If we want to support the behaviour, we can follow the convention proto2 codegen does and add an _
for the 2nd occurence of a conflicting name.
Explicit failure is better than a silent one in case there are no plans to support non-style compliant proto files.
There's no way we can enforce this today without breaking the world. The situation is unfortunate, but it is what it is.
If we want to support the behaviour, we can follow the convention proto2 codegen does and add an _ for the 2nd occurence of a conflicting name.
We already do this for some small set of conflicts. We consider it a grave mistake to have ever done this since it means that generated names are dependent on the order that fields are specified in a .proto
file.
There's no way we can enforce this today without breaking the world.
The world already breaks when those failures happen. It just breaks when the program compiles instead of code generation.
Making it break at build time will not break any successfully running program, will help catch problems early.
We consider it a grave mistake to have ever done this since it means that generated names are dependent on the order that fields are specified in a .proto file.
Oh yes, you're correct! For fields I assumed you'd have ordered it based on the field numbering system. No way to translate that into methods.
~Some prefix like X_
which can't be generated might be a potential solution. It won't work if you have 3 variants of the same name. But it would cover enough cases to help eventually fix a proto file.~
Nope, can't reliably think of not breaking the world with this strategy.
The world already breaks when those failures happen. It just breaks when the program compiles instead of code generation.
The code breaks if you add another non-compliant name that results in a conflict. That's very different from breaking the code the moment if any non-style compliant name is used.
I should also note that style enforcement is not the responsibility of protoc-gen-go
. Personally, I support such a notion, but this is something that should be discussed at the main protocol buffer project and be implemented in protoc
.
That's very different from breaking the code the moment if any non-style compliant name is used.
I meant the examples I posted in original issue, sorry for the confusion. We can fail explicit when we know generated code will not compile.
something that should be discussed at the main protocol buffer project
Given that one officially supported major language implementation stops working in some cases, this might be a good candidate issue for discussion in v4.
add another non-compliant name
Someone trying to migrate from non-compliant code to compliant one will also face this. If there is a way to solve for that case, I can help dedicate time on that.
We can fail explicit when we know generated code will not compile.
What's the point of building conflict detection code into protoc-gen-go
if the code already doesn't compile? We'd just be re-implementing the exact same logic as the Go compiler itself. Therefore, It seems entirely appropriate to let the Go compiler be the explicit failure for this class of failures.
this might be a good candidate issue for discussion in v4.
What is v4?
What's the point of building conflict detection code into protoc-gen-go
Centralised proto-> multiple language automated builds pass right now, only to get broken on import.
We'd just be re-implementing the exact same logic as the Go compiler itself.
I believe it might not be that difficult to achieve this by leveraging public API of the go compiler. https://github.com/golang/go/blob/23573d0ea225d4b93ccd2b946b1de121c3a6cee5/src/go/ast/resolve.go#L74
We already do something similar for formatting go code, checking for package errors will not be a lot of additional effort for better UX.
What is v4?
proto4, whenever that comes. Is there any official place to request features/fixes for the next major version or to file things for consideration when that conversation begins?
We already do something similar for formatting go code, checking for package errors will not be a lot of additional effort for better UX.
That's 100LOC to maintain for slightly better UX; I'm not convinced this is worth it. If there's anything to do here, it should be in protoc
to at least print a warning that non-style compliant names are being used. The problem we're facing here is certainly not unique to Go.
proto4, whenever that comes. Is there any official place to request features/fixes for the next major version or to file things for consideration when that conversation begins?
In my opinion, the introduction of proto3 itself, given the pre-existing proto2 has been technically expensive to maintain and rollout. I'm doubtful there will be a proto4. Regardless, any discussion about proto4 should be at github.com/protocolbuffers/protobuf
The problem we're facing here is certainly not unique to Go.
This issue is more about how to handle conflicting cases that break go generated code. It is not about addressing all non-style-compliant code.
Case sensitivity is pretty unique to go. Almost every other language can generate service and method names as is and call it a day. They might not like the style of generated code, but it will work nevertheless.
github.com/protocolbuffers/protobuf
Thank you 😊
That's 100LOC to maintain for slightly better UX
I figured a better 2 LOC check.
conf := &types.Config{Importer: importer.Default()}
_, err = conf.Check("", fset, []*ast.File{fileAST}, nil)
Is it worth it now?
That snippet tells us that the code doesn't compile. Can you explain how that's better UX than the Go toolchain not compiling the code? To my knowledge, the go/types
package doesn't return a distinguishable error for conflicting identifiers.
Codebases with multiple languages usually have a centralised step to generate code in multiple languages from proto files.
Right now, the code gen succeeds and breaks during import. Failing at build time will prevent such code from being released at all.
As an example, think of places where services are in other languages but also have generated go clients etc. They would release a new service before ever realising that the generated code will break in go.
The most common case I can think of is when people are switching from non-compliant code to compliant one.
Can you explain how that's better UX
Failing at build time and printing the error with a documentation link which explains this gotcha is much better than leaving people in the dark to discover on their own.
The crux is that a build time error would avoid a lot of work later.
I agree with @ofpiyush here. The earlier a break occurs in a complex build system, the better it is for everyone. We've been fighting with:
enum MyType {
MY_TYPE_UNSPECIFIED = 0;
name = 1;
address = 2;
}
causing a name conflict between the generated const definition of:
const (
MyType_MY_TYPE_UNSPECIFIED MyType = 0
MyType_name MyType = 1
MyType_address MyType = 2
)
and the later generated definition of:
// Enum value maps for MyType.
var (
MyType_name = map[int32]string{
0: "MY_TYPE_UNSPECIFIED",
1: "name",
2: "address",
}
)
All of our .proto files are in a central repo that produces packages for several different languages, including go. The repo has pretty tight linting, but in this particular case the author of the above had a good reason for turning off UPPER_SNAKE_CASE enforcement on the MyType
enum values, so the linter ignored it. The protoc build succeeded, even for go, but the generated .pb.go file contained a name conflict, which in turn broke several other project's builds. Having protoc-gen-go
reject this due to the name conflict immediately would have been quite helpful.
If you're worried about "breaking the world", why not make the generation time conflict check @ofpiyush proposed a flag, perhaps --go-opt=conflict-check=true
, that defaults to off.
I think a central issue of dealing with non-style-compliant protobuf names and these sorts of name conflicts is that for the most part, either we would need to track all these package-global names ourselves, or import the whole Go toolchain to reparse the code we output to check for these issues.
More ideally, anyone generating Golang protobuf code could just add a go test
in the same directory, which—even without any tests—will at least compile the code and validate that it compiles.
import the whole Go toolchain to reparse the code
We already do this to format the code. https://github.com/golang/protobuf/blob/v1.4.3/protoc-gen-go/generator/generator.go#L21-L25
More ideally, anyone generating Golang protobuf code could just add a
go test
That is not ideal, it is a very non-obvious requirement to put on one's user.
It would still somewhat be palatable it if we had precedence with saying "you need to run go fmt
on generated code yourself"
The following is not meant to be a dig, just a reflection on the hilarious situation we're all in. More to lighten the atmosphere on this thread than anything else:
Today protoc-gen-go
says,
I'll give you code that you can't run, but at least it'll be formatted pretty 😅
Closing as "working as intended". It's appropriate for us to the rely on the Go toolchain to reject generated code that has conflicts. In working in the protogen
package recently, there a number of cases where I could perform checks that the Go toolchain itself is going to check for us. It won't be very maintainable to replicate every check that the Go toolchain itself performs in protogen
.
What version of protobuf and what language are you using? // protoc-gen-go v1.25.0-devel // protoc v3.12.4
What did you do?
Tried to generate this file.
What did you expect to see? Some solution on how protoc-gen-go resolves/avoids conflict in name conversion.
What did you see instead?
Anything else we should know about your project / environment?
Twirp is trying to resolve naming convention issues in go. Specifically, how to handle service and method names that begin with a lower case letter. What should they do if it happens to conflict with another method or service. https://github.com/twitchtv/twirp/pull/269
I figured this project would've probably solved this issue with message names. Sadly. it does not.