golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.72k stars 1.58k forks source link

compiler/protogen: Go name conflicts can occur for non-style compliant protobuf names #1206

Closed ofpiyush closed 3 years ago

ofpiyush commented 3 years ago

What version of protobuf and what language are you using? // protoc-gen-go v1.25.0-devel // protoc v3.12.4

What did you do?

Tried to generate this file.

syntax = "proto3";
package hello;
option go_package=".;hello";

message myMessage {
  string myField = 1;
}

message my_Message {
  string my_Field = 1;
}

message my_message {
  string my_field = 1;
}

message MyMessage {
  string MyField = 1;
}

message My_Message {
  string My_Field = 1;
}

message My_message {
  string My_field = 1;
}

What did you expect to see? Some solution on how protoc-gen-go resolves/avoids conflict in name conversion.

What did you see instead?

proto message name go struct name
myMessage MyMessage
MyMessage MyMessage
my_message MyMessage
My_message MyMessage
my_Message My_Message
My_Message My_Message

Anything else we should know about your project / environment?

Twirp is trying to resolve naming convention issues in go. Specifically, how to handle service and method names that begin with a lower case letter. What should they do if it happens to conflict with another method or service. https://github.com/twitchtv/twirp/pull/269

I figured this project would've probably solved this issue with message names. Sadly. it does not.

dsnet commented 3 years ago

The derived Go names that protoc-gen-go uses (or more specifically compiler/protogen produces) has long been known to be problematic and possibly produce conflicts. Unfortunately, there is not much we can do about it since changing the behavior will likely break the world. For that reason, the new google.golang.org/protobuf module did not fix this behavior since it's deeply ingrained into the wider use of protobufs in Go and we wanted the new module to be reasonably compatible with the older github.com/golang/protobuf to the best we can.

I should note that naming styles in most programming languages is typically an aesthetic concern. However, for protocol buffers, it's actually a semantic concern since the naming style used makes a difference in whether generated code is even valid. Therefore adherence to the style is of much greater concern beyond just personal preferences. If the protobuf style guide is properly followed, conflicts should not occur. Otherwise, all bets are off, and we can't guarantee conflict-free name generation.

If you want guaranteed conflict-free name derivation, then the following naming styles must be used for the corresponding descriptor types:

Descriptor Type Naming Style
Enum CamelCase
EnumValue UPPER_SNAKE_CASE
Message CamelCase
Field lower_snake_case
Oneof lower_snake_case
Extension lower_snake_case
Service CamelCase
Method CamelCase

In particular:

ofpiyush commented 3 years ago

Explicit failure is better than a silent one in case there are no plans to support non-style compliant proto files.

Centralised proto-> multiple language automated builds pass right now, only to get broken on import.

Can we add an explicit failure with an error message for cases where conflicts will occur?

If we want to support the behaviour, we can follow the convention proto2 codegen does and add an _ for the 2nd occurence of a conflicting name.

dsnet commented 3 years ago

Explicit failure is better than a silent one in case there are no plans to support non-style compliant proto files.

There's no way we can enforce this today without breaking the world. The situation is unfortunate, but it is what it is.

If we want to support the behaviour, we can follow the convention proto2 codegen does and add an _ for the 2nd occurence of a conflicting name.

We already do this for some small set of conflicts. We consider it a grave mistake to have ever done this since it means that generated names are dependent on the order that fields are specified in a .proto file.

ofpiyush commented 3 years ago

There's no way we can enforce this today without breaking the world.

The world already breaks when those failures happen. It just breaks when the program compiles instead of code generation.

Making it break at build time will not break any successfully running program, will help catch problems early.

We consider it a grave mistake to have ever done this since it means that generated names are dependent on the order that fields are specified in a .proto file.

Oh yes, you're correct! For fields I assumed you'd have ordered it based on the field numbering system. No way to translate that into methods.

~Some prefix like X_ which can't be generated might be a potential solution. It won't work if you have 3 variants of the same name. But it would cover enough cases to help eventually fix a proto file.~ Nope, can't reliably think of not breaking the world with this strategy.

dsnet commented 3 years ago

The world already breaks when those failures happen. It just breaks when the program compiles instead of code generation.

The code breaks if you add another non-compliant name that results in a conflict. That's very different from breaking the code the moment if any non-style compliant name is used.

I should also note that style enforcement is not the responsibility of protoc-gen-go. Personally, I support such a notion, but this is something that should be discussed at the main protocol buffer project and be implemented in protoc.

ofpiyush commented 3 years ago

That's very different from breaking the code the moment if any non-style compliant name is used.

I meant the examples I posted in original issue, sorry for the confusion. We can fail explicit when we know generated code will not compile.

something that should be discussed at the main protocol buffer project

Given that one officially supported major language implementation stops working in some cases, this might be a good candidate issue for discussion in v4.

add another non-compliant name

Someone trying to migrate from non-compliant code to compliant one will also face this. If there is a way to solve for that case, I can help dedicate time on that.

dsnet commented 3 years ago

We can fail explicit when we know generated code will not compile.

What's the point of building conflict detection code into protoc-gen-go if the code already doesn't compile? We'd just be re-implementing the exact same logic as the Go compiler itself. Therefore, It seems entirely appropriate to let the Go compiler be the explicit failure for this class of failures.

this might be a good candidate issue for discussion in v4.

What is v4?

ofpiyush commented 3 years ago

What's the point of building conflict detection code into protoc-gen-go

Centralised proto-> multiple language automated builds pass right now, only to get broken on import.

We'd just be re-implementing the exact same logic as the Go compiler itself.

I believe it might not be that difficult to achieve this by leveraging public API of the go compiler. https://github.com/golang/go/blob/23573d0ea225d4b93ccd2b946b1de121c3a6cee5/src/go/ast/resolve.go#L74

We already do something similar for formatting go code, checking for package errors will not be a lot of additional effort for better UX.

What is v4?

proto4, whenever that comes. Is there any official place to request features/fixes for the next major version or to file things for consideration when that conversation begins?

dsnet commented 3 years ago

We already do something similar for formatting go code, checking for package errors will not be a lot of additional effort for better UX.

That's 100LOC to maintain for slightly better UX; I'm not convinced this is worth it. If there's anything to do here, it should be in protoc to at least print a warning that non-style compliant names are being used. The problem we're facing here is certainly not unique to Go.

proto4, whenever that comes. Is there any official place to request features/fixes for the next major version or to file things for consideration when that conversation begins?

In my opinion, the introduction of proto3 itself, given the pre-existing proto2 has been technically expensive to maintain and rollout. I'm doubtful there will be a proto4. Regardless, any discussion about proto4 should be at github.com/protocolbuffers/protobuf

ofpiyush commented 3 years ago

The problem we're facing here is certainly not unique to Go.

This issue is more about how to handle conflicting cases that break go generated code. It is not about addressing all non-style-compliant code.

Case sensitivity is pretty unique to go. Almost every other language can generate service and method names as is and call it a day. They might not like the style of generated code, but it will work nevertheless.

github.com/protocolbuffers/protobuf

Thank you 😊

ofpiyush commented 3 years ago

That's 100LOC to maintain for slightly better UX

I figured a better 2 LOC check.

conf := &types.Config{Importer: importer.Default()}
_, err = conf.Check("", fset, []*ast.File{fileAST}, nil)

Is it worth it now?

dsnet commented 3 years ago

That snippet tells us that the code doesn't compile. Can you explain how that's better UX than the Go toolchain not compiling the code? To my knowledge, the go/types package doesn't return a distinguishable error for conflicting identifiers.

ofpiyush commented 3 years ago

Codebases with multiple languages usually have a centralised step to generate code in multiple languages from proto files.

Right now, the code gen succeeds and breaks during import. Failing at build time will prevent such code from being released at all.

As an example, think of places where services are in other languages but also have generated go clients etc. They would release a new service before ever realising that the generated code will break in go.

The most common case I can think of is when people are switching from non-compliant code to compliant one.

Can you explain how that's better UX

Failing at build time and printing the error with a documentation link which explains this gotcha is much better than leaving people in the dark to discover on their own.

The crux is that a build time error would avoid a lot of work later.

shaunco commented 3 years ago

I agree with @ofpiyush here. The earlier a break occurs in a complex build system, the better it is for everyone. We've been fighting with:

enum MyType {
  MY_TYPE_UNSPECIFIED = 0;
  name = 1;
  address = 2;
}

causing a name conflict between the generated const definition of:

const (
  MyType_MY_TYPE_UNSPECIFIED      MyType = 0
  MyType_name                     MyType = 1
  MyType_address                  MyType = 2
)

and the later generated definition of:

// Enum value maps for MyType.
var (
    MyType_name = map[int32]string{
        0:  "MY_TYPE_UNSPECIFIED",
        1:  "name",
        2:  "address",
    }
)

All of our .proto files are in a central repo that produces packages for several different languages, including go. The repo has pretty tight linting, but in this particular case the author of the above had a good reason for turning off UPPER_SNAKE_CASE enforcement on the MyType enum values, so the linter ignored it. The protoc build succeeded, even for go, but the generated .pb.go file contained a name conflict, which in turn broke several other project's builds. Having protoc-gen-go reject this due to the name conflict immediately would have been quite helpful.

If you're worried about "breaking the world", why not make the generation time conflict check @ofpiyush proposed a flag, perhaps --go-opt=conflict-check=true, that defaults to off.

puellanivis commented 3 years ago

I think a central issue of dealing with non-style-compliant protobuf names and these sorts of name conflicts is that for the most part, either we would need to track all these package-global names ourselves, or import the whole Go toolchain to reparse the code we output to check for these issues.

More ideally, anyone generating Golang protobuf code could just add a go test in the same directory, which—even without any tests—will at least compile the code and validate that it compiles.

ofpiyush commented 3 years ago

import the whole Go toolchain to reparse the code

We already do this to format the code. https://github.com/golang/protobuf/blob/v1.4.3/protoc-gen-go/generator/generator.go#L21-L25

It is a 2 LOC check.

More ideally, anyone generating Golang protobuf code could just add a go test

That is not ideal, it is a very non-obvious requirement to put on one's user.

It would still somewhat be palatable it if we had precedence with saying "you need to run go fmt on generated code yourself"


The following is not meant to be a dig, just a reflection on the hilarious situation we're all in. More to lighten the atmosphere on this thread than anything else:

Today protoc-gen-go says,

I'll give you code that you can't run, but at least it'll be formatted pretty 😅

dsnet commented 3 years ago

Closing as "working as intended". It's appropriate for us to the rely on the Go toolchain to reject generated code that has conflicts. In working in the protogen package recently, there a number of cases where I could perform checks that the Go toolchain itself is going to check for us. It won't be very maintainable to replicate every check that the Go toolchain itself performs in protogen.